Gzip Compression Using Altera OpenCL
Mohamed Abdelfattah (University of Toronto), Andrei Hagiescu, Deshanand Singh


2 Gzip
- Widely used lossless compression program
- Gzip = LZ77 + Huffman
- Big data needs fast compression, on the order of gigabytes per second
  - Lower disk space in data centers
  - Less power on communication networks

3-15 LZ77 Compression Example
Input: This sentence is an easy sentence to compress.
1. Scan the file byte by byte
2. Look for matches
   1. Match length = 8 (the second "sentence" repeats the first; the slides count the length up through 2 and 3 as bytes are compared)
   2. Match offset = 20 bytes
3. Replace with a reference to the previous occurrence: marker, length, offset
Output: This sentence is an easy <marker, length, offset> to compress.
Saved 5 bytes! (An 8-byte match is replaced by a 3-byte reference.)
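
The three steps above can be sketched in software. This is a minimal, brute-force illustration and not the hardware design from the talk; the function name `find_longest_match` and the `min_len` parameter are assumptions made for this example.

```python
def find_longest_match(data: bytes, pos: int, min_len: int = 3):
    """Return (length, offset) of the longest earlier match for data[pos:], or None.

    Brute-force search over every earlier start position (illustrative only).
    """
    best_len, best_off = 0, 0
    for start in range(pos):
        length = 0
        # Extend the match byte by byte while both sides agree.
        while pos + length < len(data) and data[start + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_off = length, pos - start
    return (best_len, best_off) if best_len >= min_len else None


sentence = b"This sentence is an easy sentence to compress."
second = sentence.index(b"sentence", 6)  # start of the second "sentence"
match = find_longest_match(sentence, second)
```

For the example sentence the search reports an offset of 20 bytes, as on the slide. A byte-exact search also picks up the space after "sentence", so the reported length is 9 rather than the 8 bytes highlighted on the slide.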

16-19 Altera OpenCL Compiler for FPGAs
Host code:
    // host code
    // enqueue buffers
    // enqueue kernel(s)
    // dequeue buffers
OpenCL single-threaded code:
    kernel void simple(global const int *input, int size, global int *output)
    {
        // input must hold at least size + 1 elements because of input[i + 1]
        for (int i = 1; i < size; i++) {
            int x = input[i];
            int y = input[i + 1];
            int z = x + y;
            output[i] = z;
        }
    }
Compiled with: Altera's OpenCL Compiler
Diagram: Host CPU <-> PCIe <-> FPGA Accelerator (Load x, Load y, Store z) <-> DDRx Memory. Slides 17 and 18 add the labels 1 and 2 at the load units.

20 FPGAs can be VERY Custom
- Host: host CPU over PCIe, or an ARM host on the FPGA chip
- FPGA accelerator: Load x, Load y, Store z, with IO channels on the input and output sides
- Different memory types: DDRx memory, RDL?, QDR?

21 Implementation Overview
1. Shift In New Data
2. Dictionary Lookup/Update
3. Match Search & Filtering
4. Write to output

22-26 1. Shift In New Data
- The current window is filled with input from DDR memory.
- Example: the window holds "old_text" and the incoming stream is "sample_text". We use text in our example, but the data can be anything.
- VEC = 4 bytes are shifted in at each cycle boundary.
- After one shift the oldest VEC bytes ("old_") are gone and the window holds "textsamp"; "le_text" is still to come.
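
The shift step can be modeled in a few lines of software. This is only a sketch of the behavior shown on the slides, assuming a window of 2 * VEC bytes; the function name `shift_in` is made up for this example.

```python
VEC = 4  # bytes shifted in per cycle, as in the slide example


def shift_in(window: bytes, stream: bytes, pos: int):
    """Drop the oldest VEC bytes of the window and append the next VEC bytes.

    Returns the new window and the updated read position in the stream.
    """
    new_bytes = stream[pos:pos + VEC]
    return window[VEC:] + new_bytes, pos + len(new_bytes)


window, pos = shift_in(b"old_text", b"sample_text", 0)
```

With the slide's data, `window` becomes `b"textsamp"` and `pos` advances to 4, leaving `le_text` in the stream.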


28-34 2. Dictionary Lookup/Update
Current window: t e x t s a m p
1. Compute a hash for each substring of the current window: text, exts, xtsa, tsam
2. Look for a match in 4 dictionaries
3. Update the dictionaries
Dictionaries buffer the text that we have already processed. Each hashed substring reads one possible match from the history held in each dictionary, e.g.:

Substring   Dictionary 0   Dictionary 1   Dictionary 2   Dictionary 3
text        tan_           text           texl           teen
exts        eate           ears           eeps           ente
xtsa        xant           xylo           xely           xirt
tsam        tan_           tame           teal           teen

Ports: in the example each dictionary has 4 read ports (RDx0 to RDx3) and 1 write port (Wx). Generate exactly the number and width of read/write ports that we need: 256 read ports and 16 write ports, 128 bits wide (with VEC = 16).
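
The lookup/update step can be sketched as a software model. Several details here are assumptions, not taken from the slides: the hash function `toy_hash`, the dictionary size `DICT_SLOTS`, storing a position next to each substring, and the rule that substring i is written to dictionary i (chosen because the slide shows one write port per dictionary).

```python
VEC = 4           # substrings handled per cycle, as in the slide example
DICT_SLOTS = 256  # hypothetical number of entries per dictionary


def toy_hash(substring: bytes) -> int:
    """Hypothetical hash; the slides do not specify the real one."""
    h = 0
    for byte in substring:
        h = (h * 31 + byte) % DICT_SLOTS
    return h


def new_dictionaries():
    """Create VEC empty dictionaries."""
    return [[None] * DICT_SLOTS for _ in range(VEC)]


def lookup_and_update(dictionaries, window: bytes, position: int):
    """Model one cycle: hash VEC substrings, read candidates, then update.

    window holds 2 * VEC bytes; position is the stream index of window[0].
    """
    substrings = [window[i:i + VEC] for i in range(VEC)]
    hashes = [toy_hash(s) for s in substrings]
    # Lookup: every substring reads its hash slot in every dictionary
    # (VEC * VEC reads per cycle).
    candidates = [[dictionaries[d][h] for d in range(VEC)] for h in hashes]
    # Update (assumed rule): substring i is written to dictionary i only
    # (VEC writes per cycle).
    for i in range(VEC):
        dictionaries[i][hashes[i]] = (substrings[i], position + i)
    return substrings, candidates


dicts = new_dictionaries()
subs_first, cands_first = lookup_and_update(dicts, b"textsamp", 0)
subs_again, cands_again = lookup_and_update(dicts, b"textsamp", 8)
```

Under these assumptions each cycle issues VEC * VEC reads and VEC writes, which for VEC = 16 gives the 256 read ports and 16 write ports quoted on the slide.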


36-38 3. Match Search & Filtering
Each incoming substring (current window) has a set of candidate matches (comparison windows). Compare each current window against each of its 4 comparison windows:

Current window   Comparison windows
text             teen, texl, text, tan_
exts             ente, eeps, ears, eate
xtsa             xirt, xely, xylo, xant
tsam             teen, teal, tame, tan_

- Comparators compare each byte of the current window ("text") with each comparison window and produce one match length per comparison window. We have another 3 of those comparator sets, one for each remaining substring.
- Match reduction keeps the longest match: best length = 4.


42 3. Match Search & Filtering
- Typical C code
- Fixed loop bounds, so the compiler can unroll the loops

43 3. Match Search & Filtering
One bestlength is associated with each current_window substring (text, exts, xtsa, tsam); the slide shows the values 3, 1, 3, 3.
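
The comparator and match-reduction stages can be illustrated as follows. This is a sketch and not the kernel code from the talk; `match_length` and `best_match` are names chosen for this example. Both loops have fixed trip counts, which is the property that lets a hardware compiler unroll them.

```python
VEC = 4  # bytes per window in the slide example


def match_length(current: bytes, candidate: bytes) -> int:
    """Count how many leading bytes agree, using a fixed-bound loop."""
    length = 0
    still_matching = True
    for i in range(VEC):  # fixed trip count, no early exit
        still_matching = still_matching and current[i] == candidate[i]
        if still_matching:
            length += 1
    return length


def best_match(current: bytes, candidates):
    """Reduce the per-candidate lengths to (best length, candidate index)."""
    best_len, best_idx = 0, -1
    for idx in range(len(candidates)):
        length = match_length(current, candidates[idx])
        if length > best_len:
            best_len, best_idx = length, idx
    return best_len, best_idx


comparison_windows = [b"teen", b"texl", b"text", b"tan_"]
lengths = [match_length(b"text", c) for c in comparison_windows]
best = best_match(b"text", comparison_windows)
```

For the slide's comparison windows the lengths are 2, 3, 4 and 1, and the reduction picks the third window with best length 4, matching the slide.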

44-47 3. Match Search & Filtering
Select the best combination of matches from the set of candidate matches for the current window (t e x t s a m p) and its best lengths:
1. Remove matches that are longer when encoded than the original
2. From the remaining set, select the best ones using last-fit, a heuristic for bin-packing
3. Compute the first valid position for the next step
Slide annotations on the matches: "1 Too short", "2 Overlap", "4 Last-fit". A selected match can extend past the cycle boundary, which sets the first valid position for the next cycle.
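
The slides do not spell out the last-fit rule, so the following is one plausible reading, written as a sketch under stated assumptions: positions are scanned from last to first, a match is kept only if it does not run into a match already kept, and matches shorter than the 3-byte encoded form are dropped. The name `select_matches` and the constant `ENCODED_BYTES` are hypothetical.

```python
VEC = 4            # positions per cycle in the slide example
ENCODED_BYTES = 3  # assumed size of the short (marker, length, offset) form


def select_matches(best_lengths, first_valid=0):
    """Pick non-overlapping matches, scanning from the last position backward.

    Returns (selected, next_first_valid), where selected is a list of
    (position, length) pairs in increasing position order.
    """
    selected = []
    limit = None  # start of the earliest match kept so far
    for pos in range(VEC - 1, first_valid - 1, -1):
        length = best_lengths[pos]
        if length < ENCODED_BYTES:
            continue  # step 1: encoding would be longer than the original
        if limit is not None and pos + length > limit:
            continue  # step 2: overlaps a match that was already kept
        selected.append((pos, length))
        limit = pos
    selected.reverse()
    # Step 3: bytes covered beyond this cycle are not valid starts next cycle.
    end = selected[-1][0] + selected[-1][1] if selected else 0
    return selected, max(0, end - VEC)


chosen, next_first_valid = select_matches([3, 1, 3, 3])
```

With best lengths 3, 1, 3, 3 this keeps the matches at positions 0 and 3, drops position 1 as too short and position 2 as overlapping, and reports 2 as the first valid position of the next cycle.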


49 4. Writing to Output
Each match is written as: marker, length, offset
- Length is limited by VEC (= 16 in our case), so it fits in 4 bits
- Offset is limited by 0x40000 (it doesn't make sense to be more), so it fits in 21 bits
- Use either 3 or 4 bytes for this:
  Offset < 2048:   MARKER LENGTH OFFSET OFFSET
  Offset >= 2048:  MARKER LENGTH OFFSET OFFSET OFFSET
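
The size rule above can be checked with a small calculation. This sketch only models the choice between the 3-byte and 4-byte forms; it does not attempt the bit layout, which the slide does not give. The function names and the 2048 threshold for the long form are assumptions based on the "Offset < 2048" case.

```python
VEC = 16                   # maximum match length in the talk's design
MAX_OFFSET = 0x40000       # largest offset considered, from the slide
SHORT_OFFSET_LIMIT = 2048  # offsets below this use the 3-byte form


def encoded_size(length: int, offset: int) -> int:
    """Return the number of bytes used to encode one match reference."""
    if not 1 <= length <= VEC:
        raise ValueError("length out of range")
    if not 0 < offset <= MAX_OFFSET:
        raise ValueError("offset out of range")
    return 3 if offset < SHORT_OFFSET_LIMIT else 4


def bytes_saved(length: int, offset: int) -> int:
    """Bytes saved by replacing a match with its reference (may be negative)."""
    return length - encoded_size(length, offset)


saved_in_example = bytes_saved(8, 20)
```

For the earlier LZ77 example (length 8, offset 20) this gives a 3-byte reference and a saving of 5 bytes, consistent with the "Saved 5 bytes!" slide. A 3-byte match at a large offset would cost a byte instead, which is why such matches are filtered out.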

50 Results

51 Comparison against CPU/Verilog
Best Gzips out there!

52 Comparison against CPU/Verilog
Best implementation of Gzip on CPU
- By Intel Corporation
- On Intel Core i5 (32nm) processor
- 2013
- Compression speed: 338 MB/s
- Compression ratio: 2.18X

53 Comparison against CPU/Verilog
Best implementation on ASICs
- AHA products group
- Coming up Q
- Compression speed: 2.5 GB/s

54 Comparison against CPU/Verilog
Best implementation on FPGAs
- Verilog
- IBM Corporation
- Nov ICCAD
- Altera Stratix-V A7
- Compression speed: 3 GB/s

55 Comparison against CPU/Verilog
OpenCL design example
- Altera Stratix-V A7
- Developed in 1 month
- Compression speed? Compression ratio?

56 Comparison against CPU/Verilog

Implementation                 Compression speed
OpenCL design example          2.7 GB/s
Verilog on FPGA (IBM)          3 GB/s
ASIC (AHA)                     2.5 GB/s
CPU (Intel)                    0.3 GB/s

57 Comparison against CPU
- Same compression ratio
- 12X better performance/watt

58 Comparison against Verilog
- 10% slower
- 12% more resources
- Much lower design effort and design time: days instead of months

59 Thank You


More information

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition Chapter 7: Main Memory Operating System Concepts Essentials 8 th Edition Silberschatz, Galvin and Gagne 2011 Chapter 7: Memory Management Background Swapping Contiguous Memory Allocation Paging Structure

More information

/INFOMOV/ Optimization & Vectorization. J. Bikker - Sep-Nov Lecture 3: Caching (1) Welcome!

/INFOMOV/ Optimization & Vectorization. J. Bikker - Sep-Nov Lecture 3: Caching (1) Welcome! /INFOMOV/ Optimization & Vectorization J. Bikker - Sep-Nov 2015 - Lecture 3: Caching (1) Welcome! Today s Agenda: The Problem with Memory Cache Architectures Practical Assignment 1 INFOMOV Lecture 3 Caching

More information

University of Osnabruck - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

University of Osnabruck - FTP Site Statistics. Top 20 Directories Sorted by Disk Space University of Osnabruck - FTP Site Statistics Property Value FTP Server ftp.usf.uni-osnabrueck.de Description University of Osnabruck Country Germany Scan Date 17/May/2014 Total Dirs 29 Total Files 92

More information

Difference Engine: Harnessing Memory Redundancy in Virtual Machines (D. Gupta et all) Presented by: Konrad Go uchowski

Difference Engine: Harnessing Memory Redundancy in Virtual Machines (D. Gupta et all) Presented by: Konrad Go uchowski Difference Engine: Harnessing Memory Redundancy in Virtual Machines (D. Gupta et all) Presented by: Konrad Go uchowski What is Virtual machine monitor (VMM)? Guest OS Guest OS Guest OS Virtual machine

More information

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university

Real-Time Buffer Compression. Michael Doggett Department of Computer Science Lund university Real-Time Buffer Compression Michael Doggett Department of Computer Science Lund university Project 3D graphics project Demo, Game Implement 3D graphics algorithm(s) C++/OpenGL(Lab2)/iOS/android/3D engine

More information

Classifying Information Stored in Memory! Memory Management in a Uniprogrammed System! Segments of a Process! Processing a User Program!

Classifying Information Stored in Memory! Memory Management in a Uniprogrammed System! Segments of a Process! Processing a User Program! Memory Management in a Uniprogrammed System! A! gets a fixed segment of (usually highest )"! One process executes at a time in a single segment"! Process is always loaded at "! Compiler and linker generate

More information

Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor

Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor 2016 International Conference on Information Technology Implementation of Robust Compression Technique using LZ77 Algorithm on Tensilica s Xtensa Processor Vasanthi D R and Anusha R M.Tech (VLSI Design

More information

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Doug Burger Director, Hardware, Devices, & Experiences MSR NExT November 15, 2015 The Cloud is a Growing Disruptor for HPC Moore s

More information

Addressing the Memory Wall

Addressing the Memory Wall Lecture 26: Addressing the Memory Wall Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Tunes Cage the Elephant Back Against the Wall (Cage the Elephant) This song is for the

More information

Matrox Imaging White Paper

Matrox Imaging White Paper Reliable high bandwidth video capture with Matrox Radient Abstract The constant drive for greater analysis resolution and higher system throughput results in the design of vision systems with multiple

More information

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77 CS 493: Algorithms for Massive Data Sets February 14, 2002 Dictionary-based compression Scribe: Tony Wirth This lecture will explore two adaptive dictionary compression schemes: LZ77 and LZ78. We use the

More information

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts Memory management Last modified: 26.04.2016 1 Contents Background Logical and physical address spaces; address binding Overlaying, swapping Contiguous Memory Allocation Segmentation Paging Structure of

More information

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks Naveen Suda, Vikas Chandra *, Ganesh Dasika *, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, Yu

More information

Accelerating the Pulsar Search Pipeline with FPGAs, Programmed in OpenCL

Accelerating the Pulsar Search Pipeline with FPGAs, Programmed in OpenCL Accelerating the Pulsar Search Pipeline with FPGAs, Programmed in OpenCL Oliver Sinnen, Tyrone Sherwin, and Haomiao Wang & Prabu Thiagaraj (Manchester Uni/Raman Research Institute, Bangalore) Parallel

More information

SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou

SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou University of California, Los Angeles 1 What is stencil computation? 2 What is Stencil Computation? A sliding

More information

INVITED PAPER: USING OPENCL TO EVALUATE THE EFFICIENCY OF CPUS, GPUS AND FPGAS FOR INFORMATION FILTERING. Doris Chen, Deshanand Singh

INVITED PAPER: USING OPENCL TO EVALUATE THE EFFICIENCY OF CPUS, GPUS AND FPGAS FOR INFORMATION FILTERING. Doris Chen, Deshanand Singh INVITED PAPER: USING OPENCL TO EVALUATE THE EFFICIENCY OF CPUS, GPUS AND FPGAS FOR INFORMATION FILTERING Doris Chen, Deshanand Singh Altera Toronto Technology Center Toronto, Ontario, Canada dochen, dsingh@altera.com

More information

Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms

Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms Fractal Video Compression in OpenCL: An Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms Doris Chen Altera Toronto Technology Center Toronto, Ontario, Canada e-mail: dochen@altera.com Deshanand

More information

Chapter 8: Memory-Management Strategies

Chapter 8: Memory-Management Strategies Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and

More information

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference

Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee

More information

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently

More information

Interconnection Network for Tightly Coupled Accelerators Architecture

Interconnection Network for Tightly Coupled Accelerators Architecture Interconnection Network for Tightly Coupled Accelerators Architecture Toshihiro Hanawa, Yuetsu Kodama, Taisuke Boku, Mitsuhisa Sato Center for Computational Sciences University of Tsukuba, Japan 1 What

More information

Lab Determining Data Storage Capacity

Lab Determining Data Storage Capacity Lab 1.3.2 Determining Data Storage Capacity Objectives Determine the amount of RAM (in MB) installed in a PC. Determine the size of the hard disk drive (in GB) installed in a PC. Determine the used and

More information

memory management Vaibhav Bajpai

memory management Vaibhav Bajpai memory management Vaibhav Bajpai OS 2013 motivation virtualize resources: multiplex CPU multiplex memory (CPU scheduling) (memory management) why manage memory? controlled overlap processes should NOT

More information

Heterogeneous Computing and OpenCL

Heterogeneous Computing and OpenCL Heterogeneous Computing and OpenCL Hongsuk Yi (hsyi@kisti.re.kr) (Korea Institute of Science and Technology Information) Contents Overview of the Heterogeneous Computing Introduction to Intel Xeon Phi

More information

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool

Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Synthesizable FPGA Fabrics Targetable by the VTR CAD Tool Jin Hee Kim and Jason Anderson FPL 2015 London, UK September 3, 2015 2 Motivation for Synthesizable FPGA Trend towards ASIC design flow Design

More information

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs

Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mapping-Aware Constrained Scheduling for LUT-Based FPGAs Mingxing Tan, Steve Dai, Udit Gupta, Zhiru Zhang School of Electrical and Computer Engineering Cornell University High-Level Synthesis (HLS) for

More information

White Paper The Need for a High-Bandwidth Memory Architecture in Programmable Logic Devices

White Paper The Need for a High-Bandwidth Memory Architecture in Programmable Logic Devices Introduction White Paper The Need for a High-Bandwidth Memory Architecture in Programmable Logic Devices One of the challenges faced by engineers designing communications equipment is that memory devices

More information

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU

Parallel LZ77 Decoding with a GPU. Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU Parallel LZ77 Decoding with a GPU Emmanuel Morfiadakis Supervisor: Dr Eric McCreath College of Engineering and Computer Science, ANU Outline Background (What?) Problem definition and motivation (Why?)

More information

Virtual Memory. Study Chapters something I understand. Finally! A lecture on PAGE FAULTS! doing NAND gates. I wish we were still

Virtual Memory. Study Chapters something I understand. Finally! A lecture on PAGE FAULTS! doing NAND gates. I wish we were still Virtual Memory I wish we were still doing NAND gates Study Chapters 7.4-7.8 Finally! A lecture on something I understand PAGE FAULTS! L23 Virtual Memory 1 You can never be too rich, too good looking, or

More information

Jignesh M. Patel. Blog:

Jignesh M. Patel. Blog: Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor

More information

Bits and Bit Patterns

Bits and Bit Patterns Bits and Bit Patterns Bit: Binary Digit (0 or 1) Bit Patterns are used to represent information. Numbers Text characters Images Sound And others 0-1 Boolean Operations Boolean Operation: An operation that

More information

Analysis of Parallelization Effects on Textual Data Compression

Analysis of Parallelization Effects on Textual Data Compression Analysis of Parallelization Effects on Textual Data GORAN MARTINOVIC, CASLAV LIVADA, DRAGO ZAGAR Faculty of Electrical Engineering Josip Juraj Strossmayer University of Osijek Kneza Trpimira 2b, 31000

More information

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp.

Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp. Understanding Primary Storage Optimization Options Jered Floyd Permabit Technology Corp. Primary Storage Optimization Technologies that let you store more data on the same storage Thin provisioning Copy-on-write

More information

ROOT I/O compression algorithms. Oksana Shadura, Brian Bockelman University of Nebraska-Lincoln

ROOT I/O compression algorithms. Oksana Shadura, Brian Bockelman University of Nebraska-Lincoln ROOT I/O compression algorithms Oksana Shadura, Brian Bockelman University of Nebraska-Lincoln Introduction Compression Algorithms 2 Compression algorithms Los Reduces size by permanently eliminating certain

More information

Accelerating String Matching Using Multi-threaded Algorithm

Accelerating String Matching Using Multi-threaded Algorithm Accelerating String Matching Using Multi-threaded Algorithm on GPU Cheng-Hung Lin*, Sheng-Yu Tsai**, Chen-Hsiung Liu**, Shih-Chieh Chang**, Jyuo-Min Shyu** *National Taiwan Normal University, Taiwan **National

More information

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES OBJECTIVES Detailed description of various ways of organizing memory hardware Various memory-management techniques, including paging and segmentation To provide

More information

LOSSLESS DATA COMPRESSION AND DECOMPRESSION ALGORITHM AND ITS HARDWARE ARCHITECTURE

LOSSLESS DATA COMPRESSION AND DECOMPRESSION ALGORITHM AND ITS HARDWARE ARCHITECTURE LOSSLESS DATA COMPRESSION AND DECOMPRESSION ALGORITHM AND ITS HARDWARE ARCHITECTURE V V V SAGAR 1 1JTO MPLS NOC BSNL BANGALORE ---------------------------------------------------------------------***----------------------------------------------------------------------

More information