MSST 08 Tutorial: Data-Intensive Scalable Computing for Science

Size: px
Start display at page:

Download "MSST 08 Tutorial: Data-Intensive Scalable Computing for Science"

Transcription

1 MSST 08 Tutorial: Data-Intensive Scalable Computing for Science Julio López Parallel Data Lab -- Carnegie Mellon University Acknowledgements: CMU: Randy Bryant, Garth Gibson, Bianca Schroeder (University of Toronto), Greg Ganger. Intel: Steve Schlosser, David O Hallaron. Yahoo!: Milind Bhandarkar, Eric Baldeschwieler.

2 Acknowledgments Material: Randy Bryant, Garth Gibson, Bianca Schroeder (Univ. of Toronto), Greg Ganger. Intel: Steve Schlosser, David O Hallaron. Yahoo!: Milind Bhandarkar, Eric Baldeschwieler. Sponsors and Collaborators: National Science Foundation, SciDAC, PDSI, LANL, PNNL, ORNL, SNL, LBNL, Univ. of Michigan, UC Santa Cruz, Pittsburgh Supercomputing Center, Yahoo!, Intel, Hewlett Packard Labs. 2

3 Agenda Motivation Turning I/O intensive into CPU intensive workloads Failure trends in HPC DISC Hadoop Intro Generating Ground Models for Earthquake Simulations M45! Experiences 3

4 Methods for compressing large seismic wavefields Turning I/O intensive data-analysis tasks into massively parallel compute-intensive ones. Julio López & David O Hallaron Carnegie Mellon University

5 Motivation Data analytics increasingly difficult at scale Difficult to program Stress I/O and memory systems Dealing with failures Search companies routinely process the web graph Interest in Data-Intensive Scalable Computing How can this be leveraged for science HPC? 5

6 Data-Intensive Processing in Seismic Simulation Setup: Generating ground models. Data-analytics in Quake 4D wavefields: Terabytes per each simulated scenario. 500 GB GB 3D volume structure Geological model Unstructured mesh 10+ hours processors Mesh generation & Numerical Simulation 1-10 TB Spatio-temporal wave data Space-Time representation 3D Space t0 t1 t2 Time tn-1 6

7 Data analysis challenges Resource constraints: I/O BW and memory. Complex data operations: Selection, transposition, FFT, filtering Difficult data sharing and management. Need for semi-interactive data analysis. Data is at risk: write-once / read-never! Need better mechanisms for processing datasets. 7

8 Approach Field representation: spatially indexed compressed data structures Effectively, turn data-intensive into computeintensive. Indexed compressed wavefield representations: BEM compression. Frequency-domain compression. Microsolver computational query-time model. 8

9 Data-analysis queries Domain 8 Time series Velocity Time j Slices i Region of interest (origin, extent, spacing, rotation) Dimensionality: 0D, 1D, 2D, 3D. Sampling queries 9

10 Querying wavefields j Wave data Wavefield Mesh: Spatial index, e.g., etrees. Find element containing query point. Use element corners to retrieve wave data 10

11 Compression approach Wavefields are floating point datasets 11

12 Floating point compression state of the art Floating point compression is a hard problem. High-entropy bit patterns. NASA s image compression: Quantization. JPEG 2000: 3D FP images, DCT, DWT, BigInt. LLNL: Lorenzo predictor + FP encoding. Cornell s value prediction FP encoding. Floating point sound compression. 12

13 Compression approach Wavefields are floating point datasets Frequency domain representation Wavefields are band-limited Requires a storage layout change Out of core matrix Transposition Boundary Element Method Compression (BEMC) Dimensionality-reduction Works in the frequency-domain Allows for incremental reconstruction 13

14 Wavefield storage layout Wavefields as matrices: Space x time. Large aspect ratios. Transposition Time Column size Space Small Row size Large Small In-core Seq. Write Large Seq. Read Split R/W 14

15 BEM compression Wavefield domain representation. 15

16 BEM compression Wavefield boundary representation. Only store values at the boundary. 16

17 BEM compression q Wave data computation with query-time microsolver. Convolution u(ξ) * G(ξ,q). 17

18 Potential volume to boundary reduction Compression :1:1 1:1:2 1:2:2 1:2:3 1:2: , , , , ,202.0 Number of Boundary Elements per Segment Compression factors between 1.5X and 3X. 18

19 BEM compression steps Model segmentation: find homogeneous regions. Generate a boundary meshes. Extract wavefield data for boundary mesh nodes. Frequency domain representation. Transpose from space-time to time-space. Transform to the frequency domain. 19

20 Query-time BEM microsolver 1. Spatial lookup: Find region containing q. 2. Compute phi values for boundary elements. 3. Compute q s data from phi values. For all frequencies (ω). Phi-computation: once per region. 20

21 BEM microsolver query computation φ ξ Φ ξi+1 Φ ξi-1 q G(Q,ξ) Apply the integral equation again. Phi values are now known. Iterate over all BEs and add contribution. 21

22 Seismic wavefields Wavefield Mesh elements nodes Size (GB) LA-0.2 Hz 0.4 M 0.5 M 6.3 LA-0.3 Hz 1.6 M 1.8 M 25 LA-0.5 Hz 8 M 8.6 M 116 LA-0.7 Hz 18 M 19.4 M 260 LA-1.0 Hz 64.1 M 66.5 M 893 Min Vs = 500m/s, PPWL = 10 Pre-segmented LA-basin 0.2Hz model. Simulated time = 60s, dt = 0.01s. Output sampling rate = 10 Hz, 600 output steps. 22

23 Frequency-domain compression ratio Compression ratio LA-0.2 LA-0.25 LA-0.3 LA-0.5 LA-0.7 Wavefield Freq-comp gzip bzip2 Compression factor: 5X 10X. 23

24 Frequency compression time Normalized time LA-0.2 LA-0.3 LA-0.5 LA-0.7 LA-1.0 Wavefield In Out Trans FFT Filter PS Compression time < cp time. 24

25 BEM compression BEM comp BEM + Freq Compression ratio LA-0.5 LA-0.7 LA-1.0 Wavefield BEM compression factor: 1.4X - 3X. BEM + freq. compression factor: 7X 10X. 25

26 BEM query time computation cost 40 Setup Solve Query 30 Time (seconds) Number of boundary elements Query O(n), Setup O(n2), Solve O(n3) 26

27 Parallel BEM computation Partitioning across homogeneous regions. Partitioning phi computation across frequencies. Solving system of equations in parallel. Partitioning across query points. 27

28 Conclusions New computational wavefield database approach. Spatially indexed compressed fields. Microsolvers: query-time computation. Data-intensive to compute-intensive tasks. Indexed compressed wavefield representations: Frequency-domain compression (4X 5X). BEM compression (5X 10X). SMT out-of-core efficient transposition (copy time). 28

29 Yes but Requires application transformations: Hard! What other approaches can we take? What about dealing with failures? 29

BEMC: A Searchable, Compressed Representation for Large Seismic Wavefields

BEMC: A Searchable, Compressed Representation for Large Seismic Wavefields BEMC: A Searchable, Compressed Representation for Large Seismic Wavefields Julio López 1, Leonardo Ramírez-Guzmán 2, Jacobo Bielak 2, and David O Hallaron 1 1 Parallel Data Laboratory, Carnegie Mellon

More information

Data Intensive Scalable Computing. Thanks to: Randal E. Bryant Carnegie Mellon University

Data Intensive Scalable Computing. Thanks to: Randal E. Bryant Carnegie Mellon University Data Intensive Scalable Computing Thanks to: Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Big Data Sources: Seismic Simulations Wave propagation during an earthquake Large-scale

More information

Reflections on Failure in Post-Terascale Parallel Computing

Reflections on Failure in Post-Terascale Parallel Computing Reflections on Failure in Post-Terascale Parallel Computing 2007 Int. Conf. on Parallel Processing, Xi An China Garth Gibson Carnegie Mellon University and Panasas Inc. DOE SciDAC Petascale Data Storage

More information

Failure in Supercomputers in the Post-Terascale Era

Failure in Supercomputers in the Post-Terascale Era Failure in Supercomputers in the Post-Terascale Era Thanks to: Garth Gibson, Carnegie Mellon University and Panasas Inc. DOE SciDAC Petascale Data Storage Institute (PDSI), www.pdsi-scidac.org w/ Bianca

More information

Most of the slides in this lecture are either from or adapted from slides provided by the authors of the textbook Computer Systems: A Programmer s

Most of the slides in this lecture are either from or adapted from slides provided by the authors of the textbook Computer Systems: A Programmer s Most of the slides in this lecture are either from or adapted from slides provided by the authors of the textbook Computer Systems: A Programmer s Perspective, 2 nd Edition and are provided from the website

More information

LARGE-SCALE NORTHRIDGE EARTHQUAKE SIMULATION USING OCTREE-BASED MULTIRESOLUTION MESH METHOD

LARGE-SCALE NORTHRIDGE EARTHQUAKE SIMULATION USING OCTREE-BASED MULTIRESOLUTION MESH METHOD 16th ASCE Engineering Mechanics Conference July 16-18, 3, University of Washington, Seattle LARGE-SCALE NORTHRIDGE EARTHQUAKE SIMULATION USING OCTREE-BASED MULTIRESOLUTION MESH METHOD Eui Joong Kim 1 Jacobo

More information

A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography

A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography 1 A Scalable Parallel LSQR Algorithm for Solving Large-Scale Linear System for Seismic Tomography He Huang, Liqiang Wang, Po Chen(University of Wyoming) John Dennis (NCAR) 2 LSQR in Seismic Tomography

More information

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier

Computer Vision 2. SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung. Computer Vision 2 Dr. Benjamin Guthier Computer Vision 2 SS 18 Dr. Benjamin Guthier Professur für Bildverarbeitung Computer Vision 2 Dr. Benjamin Guthier 1. IMAGE PROCESSING Computer Vision 2 Dr. Benjamin Guthier Content of this Chapter Non-linear

More information

Properties of a Family of Parallel Finite Element Simulations

Properties of a Family of Parallel Finite Element Simulations Properties of a Family of Parallel Finite Element Simulations David R. O Hallaron and Jonathan Richard Shewchuk December 23, 1996 CMU-CS-96-141 School of Computer Science Carnegie Mellon University Pittsburgh,

More information

Digital Image Representation Image Compression

Digital Image Representation Image Compression Digital Image Representation Image Compression 1 Image Representation Standards Need for compression Compression types Lossless compression Lossy compression Image Compression Basics Redundancy/redundancy

More information

Data-Intensive Applications on Numerically-Intensive Supercomputers

Data-Intensive Applications on Numerically-Intensive Supercomputers Data-Intensive Applications on Numerically-Intensive Supercomputers David Daniel / James Ahrens Los Alamos National Laboratory July 2009 Interactive visualization of a billion-cell plasma physics simulation

More information

Large Data Visualization

Large Data Visualization Large Data Visualization Seven Lectures 1. Overview (this one) 2. Scalable parallel rendering algorithms 3. Particle data visualization 4. Vector field visualization 5. Visual analytics techniques for

More information

Data Intensive Scalable Computing

Data Intensive Scalable Computing Data Intensive Scalable Computing Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Examples of Big Data Sources Wal-Mart 267 million items/day, sold at 6,000 stores HP built them

More information

JPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr.

JPEG decoding using end of block markers to concurrently partition channels on a GPU. Patrick Chieppe (u ) Supervisor: Dr. JPEG decoding using end of block markers to concurrently partition channels on a GPU Patrick Chieppe (u5333226) Supervisor: Dr. Eric McCreath JPEG Lossy compression Widespread image format Introduction

More information

Lecture 5: Error Resilience & Scalability

Lecture 5: Error Resilience & Scalability Lecture 5: Error Resilience & Scalability Dr Reji Mathew A/Prof. Jian Zhang NICTA & CSE UNSW COMP9519 Multimedia Systems S 010 jzhang@cse.unsw.edu.au Outline Error Resilience Scalability Including slides

More information

A Framework for Online Inversion-Based 3D Site Characterization

A Framework for Online Inversion-Based 3D Site Characterization A Framework for Online Inversion-Based 3D Site Characterization Volkan Akcelik, Jacobo Bielak, Ioannis Epanomeritakis,, Omar Ghattas Carnegie Mellon University George Biros University of Pennsylvania Loukas

More information

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size

Overview. Videos are everywhere. But can take up large amounts of resources. Exploit redundancy to reduce file size Overview Videos are everywhere But can take up large amounts of resources Disk space Memory Network bandwidth Exploit redundancy to reduce file size Spatial Temporal General lossless compression Huffman

More information

CS 335 Graphics and Multimedia. Image Compression

CS 335 Graphics and Multimedia. Image Compression CS 335 Graphics and Multimedia Image Compression CCITT Image Storage and Compression Group 3: Huffman-type encoding for binary (bilevel) data: FAX Group 4: Entropy encoding without error checks of group

More information

Verification and Validation for Seismic Wave Propagation Problems

Verification and Validation for Seismic Wave Propagation Problems Chapter 26 Verification and Validation for Seismic Wave Propagation Problems (1989-2-24-25-28-29-21-211-217-) (In collaboration with Dr. Nima Tafazzoli, Dr. Federico Pisanò, Mr. Kohei Watanabe and Mr.

More information

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Perceptual Coding Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding Part II wrap up 6.082 Fall 2006 Perceptual Coding, Slide 1 Lossless vs.

More information

Scalable Compression and Transmission of Large, Three- Dimensional Materials Microstructures

Scalable Compression and Transmission of Large, Three- Dimensional Materials Microstructures Scalable Compression and Transmission of Large, Three- Dimensional Materials Microstructures William A. Pearlman Center for Image Processing Research Rensselaer Polytechnic Institute pearlw@ecse.rpi.edu

More information

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

High performance Computing and O&G Challenges

High performance Computing and O&G Challenges High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating

More information

Visual Analysis of Lagrangian Particle Data from Combustion Simulations

Visual Analysis of Lagrangian Particle Data from Combustion Simulations Visual Analysis of Lagrangian Particle Data from Combustion Simulations Hongfeng Yu Sandia National Laboratories, CA Ultrascale Visualization Workshop, SC11 Nov 13 2011, Seattle, WA Joint work with Jishang

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

Predicting Service Outage Using Machine Learning Techniques. HPE Innovation Center

Predicting Service Outage Using Machine Learning Techniques. HPE Innovation Center Predicting Service Outage Using Machine Learning Techniques HPE Innovation Center HPE Innovation Center - Our AI Expertise Sense Learn Comprehend Act Computer Vision Machine Learning Natural Language Processing

More information

Keywords - DWT, Lifting Scheme, DWT Processor.

Keywords - DWT, Lifting Scheme, DWT Processor. Lifting Based 2D DWT Processor for Image Compression A. F. Mulla, Dr.R. S. Patil aieshamulla@yahoo.com Abstract - Digital images play an important role both in daily life applications as well as in areas

More information

BigDataBench: a Big Data Benchmark Suite from Web Search Engines

BigDataBench: a Big Data Benchmark Suite from Web Search Engines BigDataBench: a Big Data Benchmark Suite from Web Search Engines Wanling Gao, Yuqing Zhu, Zhen Jia, Chunjie Luo, Lei Wang, Jianfeng Zhan, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, and Bizhu

More information

Erik Riedel Hewlett-Packard Labs

Erik Riedel Hewlett-Packard Labs Erik Riedel Hewlett-Packard Labs Greg Ganger, Christos Faloutsos, Dave Nagle Carnegie Mellon University Outline Motivation Freeblock Scheduling Scheduling Trade-Offs Performance Details Applications Related

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models RCFile: A Fast and Space-efficient Data

More information

High Efficiency Video Coding. Li Li 2016/10/18

High Efficiency Video Coding. Li Li 2016/10/18 High Efficiency Video Coding Li Li 2016/10/18 Email: lili90th@gmail.com Outline Video coding basics High Efficiency Video Coding Conclusion Digital Video A video is nothing but a number of frames Attributes

More information

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi

Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi Image Transformation Techniques Dr. Rajeev Srivastava Dept. of Computer Engineering, ITBHU, Varanasi 1. Introduction The choice of a particular transform in a given application depends on the amount of

More information

Using multifrontal hierarchically solver and HPC systems for 3D Helmholtz problem

Using multifrontal hierarchically solver and HPC systems for 3D Helmholtz problem Using multifrontal hierarchically solver and HPC systems for 3D Helmholtz problem Sergey Solovyev 1, Dmitry Vishnevsky 1, Hongwei Liu 2 Institute of Petroleum Geology and Geophysics SB RAS 1 EXPEC ARC,

More information

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores

YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores Swapnil Patil Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie

More information

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013

ECE 417 Guest Lecture Video Compression in MPEG-1/2/4. Min-Hsuan Tsai Apr 02, 2013 ECE 417 Guest Lecture Video Compression in MPEG-1/2/4 Min-Hsuan Tsai Apr 2, 213 What is MPEG and its standards MPEG stands for Moving Picture Expert Group Develop standards for video/audio compression

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

JPEG Modes of Operation. Nimrod Peleg Dec. 2005

JPEG Modes of Operation. Nimrod Peleg Dec. 2005 JPEG Modes of Operation Nimrod Peleg Dec. 2005 Color Space Conversion Example: R G B = Y Cb Cr Remember: all JPEG process is operating on YCbCr color space! Down-Sampling Another optional action is down-sampling

More information

Lecture 13 Video Coding H.264 / MPEG4 AVC

Lecture 13 Video Coding H.264 / MPEG4 AVC Lecture 13 Video Coding H.264 / MPEG4 AVC Last time we saw the macro block partition of H.264, the integer DCT transform, and the cascade using the DC coefficients with the WHT. H.264 has more interesting

More information

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack Chief Architect RainStor Agenda Importance of Hadoop + data compression Data compression techniques Compression,

More information

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman

Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10. Onur Kahraman Sub-Second Response Times with New In-Memory Analytics in MicroStrategy 10 Onur Kahraman High Performance Is No Longer A Nice To Have In Analytical Applications Users expect Google Like performance from

More information

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method

Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method Introducing a Cache-Oblivious Blocking Approach for the Lattice Boltzmann Method G. Wellein, T. Zeiser, G. Hager HPC Services Regional Computing Center A. Nitsure, K. Iglberger, U. Rüde Chair for System

More information

Dimensionality reduction of MALDI Imaging datasets using non-linear redundant wavelet transform-based representations

Dimensionality reduction of MALDI Imaging datasets using non-linear redundant wavelet transform-based representations Dimensionality reduction of MALDI Imaging datasets using non-linear redundant wavelet transform-based representations Luis Mancera 1, Lyna Sellami 2, Jamie Cunliffe 2, Luis González 1, Omar Belgacem 2

More information

Oasis: An Active Storage Framework for Object Storage Platform

Oasis: An Active Storage Framework for Object Storage Platform Oasis: An Active Storage Framework for Object Storage Platform Yulai Xie 1, Dan Feng 1, Darrell D. E. Long 2, Yan Li 2 1 School of Computer, Huazhong University of Science and Technology Wuhan National

More information

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS

ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS ACCELERATING THE PRODUCTION OF SYNTHETIC SEISMOGRAMS BY A MULTICORE PROCESSOR CLUSTER WITH MULTIPLE GPUS Ferdinando Alessi Annalisa Massini Roberto Basili INGV Introduction The simulation of wave propagation

More information

Spatial Memory Streaming (with rotated patterns)

Spatial Memory Streaming (with rotated patterns) Spatial Memory Streaming (with rotated patterns) Michael Ferdman, Stephen Somogyi, and Babak Falsafi Computer Architecture Lab at 2006 Stephen Somogyi The Memory Wall Memory latency 100 s clock cycles;

More information

The Raster Data Model

The Raster Data Model The Raster Data Model 2 2 2 2 8 8 2 2 8 8 2 2 2 2 2 2 8 8 2 2 2 2 2 2 2 2 2 Llano River, Mason Co., TX 1 Rasters are: Regular square tessellations Matrices of values distributed among equal-sized, square

More information

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric. CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

Automated Netezza Migration to Big Data Open Source

Automated Netezza Migration to Big Data Open Source Automated Netezza Migration to Big Data Open Source CASE STUDY Client Overview Our client is one of the largest cable companies in the world*, offering a wide range of services including basic cable, digital

More information

Rasters are: The Raster Data Model. Cell location specified by: Why squares? Raster Data Models 9/25/2014. GEO327G/386G, UT Austin 1

Rasters are: The Raster Data Model. Cell location specified by: Why squares? Raster Data Models 9/25/2014. GEO327G/386G, UT Austin 1 5 5 5 5 5 5 5 5 5 5 5 5 2 2 5 5 2 2 2 2 2 2 8 8 2 2 5 5 5 5 5 5 2 2 2 2 5 5 5 5 5 2 2 2 5 5 5 5 The Raster Data Model Rasters are: Regular square tessellations Matrices of values distributed among equalsized,

More information

Raster Data Models 9/18/2018

Raster Data Models 9/18/2018 Raster Data Models The Raster Data Model Rasters are: Regular square tessellations Matrices of values distributed among equal-sized, square cells 5 5 5 5 5 5 5 5 2 2 5 5 5 5 5 5 2 2 2 2 5 5 5 5 5 2 2 2

More information

The Raster Data Model

The Raster Data Model The Raster Data Model 2 2 2 2 8 8 2 2 8 8 2 2 2 2 2 2 8 8 2 2 2 2 2 2 2 2 2 Llano River, Mason Co., TX 9/24/201 GEO327G/386G, UT Austin 1 Rasters are: Regular square tessellations Matrices of values distributed

More information

JPEG 2000 vs. JPEG in MPEG Encoding

JPEG 2000 vs. JPEG in MPEG Encoding JPEG 2000 vs. JPEG in MPEG Encoding V.G. Ruiz, M.F. López, I. García and E.M.T. Hendrix Dept. Computer Architecture and Electronics University of Almería. 04120 Almería. Spain. E-mail: vruiz@ual.es, mflopez@ace.ual.es,

More information

Canopus: Enabling Extreme-Scale Data Analytics on Big HPC Storage via Progressive Refactoring

Canopus: Enabling Extreme-Scale Data Analytics on Big HPC Storage via Progressive Refactoring Canopus: Enabling Extreme-Scale Data Analytics on Big HPC Storage via Progressive Refactoring Tao Lu*, Eric Suchyta, Jong Choi, Norbert Podhorszki, and Scott Klasky, Qing Liu *, Dave Pugmire and Matt Wolf,

More information

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm International Journal of Engineering Research and General Science Volume 3, Issue 4, July-August, 15 ISSN 91-2730 A Image Comparative Study using DCT, Fast Fourier, Wavelet Transforms and Huffman Algorithm

More information

Tackling Parallelism With Symbolic Computation

Tackling Parallelism With Symbolic Computation Tackling Parallelism With Symbolic Computation Markus Püschel and the Spiral team (only part shown) With: Franz Franchetti Yevgen Voronenko Electrical and Computer Engineering Carnegie Mellon University

More information

HPC Algorithms and Applications

HPC Algorithms and Applications HPC Algorithms and Applications Dwarf #5 Structured Grids Michael Bader Winter 2012/2013 Dwarf #5 Structured Grids, Winter 2012/2013 1 Dwarf #5 Structured Grids 1. dense linear algebra 2. sparse linear

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

SAP IQ - Business Intelligence and vertical data processing with 8 GB RAM or less

SAP IQ - Business Intelligence and vertical data processing with 8 GB RAM or less SAP IQ - Business Intelligence and vertical data processing with 8 GB RAM or less Dipl.- Inform. Volker Stöffler Volker.Stoeffler@DB-TecKnowledgy.info Public Agenda Introduction: What is SAP IQ - in a

More information

Low-complexity video compression based on 3-D DWT and fast entropy coding

Low-complexity video compression based on 3-D DWT and fast entropy coding Low-complexity video compression based on 3-D DWT and fast entropy coding Evgeny Belyaev Tampere University of Technology Department of Signal Processing, Computational Imaging Group April 8, Evgeny Belyaev

More information

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark by Nkemdirim Dockery High Performance Computing Workloads Core-memory sized Floating point intensive Well-structured

More information

MPEG-2. And Scalability Support. Nimrod Peleg Update: July.2004

MPEG-2. And Scalability Support. Nimrod Peleg Update: July.2004 MPEG-2 And Scalability Support Nimrod Peleg Update: July.2004 MPEG-2 Target...Generic coding method of moving pictures and associated sound for...digital storage, TV broadcasting and communication... Dedicated

More information

Joint Image Classification and Compression Using Hierarchical Table-Lookup Vector Quantization

Joint Image Classification and Compression Using Hierarchical Table-Lookup Vector Quantization Joint Image Classification and Compression Using Hierarchical Table-Lookup Vector Quantization Navin Chadda, Keren Perlmuter and Robert M. Gray Information Systems Laboratory Stanford University CA-934305

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

CUDA-Accelerated HD-ODETLAP: Lossy High Dimensional Gridded Data Compression

CUDA-Accelerated HD-ODETLAP: Lossy High Dimensional Gridded Data Compression CUDA-Accelerated HD-ODETLAP: Lossy High Dimensional Gridded Data Compression W. Randolph Franklin, You Li 1, Tsz-Yam Lau, and Peter Fox Rensselaer Polytechnic Institute Troy, NY, USA MAT4GIScience, 18

More information

MDHIM: A Parallel Key/Value Store Framework for HPC

MDHIM: A Parallel Key/Value Store Framework for HPC MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system

More information

IMAGE COMPRESSION. October 7, ICSY Lab, University of Kaiserslautern, Germany

IMAGE COMPRESSION. October 7, ICSY Lab, University of Kaiserslautern, Germany Lossless Compression Multimedia File Formats Lossy Compression IMAGE COMPRESSION 69 Basic Encoding Steps 70 JPEG (Overview) Image preparation and coding (baseline system) 71 JPEG (Enoding) 1) select color

More information

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine

SAP IQ Software16, Edge Edition. The Affordable High Performance Analytical Database Engine SAP IQ Software16, Edge Edition The Affordable High Performance Analytical Database Engine Agenda Agenda Introduction to Dobler Consulting Today s Data Challenges Overview of SAP IQ 16, Edge Edition SAP

More information

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics

Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics Development of an Integrated Computational Simulation Method for Fluid Driven Structure Movement and Acoustics I. Pantle Fachgebiet Strömungsmaschinen Karlsruher Institut für Technologie KIT Motivation

More information

CME 345: MODEL REDUCTION

CME 345: MODEL REDUCTION CME 345: MODEL REDUCTION Parameterized Partial Differential Equations Charbel Farhat Stanford University cfarhat@stanford.edu 1 / 19 Outline 1 Initial Boundary Value Problems 2 Typical Parameters of Interest

More information

Image Compression using Discrete Wavelet Transform Preston Dye ME 535 6/2/18

Image Compression using Discrete Wavelet Transform Preston Dye ME 535 6/2/18 Image Compression using Discrete Wavelet Transform Preston Dye ME 535 6/2/18 Introduction Social media is an essential part of an American lifestyle. Latest polls show that roughly 80 percent of the US

More information

CISC 7610 Lecture 3 Multimedia data and data formats

CISC 7610 Lecture 3 Multimedia data and data formats CISC 7610 Lecture 3 Multimedia data and data formats Topics: Perceptual limits of multimedia data JPEG encoding of images MPEG encoding of audio MPEG and H.264 encoding of video Multimedia data: Perceptual

More information

Crossing the Chasm: Sneaking a parallel file system into Hadoop

Crossing the Chasm: Sneaking a parallel file system into Hadoop Crossing the Chasm: Sneaking a parallel file system into Hadoop Wittawat Tantisiriroj Swapnil Patil, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University In this work Compare and contrast large

More information

7.5 Dictionary-based Coding

7.5 Dictionary-based Coding 7.5 Dictionary-based Coding LZW uses fixed-length code words to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text LZW encoder and decoder

More information

Analysis in the Big Data Era

Analysis in the Big Data Era Analysis in the Big Data Era Massive Data Data Analysis Insight Key to Success = Timely and Cost-Effective Analysis 2 Hadoop MapReduce Ecosystem Popular solution to Big Data Analytics Java / C++ / R /

More information

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp MixApart: Decoupled Analytics for Shared Storage Systems Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp Hadoop Pig, Hive Hadoop + Enterprise storage?! Shared storage

More information

Fast Gabor Imaging with Spatial Resampling

Fast Gabor Imaging with Spatial Resampling Fast Gabor Imaging with Spatial Resampling Yongwang Ma* University of Calgary, Calgary, AB yongma@ucalgary.ca and Gary Margrave and Chad Hogan University of Calgary, Calgary, AB, Canada Summary The Gabor

More information

CMPT 365 Multimedia Systems. Media Compression - Image

CMPT 365 Multimedia Systems. Media Compression - Image CMPT 365 Multimedia Systems Media Compression - Image Spring 2017 Edited from slides by Dr. Jiangchuan Liu CMPT365 Multimedia Systems 1 Facts about JPEG JPEG - Joint Photographic Experts Group International

More information

MultiMap: Preserving disk locality for multidimensional datasets

MultiMap: Preserving disk locality for multidimensional datasets MultiMap: Preserving disk locality for multidimensional datasets Minglong Shao, Steven W. Schlosser, Stratos Papadomanolakis, Jiri Schindler, Anastassia Ailamaki, Christos Faloutsos, Gregory R. Ganger

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

VisIt Libsim. An in-situ visualisation library

VisIt Libsim. An in-situ visualisation library VisIt Libsim. An in-situ visualisation library December 2017 Jean M. Favre, CSCS Outline Motivations In-situ visualization In-situ processing strategies VisIt s libsim library Enable visualization in a

More information

Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures

Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Mostofa Patwary 1, Nadathur Satish 1, Narayanan Sundaram 1, Jilalin Liu 2, Peter Sadowski 2, Evan Racah 2, Suren Byna 2, Craig

More information

Using Streaming SIMD Extensions in a Fast DCT Algorithm for MPEG Encoding

Using Streaming SIMD Extensions in a Fast DCT Algorithm for MPEG Encoding Using Streaming SIMD Extensions in a Fast DCT Algorithm for MPEG Encoding Version 1.2 01/99 Order Number: 243651-002 02/04/99 Information in this document is provided in connection with Intel products.

More information

Advanced Surface Based MoM Techniques for Packaging and Interconnect Analysis

Advanced Surface Based MoM Techniques for Packaging and Interconnect Analysis Electrical Interconnect and Packaging Advanced Surface Based MoM Techniques for Packaging and Interconnect Analysis Jason Morsey Barry Rubin, Lijun Jiang, Lon Eisenberg, Alina Deutsch Introduction Fast

More information

Jointly Optimized Transform Domain Temporal Prediction (TDTP) and Sub-pixel Interpolation

Jointly Optimized Transform Domain Temporal Prediction (TDTP) and Sub-pixel Interpolation Jointly Optimized Transform Domain Temporal Prediction (TDTP) and Sub-pixel Interpolation Shunyao Li, Tejaswi Nanjundaswamy, Kenneth Rose University of California, Santa Barbara BACKGROUND: TDTP MOTIVATION

More information

The Fusion Distributed File System

The Fusion Distributed File System Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique

More information

Harp-DAAL for High Performance Big Data Computing

Harp-DAAL for High Performance Big Data Computing Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big

More information

Lossy Compression of Scientific Data with Wavelet Transforms

Lossy Compression of Scientific Data with Wavelet Transforms Chris Fleizach Progress Report Lossy Compression of Scientific Data with Wavelet Transforms Introduction Scientific data gathered from simulation or real measurement usually requires 64 bit floating point

More information

Towards Large Scale Predictive Flow Simulations

Towards Large Scale Predictive Flow Simulations Towards Large Scale Predictive Flow Simulations using ITAPS Tools and Services Onkar Sahni a a Present address: Center for Predictive Engineering and Computational Sciences (PECOS), ICES, UT Austin ollaborators:

More information

Fundamentals of Video Compression. Video Compression

Fundamentals of Video Compression. Video Compression Fundamentals of Video Compression Introduction to Digital Video Basic Compression Techniques Still Image Compression Techniques - JPEG Video Compression Introduction to Digital Video Video is a stream

More information

How to Optimize Geometric Multigrid Methods on GPUs

How to Optimize Geometric Multigrid Methods on GPUs How to Optimize Geometric Multigrid Methods on GPUs Markus Stürmer, Harald Köstler, Ulrich Rüde System Simulation Group University Erlangen March 31st 2011 at Copper Schedule motivation imaging in gradient

More information

Structuring PLFS for Extensibility

Structuring PLFS for Extensibility Structuring PLFS for Extensibility Chuck Cranor, Milo Polte, Garth Gibson PARALLEL DATA LABORATORY Carnegie Mellon University What is PLFS? Parallel Log Structured File System Interposed filesystem b/w

More information

Rapid Processing of Synthetic Seismograms using Windows Azure Cloud

Rapid Processing of Synthetic Seismograms using Windows Azure Cloud Rapid Processing of Synthetic Seismograms using Windows Azure Cloud Vedaprakash Subramanian, Liqiang Wang Department of Computer Science University of Wyoming En-Jui Lee, Po Chen Department of Geology

More information

Forest-of-octrees AMR: algorithms and interfaces

Forest-of-octrees AMR: algorithms and interfaces Forest-of-octrees AMR: algorithms and interfaces Carsten Burstedde joint work with Omar Ghattas, Tobin Isaac, Georg Stadler, Lucas C. Wilcox Institut für Numerische Simulation (INS) Rheinische Friedrich-Wilhelms-Universität

More information

Cost Models for Query Processing Strategies in the Active Data Repository

Cost Models for Query Processing Strategies in the Active Data Repository Cost Models for Query rocessing Strategies in the Active Data Repository Chialin Chang Institute for Advanced Computer Studies and Department of Computer Science University of Maryland, College ark 272

More information

Storage Hierarchy Management for Scientific Computing

Storage Hierarchy Management for Scientific Computing Storage Hierarchy Management for Scientific Computing by Ethan Leo Miller Sc. B. (Brown University) 1987 M.S. (University of California at Berkeley) 1990 A dissertation submitted in partial satisfaction

More information

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package

Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package High Performance Machine Learning Workshop Energy Efficient K-Means Clustering for an Intel Hybrid Multi-Chip Package Matheus Souza, Lucas Maciel, Pedro Penna, Henrique Freitas 24/09/2018 Agenda Introduction

More information

Lecture 8 JPEG Compression (Part 3)

Lecture 8 JPEG Compression (Part 3) CS 414 Multimedia Systems Design Lecture 8 JPEG Compression (Part 3) Klara Nahrstedt Spring 2012 Administrative MP1 is posted Today Covered Topics Hybrid Coding: JPEG Coding Reading: Section 7.5 out of

More information

HP AutoRAID (Lecture 5, cs262a)

HP AutoRAID (Lecture 5, cs262a) HP AutoRAID (Lecture 5, cs262a) Ali Ghodsi and Ion Stoica, UC Berkeley January 31, 2018 (based on slide from John Kubiatowicz, UC Berkeley) Array Reliability Reliability of N disks = Reliability of 1 Disk

More information

Image Compression. CS 6640 School of Computing University of Utah

Image Compression. CS 6640 School of Computing University of Utah Image Compression CS 6640 School of Computing University of Utah Compression What Reduce the amount of information (bits) needed to represent image Why Transmission Storage Preprocessing Redundant & Irrelevant

More information