Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications

Size: px
Start display at page:

Download "Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications"

Transcription

1 Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Ivo Jimenez, Carlos Maltzahn (UCSC) Adam Moody, Kathryn Mohror (LLNL) Jay Lofstead (Sandia) Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau (UWM)

2 The Reproducibility Problem Network Disks BIOS OS conf. Magic numbers Workload Jitter etc... Throughput (MB/s) 140 Reproduced? Original Goal: define methodology so that 20 we don t end up in this situation Cluster size 2

3 Outline Re-execution vs. validation Declarative Experiment Specification (ESF) Case Study Benefits & Challenges 3

4 Outline Re-execution vs. validation Declarative Experiment Specification (ESF) Case Study Benefits & Challenges 4

5 Reproducibility Workflow 1. Re-execute experiment Recreate original setup, re-execute experiments Technical task 2. Validate results Compare against original A subjective task How do we express objective validation criteria? What contextual information to include with results? 5

6 Experiment Goal: Show that my algorithm/system/etc. is better than the state-of-the-art. Figure Raw data libs data Means code, of Experiment workload hardware OS Observations 6

7 Outline Re-execution vs. validation Declarative Experiment Specification (ESF) Case Study Benefits & Challenges 7

8 Experiment Goal: Show that my algorithm/system/etc. is better than the state-of-the-art. Means of Experiment 8

9 Validation Language Syntax validation : 'for' condition ('and' condition)* 'expect' result ('and' result)* ; condition : vars ('in' range ('=' '<' '>' '!=') value) ; result : condition ; vars : var (',' var)* ; range : '[' range_num (',' range_num)* ']' ; range_num : NUMBER '-' NUMBER '*' ; value : '*' 'NUMBER (',' NUMBER)* ; 9

10 Outline Re-execution vs. validation Declarative Experiment Specification (ESF) Case Study Benefits & Challenges 10

11 Ceph OSDI 06 Select scalability experiment. Distributed; makes use of all resources Main bottlenecks: I/O and network Why this experiment? Top conference 10 year old experiment Ideal reproducibility conditions Access to authors, topic familiarity, same hardware, Even in an ideal scenario, we s:ll struggle Demonstrates which missing info is captured by an ESF! 11

12 Ceph Schema Validation OSDI of Statement Experiment 06 Scalability Output Experiment Data Per-OSD Average Throughput (MB/s) for cluster_size = <= * 24 and expect not net_saturated expect ceph >= 55 (raw mb/s *.90) ceph >= (raw *.90) "independent_variables": [{ "type": cluster_size, "values": 2-28 }], },{ { "dependent_variable": "type": "method", { "type": "values":"throughput", ["raw", "ceph"] }], },{"scale": "mb/s" }, "dependent_variable": "type": net_saturated", { "type": "values": "throughput", [ true", false"] }], "scale": "mb/s" }, "dependent_variable": { "type": "throughput", "scale": "mb/s" },

13 Normalized Per-OSD Throughput reproduced original OSD Cluster Size

14 Benefits & Challenges 14

15 Why care about Reproducibility? Good enough is not an excuse We can always improve the state of our practice How do we compare hardware/software in a scientific way? Experimental Cloud Infrastructure PRObE / CloudLab / Chameleon Having reproducible / validated experiments would represent a significant step toward embodying the scientific method as a core component of these infrastructures 15

16 Benefits of ESF-based methodology Brings falsibiability to our field Statements can be proven false Automate validation Validation becomes an objective task 16

17 Validation Workflow Obtain/ recreate means of experiment. Re-run and check validation clauses against output. Any validation failed? no Original work findings are corroborated yes Any significant differences between original and recreated means? yes Update means of experiment no Cannot validate original claims 17

18 Benefits of ESF-based methodology Brings falsibiability to our field Statements can be proven false Automate validation Validation becomes an objective task Usability We all do this anyway, albeit in an ad-hoc way Integrate into existing infrastructure 18

19 Integration with Existing Infrastructure pull push code and ESF Test: - Unit - Integration - Validations 19

20 Challenges Reproduce every time Include sanity checks as part of experiment Alternative: corroborate that network/disk observes expected behavior at runtime Reproduce everywhere Example: GCC s flags, combinations Alternative: provide image of complete software stack (e.g. linux containers) 20

21 Conclusion ESFs: Embody all components of an experiment Enable automation of result validation Brings us closer to the scientific method Our ideal future: Researchers use ESFs to express an hypothesis Toolkits for ESFs produce metadata-rich figures Machine-readable evaluation section 21

22 Thanks! 22

23 SILT Experiment Schema Validations SOSP 11 Goal for workload=* expect cuckoo > raw and trie > raw for lookup expect cuckoo > trie and for individual and bulk expect cuckoo > trie { The high "type": random method, read speed of flash drives means that the CPU }, budget { available for each index "type": "workload", operation "values": is relatively [ limited. This microbenchmark ] demonstrates }, that SILT s indexes "dependent_variable": { meet their "type": design "throughput", goal of "scale": bytes/second" computation-efficient indexing. } "values": [ raw, "cuckoo", "trie"] "individual", "bulk", "lookup 23

24 Geneiatakis et. al. CCS 12 24

25 Experiment Goal In this section, our goal is to evaluate the performance benefits that can be reaped, by utilizing virtual partitioning to apply otherwise expensive protection mechanisms on the most exposed part of applications. This allows us to strike a balance between the overhead imposed on the application and its exposure to attacks. 25

26 Experiment Goal In this section, our goal is to evaluate the performance benefits that can be reaped, by utilizing virtual partitioning to apply otherwise expensive protection mechanisms on the most exposed part of applications. This allows us to strike a balance between the overhead imposed on the application and its exposure to attacks. 26

27 Schema 27

28 Schema "independent_variables": [ { "type": method", "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 28

29 Validations "independent_variables": [ { "type": method", "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 29

30 Validations "independent_variables": [ { "type": method", expect native < any "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 30

31 Validations "independent_variables": [ { "type": method", expect native < any and "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 31

32 Validations "independent_variables": [ { "type": method", expect native < any and dta_pin between pin and isr "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 32

33 Validations "independent_variables": [ { "type": method", expect native < any and dta_pin between pin and isr and "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 33

34 Validations "independent_variables": [ { "type": method", expect native < any and dta_pin between pin and isr and dta_isr between isr and dta "alias": [ technique ], "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 34

35 Example 2 35

36 Example 2 36

37 Schema "independent_variables": [ { "type": method", "values": [ native", pin", isr, dta, dta_pin, dta_isr ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 37

38 Schema "independent_variables": [ { "type": method, "values": [ native", pin", isr, dta, dta_pin, dta_isr ] }, { "type": workload, "values": [ ftp", samba", ssh ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 38

39 Validations "independent_variables": [ { for "type": method, workload=* expect native < any and dta_pin between pin and isr and dta_isr between isr and dta "values": [ native", pin", isr, dta, dta_pin, dta_isr ] }, { "type": workload, "values": [ ftp", samba", ssh ] } ], "dependent_variable": { "type": runtime", "scale": s" }, 39

40 Falsifiability in Science Falsibiability of a statement, hypothesis, or theory is an inherent possibility to prove it to be false. In other words, the ability to specify the conditions under which a statement is false Synonymous to Testability Example: Statement: All swans are white Falsifiable: Observe one black swan source: en.wikipedia.org/wiki/falsifiability 40

41 41

42 Falsifiability in Systems Experiment Goal: Show that my algorithm/system/etc. is better than the state-of-the-art. Figure Raw data libs data Means code, of Experiment workload hardware OS Observations 42

43 Falsifiability in Systems To falsify a claim: Describe the means of the experiments Provide validation statements over the output data Conditional statement: if means are properly recreated then validation statements should hold Go from inert observations to falsifiable statements From: We observe that our system outperforms the alternatives To: Expect 25-30% performance improvement on hardware platform X, on workload Y, when configured like Z 43

44 Early Feedback Creating an ESF helps authors to: Find meaningful/reproducible baselines Create a feedback loop in author s mind Specify exactly what author means Make temporal context explicit 44

Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications

Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications Ivo Jimenez, Carlos Maltzahn (UCSC) Adam Moody, Kathryn Mohror (LLNL) Jay Lofstead (Sandia) Andrea Arpaci-Dusseau,

More information

SolidFire and Ceph Architectural Comparison

SolidFire and Ceph Architectural Comparison The All-Flash Array Built for the Next Generation Data Center SolidFire and Ceph Architectural Comparison July 2014 Overview When comparing the architecture for Ceph and SolidFire, it is clear that both

More information

Curriculum Vitae. April 5, 2018

Curriculum Vitae. April 5, 2018 Curriculum Vitae April 5, 2018 Michael A. Sevilla mikesevilla3@gmail.com website: users.soe.ucsc.edu/~msevilla 127 Storey St., Santa Cruz, CA 95060 code: github.com/michaelsevilla mobile: (858) 449-3086

More information

Securing Grid Data Transfer Services with Active Network Portals

Securing Grid Data Transfer Services with Active Network Portals Securing Grid Data Transfer Services with Active Network Portals Onur Demir 1 2 Kanad Ghose 3 Madhusudhan Govindaraju 4 Department of Computer Science Binghamton University (SUNY) {onur 1, mike 2, ghose

More information

Malacology. A Programmable Storage System [Sevilla et al. EuroSys '17]

Malacology. A Programmable Storage System [Sevilla et al. EuroSys '17] Malacology A Programmable Storage System [Sevilla et al. EuroSys '17] Michael A. Sevilla, Noah Watkins, Ivo Jimenez, Peter Alvaro, Shel Finkelstein, Jeff LeFevre, Carlos Maltzahn University of California,

More information

Heckaton. SQL Server's Memory Optimized OLTP Engine

Heckaton. SQL Server's Memory Optimized OLTP Engine Heckaton SQL Server's Memory Optimized OLTP Engine Agenda Introduction to Hekaton Design Consideration High Level Architecture Storage and Indexing Query Processing Transaction Management Transaction Durability

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number

More information

Enosis: Bridging the Semantic Gap between

Enosis: Bridging the Semantic Gap between Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation

More information

Graphene-SGX. A Practical Library OS for Unmodified Applications on SGX. Chia-Che Tsai Donald E. Porter Mona Vij

Graphene-SGX. A Practical Library OS for Unmodified Applications on SGX. Chia-Che Tsai Donald E. Porter Mona Vij Graphene-SGX A Practical Library OS for Unmodified Applications on SGX Chia-Che Tsai Donald E. Porter Mona Vij Intel SGX: Trusted Execution on Untrusted Hosts Processing Sensitive Data (Ex: Medical Records)

More information

Securing Grid Data Transfer Services with Active Network Portals

Securing Grid Data Transfer Services with Active Network Portals Securing with Active Network Portals Onur Demir 1 2 Kanad Ghose 3 Madhusudhan Govindaraju 4 Department of Computer Science Binghamton University (SUNY) {onur 1, mike 2, ghose 3, mgovinda 4 }@cs.binghamton.edu

More information

Designing a True Direct-Access File System with DevFS

Designing a True Direct-Access File System with DevFS Designing a True Direct-Access File System with DevFS Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin-Madison Yuangang Wang, Jun Xu, Gopinath Palani Huawei Technologies

More information

Containerizing GPU Applications with Docker for Scaling to the Cloud

Containerizing GPU Applications with Docker for Scaling to the Cloud Containerizing GPU Applications with Docker for Scaling to the Cloud SUBBU RAMA FUTURE OF PACKAGING APPLICATIONS Turns Discrete Computing Resources into a Virtual Supercomputer GPU Mem Mem GPU GPU Mem

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

Data Centers and Cloud Computing

Data Centers and Cloud Computing Data Centers and Cloud Computing CS677 Guest Lecture Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Storage in HPC: Scalable Scientific Data Management. Carlos Maltzahn IEEE Cluster 2011 Storage in HPC Panel 9/29/11

Storage in HPC: Scalable Scientific Data Management. Carlos Maltzahn IEEE Cluster 2011 Storage in HPC Panel 9/29/11 Storage in HPC: Scalable Scientific Data Management Carlos Maltzahn IEEE Cluster 2011 Storage in HPC Panel 9/29/11 Who am I? Systems Research Lab (SRL), UC Santa Cruz LANL/UCSC Institute for Scalable Scientific

More information

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Data Centers and Cloud Computing. Slides courtesy of Tim Wood Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Data Centers and Cloud Computing. Data Centers

Data Centers and Cloud Computing. Data Centers Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet

More information

Essential Features of an Integration Solution

Essential Features of an Integration Solution Essential Features of an Integration Solution September 2017 WHITE PAPER Essential Features of an Integration Solution When an enterprise uses multiple applications, it needs to connect them for a variety

More information

Mobile Offloading. Matti Kemppainen

Mobile Offloading. Matti Kemppainen Mobile Offloading Matti Kemppainen kemppi@cs.hut.fi 1. Promises and Theory Learning points What is mobile offloading? What does offloading promise? How does offloading differ from earlier practices? What

More information

A Generic Methodology of Analyzing Performance Bottlenecks of HPC Storage Systems. Zhiqi Tao, Sr. System Engineer Lugano, March

A Generic Methodology of Analyzing Performance Bottlenecks of HPC Storage Systems. Zhiqi Tao, Sr. System Engineer Lugano, March A Generic Methodology of Analyzing Performance Bottlenecks of HPC Storage Systems Zhiqi Tao, Sr. System Engineer Lugano, March 15 2013 1 Outline Introduction o Anatomy of a storage system o Performance

More information

Database Replication in Tashkent. CSEP 545 Transaction Processing Sameh Elnikety

Database Replication in Tashkent. CSEP 545 Transaction Processing Sameh Elnikety Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety Replication for Performance Expensive Limited scalability DB Replication is Challenging Single database system Large, persistent

More information

CS 261 Fall Mike Lam, Professor. Virtual Memory

CS 261 Fall Mike Lam, Professor. Virtual Memory CS 261 Fall 2016 Mike Lam, Professor Virtual Memory Topics Operating systems Address spaces Virtual memory Address translation Memory allocation Lingering questions What happens when you call malloc()?

More information

-Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St.

-Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St. -Presented By : Rajeshwari Chatterjee Professor-Andrey Shevel Course: Computing Clusters Grid and Clouds ITMO University, St. Petersburg Introduction File System Enterprise Needs Gluster Revisited Ceph

More information

Was ist dran an einer spezialisierten Data Warehousing platform?

Was ist dran an einer spezialisierten Data Warehousing platform? Was ist dran an einer spezialisierten Data Warehousing platform? Hermann Bär Oracle USA Redwood Shores, CA Schlüsselworte Data warehousing, Exadata, specialized hardware proprietary hardware Introduction

More information

Top Trends in DBMS & DW

Top Trends in DBMS & DW Oracle Top Trends in DBMS & DW Noel Yuhanna Principal Analyst Forrester Research Trend #1: Proliferation of data Data doubles every 18-24 months for critical Apps, for some its every 6 months Terabyte

More information

Chapter 7 Forensic Duplication

Chapter 7 Forensic Duplication Chapter 7 Forensic Duplication Ed Crowley Spring 11 Topics Response Strategies Forensic Duplicates and Evidence Federal Rules of Evidence What is a Forensic Duplicate? Hard Drive Development Forensic Tool

More information

Emulating Goliath Storage Systems with David

Emulating Goliath Storage Systems with David Emulating Goliath Storage Systems with David Nitin Agrawal, NEC Labs Leo Arulraj, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau ADSL Lab, UW Madison 1 The Storage Researchers Dilemma Innovate Create

More information

Choosing an Interface

Choosing an Interface White Paper SUSE Enterprise Storage SUSE Enterprise Storage is a versatile, self-healing, and fault-tolerant storage alternative. The Ceph storage system embedded in SUSE Enterprise Storage is an object-based

More information

Virtual Switch Acceleration with OVS-TC

Virtual Switch Acceleration with OVS-TC WHITE PAPER Virtual Switch Acceleration with OVS-TC HARDWARE ACCELERATED OVS-TC PROVIDES BETTER CPU EFFICIENCY, LOWER COMPLEXITY, ENHANCED SCALABILITY AND INCREASED NETWORK PERFORMANCE COMPARED TO KERNEL-

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage Performance Study of Microsoft SQL Server 2016 Dell Engineering February 2017 Table of contents

More information

Tutorial: Analyzing MPI Applications. Intel Trace Analyzer and Collector Intel VTune Amplifier XE

Tutorial: Analyzing MPI Applications. Intel Trace Analyzer and Collector Intel VTune Amplifier XE Tutorial: Analyzing MPI Applications Intel Trace Analyzer and Collector Intel VTune Amplifier XE Contents Legal Information... 3 1. Overview... 4 1.1. Prerequisites... 5 1.1.1. Required Software... 5 1.1.2.

More information

Slacker: Fast Distribution with Lazy Docker Containers. Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H.

Slacker: Fast Distribution with Lazy Docker Containers. Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Slacker: Fast Distribution with Lazy Docker Containers Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau Container Popularity spoon.net Theory and Practice Theory:

More information

Flash vs. Disk Storage: Testing Workloads is Key

Flash vs. Disk Storage: Testing Workloads is Key Flash vs. Disk Storage: Testing Workloads is Key Len Rosenthal VP of Marketing Flash Memory Summit 2013 Santa Clara, CA 1 Overview The leader in Storage Performance Validation. Our Mission: To provide

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC

FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC white paper FlashGrid Software Intel SSD DC P3700/P3600/P3500 Topic: Hyper-converged Database/Storage FlashGrid Software Enables Converged and Hyper-Converged Appliances for Oracle* RAC Abstract FlashGrid

More information

Microsoft RemoteFX for Remote Desktop Virtualization Host Capacity Planning Guide for Windows Server 2008 R2 Service Pack 1

Microsoft RemoteFX for Remote Desktop Virtualization Host Capacity Planning Guide for Windows Server 2008 R2 Service Pack 1 Microsoft RemoteFX for Remote Desktop Virtualization Host Capacity Planning Guide for Windows Server 2008 R2 Service Pack 1 Microsoft Corporation Published: March 2011 Abstract Microsoft RemoteFX delivers

More information

NVMe From The Server Perspective

NVMe From The Server Perspective NVMe From The Server Perspective The Value of NVMe to the Server Don H Walker Dell OCTO August 2012 1 NVMe Overview Optimized queuing interface, command set, and feature set for PCIe SSDs Targets only

More information

3.4 Data-Centric workflow

3.4 Data-Centric workflow 3.4 Data-Centric workflow One of the most important activities in a S-DWH environment is represented by data integration of different and heterogeneous sources. The process of extract, transform, and load

More information

Database Acceleration Solution Using FPGAs and Integrated Flash Storage

Database Acceleration Solution Using FPGAs and Integrated Flash Storage Database Acceleration Solution Using FPGAs and Integrated Flash Storage HK Verma, Xilinx Inc. August 2017 1 FPGA Analytics in Flash Storage System In-memory or Flash storage based DB reduce disk access

More information

Extremely Fast Distributed Storage for Cloud Service Providers

Extremely Fast Distributed Storage for Cloud Service Providers Solution brief Intel Storage Builders StorPool Storage Intel SSD DC S3510 Series Intel Xeon Processor E3 and E5 Families Intel Ethernet Converged Network Adapter X710 Family Extremely Fast Distributed

More information

April Copyright 2013 Cloudera Inc. All rights reserved.

April Copyright 2013 Cloudera Inc. All rights reserved. Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and the Virtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here April 2014 Analytic Workloads on

More information

Presented by: Nafiseh Mahmoudi Spring 2017

Presented by: Nafiseh Mahmoudi Spring 2017 Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory

More information

VARIABILITY IN OPERATING SYSTEMS

VARIABILITY IN OPERATING SYSTEMS VARIABILITY IN OPERATING SYSTEMS Brian Kocoloski Assistant Professor in CSE Dept. October 8, 2018 1 CLOUD COMPUTING Current estimate is that 94% of all computation will be performed in the cloud by 2021

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

MDHIM: A Parallel Key/Value Store Framework for HPC

MDHIM: A Parallel Key/Value Store Framework for HPC MDHIM: A Parallel Key/Value Store Framework for HPC Hugh Greenberg 7/6/2015 LA-UR-15-25039 HPC Clusters Managed by a job scheduler (e.g., Slurm, Moab) Designed for running user jobs Difficult to run system

More information

Advanced Systems Lab (Intro and Administration) G. Alonso Systems Group

Advanced Systems Lab (Intro and Administration) G. Alonso Systems Group Advanced Systems Lab (Intro and Administration) G. Alonso Systems Group http://www.systems.ethz.ch Overview of the Course Focus on project Individual project during semester (3 milestones) This is a project

More information

Generating Realistic Impressions for File-System Benchmarking. Nitin Agrawal Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Generating Realistic Impressions for File-System Benchmarking. Nitin Agrawal Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau Generating Realistic Impressions for File-System Benchmarking Nitin Agrawal Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau For better or for worse," benchmarks shape a field " David Patterson 2 Inputs

More information

REFERENCE ARCHITECTURE Microsoft SQL Server 2016 Data Warehouse Fast Track. FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray//X

REFERENCE ARCHITECTURE Microsoft SQL Server 2016 Data Warehouse Fast Track. FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray//X REFERENCE ARCHITECTURE Microsoft SQL Server 2016 Data Warehouse Fast Track FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray//X FLASHSTACK REFERENCE ARCHITECTURE September 2018 TABLE

More information

TCC, so your business continues

TCC, so your business continues TCC, so your business continues 1 Whitepaper Executive summary The TCC (Tuxis Cloud Concept) is a smart way to build hosting infrastructures for your private cloud and virtual datacenter. It natively integrates

More information

Chapter 7 Forensic Duplication

Chapter 7 Forensic Duplication Chapter 7 Forensic Duplication Ed Crowley Spring 10 Topics Response Strategies Forensic Duplicates and Evidence Federal Rules of Evidence What is a Forensic Duplicate? Hard Drive Development Forensic Tool

More information

Cloudian Sizing and Architecture Guidelines

Cloudian Sizing and Architecture Guidelines Cloudian Sizing and Architecture Guidelines The purpose of this document is to detail the key design parameters that should be considered when designing a Cloudian HyperStore architecture. The primary

More information

CLIP: A Compact, Load-balancing Index Placement Function

CLIP: A Compact, Load-balancing Index Placement Function CLIP: A Compact, Load-balancing Index Placement Function Michael McThrow Storage Systems Research Center University of California, Santa Cruz Abstract Existing file searching tools do not have the performance

More information

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime

McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime McRT-STM: A High Performance Software Transactional Memory System for a Multi- Core Runtime B. Saha, A-R. Adl- Tabatabai, R. Hudson, C.C. Minh, B. Hertzberg PPoPP 2006 Introductory TM Sales Pitch Two legs

More information

Nomad: Mitigating Arbitrary Cloud Side Channels via Provider-Assisted Migration

Nomad: Mitigating Arbitrary Cloud Side Channels via Provider-Assisted Migration Nomad: Mitigating Arbitrary Cloud Side Channels via Provider-Assisted Migration Soo-Jin Moon, Vyas Sekar Michael K. Reiter Context: Infrastructure-as-a-Service Clouds Client API Cloud Controller Machine

More information

The Fusion Distributed File System

The Fusion Distributed File System Slide 1 / 44 The Fusion Distributed File System Dongfang Zhao February 2015 Slide 2 / 44 Outline Introduction FusionFS System Architecture Metadata Management Data Movement Implementation Details Unique

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

White Paper. Major Performance Tuning Considerations for Weblogic Server

White Paper. Major Performance Tuning Considerations for Weblogic Server White Paper Major Performance Tuning Considerations for Weblogic Server Table of Contents Introduction and Background Information... 2 Understanding the Performance Objectives... 3 Measuring your Performance

More information

20762B: DEVELOPING SQL DATABASES

20762B: DEVELOPING SQL DATABASES ABOUT THIS COURSE This five day instructor-led course provides students with the knowledge and skills to develop a Microsoft SQL Server 2016 database. The course focuses on teaching individuals how to

More information

Physical Disentanglement in a Container-Based File System

Physical Disentanglement in a Container-Based File System Physical Disentanglement in a Container-Based File System Lanyue Lu, Yupu Zhang, Thanh Do, Samer AI-Kiswany Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin Madison Motivation

More information

University of Wisconsin-Madison

University of Wisconsin-Madison Evolving RPC for Active Storage Muthian Sivathanu Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin-Madison Architecture of the future Everything is active Cheaper, faster processing

More information

Albis: High-Performance File Format for Big Data Systems

Albis: High-Performance File Format for Big Data Systems Albis: High-Performance File Format for Big Data Systems Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Adrian Schuepbach, Bernard Metzler, IBM Research, Zurich 2018 USENIX Annual Technical Conference

More information

The Leading Parallel Cluster File System

The Leading Parallel Cluster File System The Leading Parallel Cluster File System www.thinkparq.com www.beegfs.io ABOUT BEEGFS What is BeeGFS BeeGFS (formerly FhGFS) is the leading parallel cluster file system, developed with a strong focus on

More information

Data Sharing Made Easier through Programmable Metadata. University of Wisconsin-Madison

Data Sharing Made Easier through Programmable Metadata. University of Wisconsin-Madison Data Sharing Made Easier through Programmable Metadata Zhe Zhang IBM Research! Remzi Arpaci-Dusseau University of Wisconsin-Madison How do applications share data today? Syncing data between storage systems:

More information

Towards Efficient, Portable Application-Level Consistency

Towards Efficient, Portable Application-Level Consistency Towards Efficient, Portable Application-Level Consistency Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Joo-Young Hwang, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau 1 File System Crash

More information

Continuous Performance Testing. Mark Price Performance Engineer Improbable.io

Continuous Performance Testing. Mark Price Performance Engineer Improbable.io Continuous Performance Testing Mark Price / @epickrram Performance Engineer Improbable.io The ideal System performance testing as a first-class citizen of the continuous delivery pipeline Process Process

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

SILT: A Memory-Efficient, High- Performance Key-Value Store

SILT: A Memory-Efficient, High- Performance Key-Value Store SILT: A Memory-Efficient, High- Performance Key-Value Store SOSP 11 Presented by Fan Ni March, 2016 SILT is Small Index Large Tables which is a memory efficient high performance key value store system

More information

Microsoft. [MS20762]: Developing SQL Databases

Microsoft. [MS20762]: Developing SQL Databases [MS20762]: Developing SQL Databases Length : 5 Days Audience(s) : IT Professionals Level : 300 Technology : Microsoft SQL Server Delivery Method : Instructor-led (Classroom) Course Overview This five-day

More information

Real-time Protection for Microsoft Hyper-V

Real-time Protection for Microsoft Hyper-V Real-time Protection for Microsoft Hyper-V Introduction Computer virtualization has come a long way in a very short time, triggered primarily by the rapid rate of customer adoption. Moving resources to

More information

"Charting the Course... MOC C: Developing SQL Databases. Course Summary

Charting the Course... MOC C: Developing SQL Databases. Course Summary Course Summary Description This five-day instructor-led course provides students with the knowledge and skills to develop a Microsoft SQL database. The course focuses on teaching individuals how to use

More information

OS Virtualization. Why Virtualize? Introduction. Virtualization Basics 12/10/2012. Motivation. Types of Virtualization.

OS Virtualization. Why Virtualize? Introduction. Virtualization Basics 12/10/2012. Motivation. Types of Virtualization. Virtualization Basics Motivation OS Virtualization CSC 456 Final Presentation Brandon D. Shroyer Types of Virtualization Process virtualization (Java) System virtualization (classic, hosted) Emulation

More information

Evaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization

Evaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization Evaluating Cloud Storage Strategies James Bottomley; CTO, Server Virtualization Introduction to Storage Attachments: - Local (Direct cheap) SAS, SATA - Remote (SAN, NAS expensive) FC net Types - Block

More information

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER 80 GBIT/S OVER IP USING DPDK Performance, Code, and Architecture Charles Shiflett Developer of next-generation

More information

Zettabyte Reliability with Flexible End-to-end Data Integrity

Zettabyte Reliability with Flexible End-to-end Data Integrity Zettabyte Reliability with Flexible End-to-end Data Integrity Yupu Zhang, Daniel Myers, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau University of Wisconsin - Madison 5/9/2013 1 Data Corruption Imperfect

More information

IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT

IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT 215-4-14 Authors: Deep Chatterji (dchatter@us.ibm.com) Steve McDuff (mcduffs@ca.ibm.com) CONTENTS Disclaimer...3 Pushing the limits of B2B Integrator...4

More information

Yiying Zhang, Leo Prasath Arulraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. University of Wisconsin - Madison

Yiying Zhang, Leo Prasath Arulraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. University of Wisconsin - Madison Yiying Zhang, Leo Prasath Arulraj, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin - Madison 1 Indirection Reference an object with a different name Flexible, simple, and

More information

Tradeoff Evaluation. Comparison between CB-R and CB-A as they only differ for this aspect.

Tradeoff Evaluation. Comparison between CB-R and CB-A as they only differ for this aspect. Tradeoff Evaluation Comparison between C2PL and CB-A, as both: Allow intertransaction caching Don t use propagation Synchronously activate consistency actions Tradeoff Evaluation Comparison between CB-R

More information

Harp-DAAL for High Performance Big Data Computing

Harp-DAAL for High Performance Big Data Computing Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big

More information

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 20 th, 2016 Percona Technical Webinars

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 20 th, 2016 Percona Technical Webinars Practical MySQL Performance Optimization Peter Zaitsev, CEO, Percona July 20 th, 2016 Percona Technical Webinars In This Presentation We ll Look at how to approach Performance Optimization Discuss Practical

More information

Hedvig as backup target for Veeam

Hedvig as backup target for Veeam Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...

More information

Cloud Security Gaps. Cloud-Native Security.

Cloud Security Gaps. Cloud-Native Security. Cloud Security Gaps Cloud-Native Security www.aporeto.com Why Network Segmentation is Failing Your Cloud Application Security How to Achieve Effective Application Segmentation By now it s obvious to security-minded

More information

Install your scientific software stack easily with Spack

Install your scientific software stack easily with Spack Install your scientific software stack easily with Spack Les mardis du développement technologique Florent Pruvost (SED) Outline 1. Context 2. Features overview 3. In practice 4. Some feedback Florent

More information

Principles of System Administration. Jan Schaumann

Principles of System Administration. Jan Schaumann Principles of System Administration Jan Schaumann jschauma@netmeister.org February 6, 2018 Contents Preface to the preface Preface i ii I Introduction and General Concepts 1 1 Introduction 2 1.1 What exactly

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

Storage Optimization with Oracle Database 11g

Storage Optimization with Oracle Database 11g Storage Optimization with Oracle Database 11g Terabytes of Data Reduce Storage Costs by Factor of 10x Data Growth Continues to Outpace Budget Growth Rate of Database Growth 1000 800 600 400 200 1998 2000

More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Perform scalable data exchange using InfoSphere DataStage DB2 Connector

Perform scalable data exchange using InfoSphere DataStage DB2 Connector Perform scalable data exchange using InfoSphere DataStage Angelia Song (azsong@us.ibm.com) Technical Consultant IBM 13 August 2015 Brian Caufield (bcaufiel@us.ibm.com) Software Architect IBM Fan Ding (fding@us.ibm.com)

More information

Ceph: A Scalable, High-Performance Distributed File System

Ceph: A Scalable, High-Performance Distributed File System Ceph: A Scalable, High-Performance Distributed File System S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long Presented by Philip Snowberger Department of Computer Science and Engineering University

More information

Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Copyright 2011, Oracle and/or its affiliates. All rights reserved. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material,

More information

SoftNAS Cloud Performance Evaluation on Microsoft Azure

SoftNAS Cloud Performance Evaluation on Microsoft Azure SoftNAS Cloud Performance Evaluation on Microsoft Azure November 30, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for Azure:... 5 Test Methodology...

More information

Introduction to the Active Everywhere Database

Introduction to the Active Everywhere Database Introduction to the Active Everywhere Database INTRODUCTION For almost half a century, the relational database management system (RDBMS) has been the dominant model for database management. This more than

More information

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures

Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim,Christian Engelmann, and Galen Shipman

More information

Overview of MOSIX. Prof. Amnon Barak Computer Science Department The Hebrew University.

Overview of MOSIX. Prof. Amnon Barak Computer Science Department The Hebrew University. Overview of MOSIX Prof. Amnon Barak Computer Science Department The Hebrew University http:// www.mosix.org Copyright 2006-2017. All rights reserved. 1 Background Clusters and multi-cluster private Clouds

More information

Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems. Dong Dai, Yong Chen, Dries Kimpe, and Robert Ross

Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems. Dong Dai, Yong Chen, Dries Kimpe, and Robert Ross Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems Dong Dai, Yong Chen, Dries Kimpe, and Robert Ross Parallel Object Storage Many HPC systems utilize object storage: PVFS, Lustre, PanFS,

More information

FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray

FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray REFERENCE ARCHITECTURE Microsoft SQL Server 2016 Data Warehouse Fast Track FlashStack 70TB Solution with Cisco UCS and Pure Storage FlashArray FLASHSTACK REFERENCE ARCHITECTURE December 2017 TABLE OF CONTENTS

More information

ViewBox. Integrating Local File Systems with Cloud Storage Services. Yupu Zhang +, Chris Dragga + *, Andrea Arpaci-Dusseau +, Remzi Arpaci-Dusseau +

ViewBox. Integrating Local File Systems with Cloud Storage Services. Yupu Zhang +, Chris Dragga + *, Andrea Arpaci-Dusseau +, Remzi Arpaci-Dusseau + ViewBox Integrating Local File Systems with Cloud Storage Services Yupu Zhang +, Chris Dragga + *, Andrea Arpaci-Dusseau +, Remzi Arpaci-Dusseau + + University of Wisconsin Madison *NetApp, Inc. 5/16/2014

More information

VMware vsphere with ESX 4.1 and vcenter 4.1

VMware vsphere with ESX 4.1 and vcenter 4.1 QWERTYUIOP{ Overview VMware vsphere with ESX 4.1 and vcenter 4.1 This powerful 5-day class is an intense introduction to virtualization using VMware s vsphere 4.1 including VMware ESX 4.1 and vcenter.

More information

A FLEXIBLE ARM-BASED CEPH SOLUTION

A FLEXIBLE ARM-BASED CEPH SOLUTION A FLEXIBLE ARM-BASED CEPH SOLUTION Ceph in a nutshell A software-defined storage platform. Ceph is a software-defined storage platform (SDS). SDS mean it s a solution that relies on Software Intelligence

More information

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING

SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING SCHISM: A WORKLOAD-DRIVEN APPROACH TO DATABASE REPLICATION AND PARTITIONING ZEYNEP KORKMAZ CS742 - PARALLEL AND DISTRIBUTED DATABASE SYSTEMS UNIVERSITY OF WATERLOO OUTLINE. Background 2. What is Schism?

More information