Tackling the Reproducibility Problem in Systems Research with Declarative Experiment Specifications


Ivo Jimenez, Carlos Maltzahn (UCSC); Adam Moody, Kathryn Mohror (LLNL); Jay Lofstead (Sandia); Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau (UWM)

The Reproducibility Problem
Network, disks, BIOS, OS conf., magic numbers, workload, jitter, etc.
[Figure: throughput (MB/s) vs. cluster size, original vs. reproduced]
Goal: define a methodology so that we don't end up in this situation

Outline
- Re-execution vs. validation
- Declarative Experiment Specification (ESF)
- Case Study
- Benefits & Challenges


Reproducibility Workflow
1. Re-execute experiment
   - Recreate original setup, re-execute experiments
   - Technical task
2. Validate results
   - Compare against original
   - A subjective task
   - How do we express objective validation criteria?
   - What contextual information to include with results?

Experiment
Goal: Show that my algorithm/system/etc. is better than the state of the art.
[Diagram: means of experiment (code, libs, data, workload, hardware, OS), raw data, figure, observations]


Experiment
- Goal: Show that my algorithm/system/etc. is better than the state of the art.
- Means of Experiment

Validation Language Syntax
  validation : 'for' condition ('and' condition)* 'expect' result ('and' result)* ;
  condition  : vars ('in' range | ('=' | '<' | '>' | '!=') value) ;
  result     : condition ;
  vars       : var (',' var)* ;
  range      : '[' range_num (',' range_num)* ']' ;
  range_num  : NUMBER '-' NUMBER | '*' ;
  value      : '*' | NUMBER (',' NUMBER)* ;
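The grammar on this slide is small enough to exercise with a hand-written recursive-descent parser. The following is a minimal sketch, not the ESF toolkit's implementation: the names `tokenize`, `Parser` and `parse_validation` and the dictionary layout of the result are assumptions made for illustration. It follows the grammar as printed, so it accepts only `=`, `<`, `>`, `!=` against numbers or `*`, and does not cover richer expressions such as `(raw * .90)` that appear in later examples.

```python
import re

# Tokens: '!=', single-character symbols, unsigned numbers, identifiers.
TOKEN = re.compile(r"\s*(!=|[=<>\[\],*-]|\d+(?:\.\d+)?|[A-Za-z_]\w*)")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
NAME = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"for", "and", "expect", "in"}
OPERATORS = {"=", "<", ">", "!="}


def tokenize(text):
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def sequence(self, item, separator):
        """Parse `item (separator item)*` and return the items as a list."""
        items = [item()]
        while self.peek() == separator:
            self.take(separator)
            items.append(item())
        return items

    def validation(self):
        self.take("for")
        conditions = self.sequence(self.condition, "and")
        self.take("expect")
        results = self.sequence(self.condition, "and")  # result : condition
        if self.peek() is not None:
            raise SyntaxError(f"trailing input: {self.peek()!r}")
        return {"for": conditions, "expect": results}

    def condition(self):
        names = self.sequence(self.var, ",")
        if self.peek() == "in":
            self.take("in")
            self.take("[")
            ranges = self.sequence(self.range_num, ",")
            self.take("]")
            return {"vars": names, "op": "in", "value": ranges}
        operator = self.take()
        if operator not in OPERATORS:
            raise SyntaxError(f"expected an operator, got {operator!r}")
        return {"vars": names, "op": operator, "value": self.value()}

    def var(self):
        token = self.take()
        if token in KEYWORDS or not NAME.fullmatch(token):
            raise SyntaxError(f"expected a variable name, got {token!r}")
        return token

    def range_num(self):
        if self.peek() == "*":
            return self.take("*")
        low = self.number()
        self.take("-")
        return (low, self.number())

    def value(self):
        if self.peek() == "*":
            return self.take("*")
        return self.sequence(self.number, ",")

    def number(self):
        token = self.take()
        if not NUMBER.fullmatch(token):
            raise SyntaxError(f"expected a number, got {token!r}")
        return float(token)


def parse_validation(text):
    return Parser(tokenize(text)).validation()


# Hypothetical statement, written only to exercise every grammar rule.
spec = parse_validation("for cluster_size in [2-28] and workload = * expect ceph, raw > 0")
```

Here `spec["for"]` holds the two parsed conditions and `spec["expect"]` the single expected result; a statement that omits the operator raises `SyntaxError`.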


Ceph (OSDI '06)
- Select scalability experiment
  - Distributed; makes use of all resources
  - Main bottlenecks: I/O and network
- Why this experiment?
  - Top conference
  - 10-year-old experiment
  - Ideal reproducibility conditions: access to authors, topic familiarity, same hardware
- Even in an ideal scenario, we still struggle
- Demonstrates which missing information is captured by an ESF

Ceph OSDI '06 Scalability Experiment
Schema of Experiment Output Data:
  "independent_variables": [
    { "type": "cluster_size", "values": "2-28" },
    { "type": "method", "values": ["raw", "ceph"] },
    { "type": "net_saturated", "values": ["true", "false"] }
  ],
  "dependent_variable": { "type": "throughput", "scale": "mb/s" }
Validation Statement:
  for cluster_size = *
  expect ceph >= (raw * .90)
Validation Statement (refined):
  for cluster_size <= 24 and not net_saturated
  expect ceph >= (raw * .90)
[Figure: per-OSD average throughput (MB/s) vs. cluster size]
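A statement of the form "for cluster_size <= 24 and not net_saturated expect ceph >= (raw * .90)" can be checked mechanically against output rows that follow the schema. The sketch below assumes each row is a dictionary keyed by the schema's variable names; the function name `failed_cluster_sizes` and every number in `rows` are hypothetical and are not measurements from the Ceph experiment.

```python
def failed_cluster_sizes(rows, max_size=24, fraction=0.90):
    """Return the cluster sizes where 'ceph >= raw * fraction' does not hold.

    Rows outside the 'for' clause (cluster_size > max_size, or a saturated
    network) are skipped, as are sizes missing either method.
    """
    by_size = {}
    for row in rows:
        if row["cluster_size"] > max_size or row["net_saturated"]:
            continue
        by_size.setdefault(row["cluster_size"], {})[row["method"]] = row["throughput"]
    return sorted(
        size
        for size, mbps in by_size.items()
        if "ceph" in mbps and "raw" in mbps and mbps["ceph"] < mbps["raw"] * fraction
    )


# Hypothetical output data (per-OSD throughput in MB/s), for illustration only.
rows = [
    {"cluster_size": 2, "method": "raw", "throughput": 58.0, "net_saturated": False},
    {"cluster_size": 2, "method": "ceph", "throughput": 55.0, "net_saturated": False},
    {"cluster_size": 24, "method": "raw", "throughput": 58.0, "net_saturated": False},
    {"cluster_size": 24, "method": "ceph", "throughput": 50.0, "net_saturated": False},
    {"cluster_size": 26, "method": "raw", "throughput": 58.0, "net_saturated": True},
    {"cluster_size": 26, "method": "ceph", "throughput": 40.0, "net_saturated": True},
]
failures = failed_cluster_sizes(rows)
```

An empty `failures` list would mean the statement holds for every cluster size covered by the "for" clause; in this made-up data only size 24 violates it, and size 26 is ignored because the network is saturated.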

[Figure: normalized per-OSD throughput vs. OSD cluster size, reproduced vs. original]

Benefits & Challenges

Why care about Reproducibility?
- "Good enough" is not an excuse
  - We can always improve the state of our practice
  - How do we compare hardware/software in a scientific way?
- Experimental cloud infrastructure
  - PRObE / CloudLab / Chameleon
  - Having reproducible/validated experiments would represent a significant step toward embodying the scientific method as a core component of these infrastructures

Benefits of ESF-based methodology
- Brings falsifiability to our field
  - Statements can be proven false
- Automate validation
  - Validation becomes an objective task

Validation Workflow
1. Obtain/recreate the means of the experiment.
2. Re-run and check the validation clauses against the output.
3. Any validation failed?
   - no: the original work's findings are corroborated.
   - yes: any significant differences between the original and recreated means?
     - yes: update the means of the experiment and return to step 2.
     - no: cannot validate the original claims.
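The workflow can be read as a loop with two exits. The sketch below is one possible rendering under stated assumptions: the function `validate`, its callback parameters, and the `max_rounds` safeguard are hypothetical additions for illustration, and the stand-in experiment at the bottom is invented so that the example runs on its own.

```python
def validate(run, clauses, find_differences, update_means, max_rounds=3):
    """Follow the validation workflow and return (outcome, failed_clause_names).

    run()               re-executes the experiment and returns its output
    clauses             maps a clause name to a predicate over that output
    find_differences()  lists significant differences from the original means
    update_means(diffs) adjusts the recreated means to remove those differences
    max_rounds          is an added safeguard, not part of the workflow itself
    """
    failed = []
    for _ in range(max_rounds):
        output = run()
        failed = [name for name, holds in clauses.items() if not holds(output)]
        if not failed:
            return "original findings corroborated", []
        differences = find_differences()
        if not differences:
            return "cannot validate original claims", failed
        update_means(differences)
    return "gave up after max_rounds", failed


# Hypothetical stand-ins for a real experiment: throughput depends on the disk type.
ORIGINAL_MEANS = {"disk": "ssd"}
means = {"disk": "hdd"}


def run():
    return {"throughput": 100.0 if means["disk"] == "ssd" else 50.0}


def find_differences():
    return [key for key in ORIGINAL_MEANS if means[key] != ORIGINAL_MEANS[key]]


def update_means(differences):
    for key in differences:
        means[key] = ORIGINAL_MEANS[key]


clauses = {"throughput >= 90": lambda out: out["throughput"] >= 90.0}
outcome, failed = validate(run, clauses, find_differences, update_means)
```

In this made-up run the first round fails, the disk mismatch is found and corrected, and the second round corroborates the claim. When a clause fails and no difference in the means can be found, the function reports that the original claims cannot be validated.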

Benefits of ESF-based methodology (continued)
- Usability
  - We all do this anyway, albeit in an ad hoc way
  - Integrate into existing infrastructure

Integration with Existing Infrastructure
[Diagram: pull / push code and ESF]
Test:
- Unit
- Integration
- Validations

Challenges
- Reproduce every time
  - Include sanity checks as part of the experiment
  - Alternative: corroborate that network/disk exhibits expected behavior at runtime
- Reproduce everywhere
  - Example: GCC's flags, 10^806 combinations
  - Alternative: provide an image of the complete software stack (e.g. Linux containers)

Conclusion
- ESFs:
  - Embody all components of an experiment
  - Enable automation of result validation
  - Bring us closer to the scientific method
- Our ideal future:
  - Researchers use ESFs to express a hypothesis
  - Toolkits for ESFs produce metadata-rich figures
  - Machine-readable evaluation section
https://github.com/systemslab/esf

Thanks!

SILT (SOSP '11)
Experiment Goal: The high random read speed of flash drives means that the CPU budget available for each index operation is relatively limited. This microbenchmark demonstrates that SILT's indexes meet their design goal of computation-efficient indexing.
Schema:
  { "type": "method", "values": ["raw", "cuckoo", "trie"] },
  { "type": "workload", "values": ["individual", "bulk", "lookup"] },
  "dependent_variable": { "type": "throughput", "scale": "bytes/second" }
Validations:
  for workload=* expect cuckoo > raw and trie > raw
  for lookup expect cuckoo > trie
  and for individual and bulk expect cuckoo > trie

Geneiatakis et al., CCS '12

Experiment Goal
In this section, our goal is to evaluate the performance benefits that can be reaped, by utilizing virtual partitioning to apply otherwise expensive protection mechanisms on the most exposed part of applications. This allows us to strike a balance between the overhead imposed on the application and its exposure to attacks.



Schema
  "independent_variables": [
    {
      "type": "method",
      "alias": ["technique"],
      "values": ["native", "pin", "isr", "dta", "dta_pin", "dta_isr"]
    }
  ],
  "dependent_variable": { "type": "runtime", "scale": "s" }

Validations
  expect native < any
  and dta_pin between pin and isr
  and dta_isr between isr and dta
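The three clauses (native < any, dta_pin between pin and isr, dta_isr between isr and dta) map onto simple predicates over a table of runtimes. The sketch below is illustrative only: the helper names are hypothetical, `between` is assumed to be inclusive and independent of the order of its two bounds (the slides do not define it), and the runtimes are invented values rather than results from the paper.

```python
def native_is_fastest(runtime):
    """'native < any': native runtime is below that of every other method."""
    return all(
        runtime["native"] < seconds
        for method, seconds in runtime.items()
        if method != "native"
    )


def between(runtime, name, bound_a, bound_b):
    """'name between a and b', assumed inclusive and order-insensitive."""
    low, high = sorted((runtime[bound_a], runtime[bound_b]))
    return low <= runtime[name] <= high


def check(runtime):
    """Evaluate each clause separately so a failure can be reported by name."""
    return {
        "native < any": native_is_fastest(runtime),
        "dta_pin between pin and isr": between(runtime, "dta_pin", "pin", "isr"),
        "dta_isr between isr and dta": between(runtime, "dta_isr", "isr", "dta"),
    }


# Hypothetical runtimes in seconds, for illustration only.
runtime = {"native": 1.0, "pin": 1.5, "isr": 2.5, "dta": 4.0, "dta_pin": 2.0, "dta_isr": 3.0}
report = check(runtime)
all_hold = all(report.values())
```

Reporting each clause separately, rather than a single boolean, makes it clear which part of the original claim a re-execution failed to corroborate.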

Example 2

Schema
  "independent_variables": [
    {
      "type": "method",
      "values": ["native", "pin", "isr", "dta", "dta_pin", "dta_isr"]
    },
    {
      "type": "workload",
      "values": ["ftp", "samba", "ssh"]
    }
  ],
  "dependent_variable": { "type": "runtime", "scale": "s" }

Validations
  for workload=*
  expect native < any
  and dta_pin between pin and isr
  and dta_isr between isr and dta
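The wildcard in "for workload=*" can be read as "the expectation must hold separately for every workload value". The sketch below shows that expansion under this reading; `check_for_each` and the row layout are hypothetical, only the `native < any` clause is evaluated to keep the example short, and the runtimes are invented.

```python
from collections import defaultdict


def check_for_each(rows, variable, predicate):
    """Group rows by `variable` and apply `predicate` to each group's
    {method: runtime} table, returning {variable value: bool}."""
    tables = defaultdict(dict)
    for row in rows:
        tables[row[variable]][row["method"]] = row["runtime"]
    return {value: predicate(table) for value, table in tables.items()}


def native_is_fastest(table):
    """'native < any' within a single workload."""
    return all(table["native"] < t for m, t in table.items() if m != "native")


# Hypothetical runtimes in seconds, for illustration only.
rows = [
    {"workload": "ftp", "method": "native", "runtime": 1.0},
    {"workload": "ftp", "method": "pin", "runtime": 1.4},
    {"workload": "samba", "method": "native", "runtime": 2.0},
    {"workload": "samba", "method": "pin", "runtime": 1.9},
    {"workload": "ssh", "method": "native", "runtime": 0.5},
    {"workload": "ssh", "method": "pin", "runtime": 0.9},
]
per_workload = check_for_each(rows, "workload", native_is_fastest)
```

With this made-up data the clause holds for ftp and ssh but not for samba, so the statement as a whole would be reported as failed for one workload.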

Falsifiability in Science
- Falsifiability of a statement, hypothesis, or theory is an inherent possibility to prove it to be false.
- In other words, the ability to specify the conditions under which a statement is false
- Synonymous with testability
- Example:
  - Statement: All swans are white
  - Falsifiable: Observe one black swan
Source: en.wikipedia.org/wiki/falsifiability


Falsifiability in Systems
Experiment Goal: Show that my algorithm/system/etc. is better than the state of the art.
[Diagram: means of experiment (code, libs, data, workload, hardware, OS), raw data, figure, observations]

Falsifiability in Systems
- To falsify a claim:
  - Describe the means of the experiment
  - Provide validation statements over the output data
- Conditional statement: if the means are properly recreated, then the validation statements should hold
- Go from inert observations to falsifiable statements
  - From: "We observe that our system outperforms the alternatives"
  - To: "Expect 25-30% performance improvement on hardware platform X, on workload Y, when configured like Z"
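A statement such as "expect 25-30% performance improvement" becomes falsifiable once the improvement is computed the same way every time. The sketch below assumes a higher-is-better metric and improvement measured relative to the baseline; the function names and the sample values are hypothetical.

```python
def improvement_pct(baseline, candidate):
    """Relative improvement of candidate over baseline, in percent.

    Assumes a higher-is-better metric such as throughput.
    """
    return (candidate - baseline) / baseline * 100.0


def claim_holds(baseline, candidate, low=25.0, high=30.0):
    """True when the measured improvement falls inside the expected band."""
    return low <= improvement_pct(baseline, candidate) <= high


# Hypothetical measurements, for illustration only.
corroborated = claim_holds(baseline=100.0, candidate=128.0)
falsified = not claim_holds(baseline=100.0, candidate=110.0)
```

Stating both bounds matters: an improvement well above the band is also a mismatch with the original claim and suggests that the means of the experiment differ.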

Early Feedback
Creating an ESF helps authors to:
- Find meaningful/reproducible baselines
- Create a feedback loop in the author's mind
- Specify exactly what the author means
- Make temporal context explicit