Characterizing Result Errors in Internet Desktop Grids

1 Characterizing Result Errors in Internet Desktop Grids D. Kondo 1, F. Araujo 2, P. Malecot 1, P. Domingues 2, L. Silva 2, G. Fedak 1, F. Cappello 1 1 INRIA, France 2 University of Coimbra, Portugal

2 Desktop Grids Astronomy Math Biology LIP IBM AFM Total: ~50 applications using ~1.1 PetaFLOPS from ~1 million active resources

7 Background In large-scale desktop grids involving volunteered, anonymous (and thereby potentially untrusted, insecure) resources, errors are inevitable.
Software/Hardware Stack: Potential Source of Error
Application: modify application results
Middleware: revise and recompile middleware
OS: viruses
Hardware (disk, CPU, memory, network): disk crash, overclocking and overheating of CPU

8 Motivation A number of application-level mechanisms for tolerating errors exist [Sarmenta, Lo]. The effectiveness of these mechanisms depends on when errors occur in real systems. Yet the characterization of errors is poorly understood.

9 Goal Characterize error rates in a real system Frequency Stationarity Correlation Evaluate error tolerance mechanisms in light of this characterization

10 Outline Background Terminology Related Work Method Error Characterization Summary and Future Work

11 Background Terminology [Diagram: a server and a set of workers. Workers download workunits from the server (workunit download) and upload a correct or erroneous result to the server (result upload).]

15 Related Work Error Tolerance Mechanisms [Sarmenta01, Zhao01, Taufer05] Majority voting Spot-checking with blacklisting Credibility-based methods

16 Majority Voting [Sarmenta01] Send 2m-1 instances of the same workunit to multiple workers, and then compare the results. The majority vote is complete after receiving m identical results.

17 Majority Voting [Sarmenta01]
ε: fraction of results that will be erroneous
ϕ: probability that a worker (from the set of erroneous and nonerroneous hosts) returns an erroneous result
m: number of identical results before a vote is considered to be complete
$\varepsilon_{majv}(\phi, m) = \sum_{j=m}^{2m-1} \binom{2m-1}{j} \phi^{j} (1-\phi)^{2m-1-j}$
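The majority-voting bound can be evaluated numerically. The following is a minimal sketch, not code from the slides: the function name `eps_majv` is chosen here for illustration, and it assumes, as the model does, that workers err independently with the same probability ϕ.

```python
from math import comb


def eps_majv(phi: float, m: int) -> float:
    """Probability that at least m of 2m-1 independent results are erroneous.

    phi: probability that a single returned result is erroneous.
    m:   number of identical results needed to complete the vote.
    """
    n = 2 * m - 1  # total instances of the workunit that are sent out
    return sum(
        comb(n, j) * phi**j * (1 - phi) ** (n - j)
        for j in range(m, n + 1)
    )


if __name__ == "__main__":
    # Illustrative values only; phi = 0.01 is not a measurement from the talk.
    for m in (1, 2, 3):
        print(m, eps_majv(0.01, m))
```

With m = 1 the function returns ϕ itself (no redundancy), which is a quick sanity check on the summation limits.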

18 Issues The model assumes error rates are not correlated among hosts. If the error rate is high (>1%), much redundancy is required to achieve low error bounds.

19 Spot-Checking [Sarmenta01] Distribute a workunit with a known correct result randomly to workers. Compare each worker's result to the known correct result. If there is a difference, blacklist that worker.

20 Spot-Checking [Sarmenta01]
ε: fraction of results that will be erroneous
q: frequency of spot-checking
n: number of workunits to be computed by each worker
f: fraction of hosts that commit at least one error
s: error rate per erroneous host
$\varepsilon_{scbl}(q, n, f, s) = \dfrac{s f (1 - qs)^{n}}{(1 - f) + f (1 - qs)^{n}}$
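The spot-checking bound is easy to explore numerically. This is a minimal sketch of the formula above, with the function name `eps_scbl` chosen here for illustration. The term (1 - qs)^n is the probability that an erroneous host survives n workunits without being caught and blacklisted.

```python
def eps_scbl(q: float, n: int, f: float, s: float) -> float:
    """Error rate under spot-checking with blacklisting.

    q: frequency of spot-checking
    n: number of workunits computed by each worker
    f: fraction of hosts that commit at least one error
    s: error rate per erroneous host
    """
    survive = (1 - q * s) ** n  # erroneous host not yet caught after n workunits
    return s * f * survive / ((1 - f) + f * survive)


if __name__ == "__main__":
    # Parameter values quoted in the talk's working example.
    q, f, s = 0.10, 0.35, 0.003
    for n in (0, 1000, 5000, 10000):
        print(n, eps_scbl(q, n, f, s))
```

With n = 0 or q = 0 no host is ever caught, and the expression reduces to s·f, the error rate without any checking.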

21 Issues Assumes blacklisting is efficient and effective. Assumes consistency of error rates over time. If error rates are low, then the number (n) of workunits to be computed per worker must be very high.

22 Credibility-Based System [Sarmenta01] Define credibility of an entity as the conditional probability of its correctness given its history of past (spot-)checks Workers build (or lose) credibility as they pass or fail (spot-)checks Compute credibility of result based on worker credibility Issue: assumes the error rate per host is consistent over time

23 Methodology XtremLab: BOINC-based project for characterizing Internet desktop grids. Application continuously computes floating-point and integer operations. Validator conducts syntactical and semantic checks of results. Gathered data from about 600 hosts between April and July 2006.

24 Observations and Assumptions Most errors manifest themselves as scrambled or truncated output, likely due to I/O errors. Detected errors would have caused a result error in a real application; e.g., an I/O error corresponds to a corrupt write of a checkpoint file.

25 Error Rates in Entire Platform [Figure: distribution of the fraction of workunits with errors per host; annotated mean 0.0022, annotated maximum not recoverable from the transcript.] Errors are widespread: ~35% of hosts are erroneous.

27 Implications Working example: 10 batches, 100 workunits each; ε_overall ≤ 0.01, so need ε_result ≤ 1 × 10^-5. To get ε_result ≤ 1 × 10^-5: Majority vote: need majority vote (m) of 2. Spot-checking: number of workunits (n) per worker > 5300 (*). Blacklisting all erroneous hosts is most likely not efficient. (*) q=0.10, f=0.35, s=0.003
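The step from the overall bound (0.01) to the per-result bound (1 × 10^-5) in the working example can be sketched as follows. This is an illustrative reading, not taken from the slides: it assumes the overall bound covers all 10 × 100 = 1000 results and that result errors are independent. The function name `per_result_bound` is hypothetical.

```python
def per_result_bound(eps_overall: float, batches: int, workunits_per_batch: int):
    """Per-result error rate needed to keep the overall error rate bounded.

    Assumption (not stated in the slides): the overall computation fails if
    any single result is erroneous, and result errors are independent.
    Returns (exact, union_bound).
    """
    total = batches * workunits_per_batch
    # Exact: solve 1 - (1 - eps)^total = eps_overall for eps.
    exact = 1 - (1 - eps_overall) ** (1 / total)
    # Union bound: total * eps <= eps_overall.
    union_bound = eps_overall / total
    return exact, union_bound


if __name__ == "__main__":
    exact, union_bound = per_result_bound(0.01, batches=10, workunits_per_batch=100)
    print(exact, union_bound)
```

Both expressions land at roughly 1 × 10^-5 for the example's numbers, which matches the per-result target quoted on the slide.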

28 Cumulative Error Rates and Effect on Throughput [Figure: cumulative fraction of errors and cumulative fraction of valid throughput versus fraction of sorted erroneous hosts.] Error rates are skewed: the top 10% of erroneous hosts produce 70% of the errors. Blacklisting all erroneous hosts is not efficient: it would reduce throughput by 40%.
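The skew statistic (share of errors produced by the worst hosts) can be computed from per-host error counts as in the sketch below. The function name `top_share` and the sample counts are hypothetical and are not data from XtremLab.

```python
def top_share(error_counts, top_fraction: float) -> float:
    """Share of all errors produced by the top `top_fraction` of hosts.

    error_counts: number of erroneous results per erroneous host.
    """
    ranked = sorted(error_counts, reverse=True)  # worst hosts first
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts for ten hosts, chosen only to illustrate a skewed distribution.
    counts = [70, 5, 5, 5, 5, 4, 3, 2, 1, 0]
    print(top_share(counts, 0.10))
```

Sorting hosts by error count and accumulating in that order is the same construction as the cumulative-error curve described for the figure.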

32 Spot-Checking with Blacklisting Revisited [Figure: error rate versus number of workunits per worker required (n).] Spot-checking acts as a low-pass filter, reducing error rates to 2 × 10^-4.

34 Majority Voting Revisited [Figure: error rate versus number of identical results required (m).] Error rate decreases exponentially, quickly falling below 1 × 10^-5.

36 Implications To get ε_result down to 2 × 10^-4: spot-checking is a possibility, with most benefit when n is in [0, 1000]. To get ε_result < 2 × 10^-4: use majority voting, as ε_result decreases exponentially with m.

37 Error Rate Stationarity A process is stationary if its statistical properties do not change over time Determine how stationary mean of host error rate (s) is over time Determine change in mean error rates over 96-hour periods for each host
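One way to sketch the per-host measurement described above is to bucket a host's results into 96-hour windows, compute the error rate in each window, and summarize the spread with µ, σ, and σ/µ. The input format (a list of `(hours_since_start, is_error)` pairs) and the function names are assumptions made for this illustration; the slides do not describe the actual analysis code.

```python
from statistics import mean, pstdev


def window_rates(results, window_hours: float = 96.0):
    """Error rate of one host in each consecutive time window.

    results: iterable of (hours_since_start, is_error) pairs (assumed format).
    Windows that contain no results are skipped.
    """
    buckets = {}
    for t, is_error in results:
        idx = int(t // window_hours)
        total, errors = buckets.get(idx, (0, 0))
        buckets[idx] = (total + 1, errors + int(is_error))
    return [errors / total for _, (total, errors) in sorted(buckets.items())]


def variation(rates):
    """Return (mean, standard deviation, coefficient of variation)."""
    mu = mean(rates)
    sigma = pstdev(rates)
    return mu, sigma, (sigma / mu if mu else float("nan"))


if __name__ == "__main__":
    # Made-up trace for a single host.
    trace = [(0, False), (10, True), (100, False), (110, False), (200, True), (210, True)]
    print(variation(window_rates(trace)))
```

A large σ/µ means the host's windowed error rate swings widely around its mean, which is the non-stationarity that the talk is probing.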

38 Statistics for Host Error Rates over 96-hour periods
Table columns: Host Group, µ, σ, σ/µ
Table rows: All erroneous; Top 10% erroneous; Bottom 90% erroneous (numeric values missing from the transcript)
Only about 10% of the error rates were within 25% of the mean.

39 Implications Spot-checking and credibility-based systems may have limited effectiveness Both depend on the consistency of error rates over time Host with low error rate could build high credibility, and then triple its error rates

40 Correlation of Error Rates Determine the independence of errors on one host from those on another. Independence: P(A and B) = P(A)*P(B). Determine the empirical joint probability, P(A and B), that any two hosts have an error simultaneously. Compute the theoretical probability, P(A)*P(B), of two hosts having an error simultaneously. If error rates are not positively correlated: P(A)*P(B) - P(A and B) ≥ 0, that is, theoretical - empirical ≥ 0.
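The pairwise test can be sketched as below. It assumes each host's results have been aligned into the same sequence of time slots, with a 1 marking a slot in which the host returned an erroneous result. That alignment and the function name `independence_gap` are assumptions for illustration, not details given in the slides.

```python
def independence_gap(errors_a, errors_b) -> float:
    """Theoretical minus empirical joint error probability for two hosts.

    errors_a, errors_b: equal-length sequences of 0/1 flags, one per time
    slot (assumed alignment), where 1 means the host erred in that slot.
    A value >= 0 means the errors are not positively correlated.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(errors_a)
    p_a = sum(errors_a) / n
    p_b = sum(errors_b) / n
    p_ab = sum(1 for a, b in zip(errors_a, errors_b) if a and b) / n
    return p_a * p_b - p_ab


if __name__ == "__main__":
    # Made-up flags: the two hosts never err in the same slot.
    print(independence_gap([1, 0, 1, 0], [0, 1, 0, 1]))
```

Repeating this over every pair of hosts and plotting the cumulative distribution of the gaps gives the kind of summary the next slide describes.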

41 Pairwise Host Error Rates [Figure: cumulative fraction versus difference of theoretical and empirical pairwise error rates. Annotations: fraction < 0: 0.01443; fraction > 0: 0.98557.] Most host errors are not positively correlated. Implication: majority voting is likely effective in real systems.

43 Summary of Characterization Results A significant fraction of hosts (about 35%) will commit at least a single error over time. The mean error rate over all hosts (0.0022) is quite low. A large fraction of errors (0.70) result from a small fraction of hosts (0.10). Error rates over time vary greatly (by as much as 3.48 times). Error rates between two hosts often seem uncorrelated (more than [value missing in transcript] of hosts do not have positively correlated errors).

44 Summary of Implications If one can afford redundancy or one needs an error rate to be less than 2 × 10^-4, then majority voting should be considered. If one can afford an error rate greater than 2 × 10^-4 and can make batches relatively long, spot-checking with blacklisting should be considered. Fluctuations in error rates over time may limit the effectiveness of spot-checking and credibility-based systems.

45 Future Work Use of a synthetic application: it is important to have application regularity (I/O, computation); the synthetic application is not that different from real desktop grid applications (which cannot be obtrusive): compute-intensive, small memory footprint, light periodic I/O for application-level checkpoints. Characterize and run real desktop grid applications: profile applications; execute workunits representative of each profile.

46 Thank you


More information

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

Analysis of Program Behavior

Analysis of Program Behavior Analysis of Program Behavior High Performance Computing, Visualization Lucas Mello Schnorr probably soon (LIG-CNRS INF-UFRGS) 2 nd LICIA Workshop Grenoble, France September 5th, 2012 1/ 25 Introduction

More information

Estimation of MPI Application Performance on Volunteer Environments

Estimation of MPI Application Performance on Volunteer Environments Estimation of MPI Application Performance on Volunteer Environments Girish Nandagudi 1, Jaspal Subhlok 1, Edgar Gabriel 1, and Judit Gimenez 2 1 Department of Computer Science, University of Houston, {jaspal,

More information

Software Error Correction Support Policy

Software Error Correction Support Policy Software Error Correction Support Policy Oracle Enterprise Performance Management Version 1.0 Revised: January 9, 2015 Applies to: Oracle Enterprise Performance Management (Includes Hyperion) Table of

More information

A Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer

A Crash Course In Wide Area Data Replication. Jacob Farmer, CTO, Cambridge Computer A Crash Course In Wide Area Data Replication Jacob Farmer, CTO, Cambridge Computer SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals

More information

ibench: Quantifying Interference in Datacenter Applications

ibench: Quantifying Interference in Datacenter Applications ibench: Quantifying Interference in Datacenter Applications Christina Delimitrou and Christos Kozyrakis Stanford University IISWC September 23 th 2013 Executive Summary Problem: Increasing utilization

More information

Direct Methods in Visual Odometry

Direct Methods in Visual Odometry Direct Methods in Visual Odometry July 24, 2017 Direct Methods in Visual Odometry July 24, 2017 1 / 47 Motivation for using Visual Odometry Wheel odometry is affected by wheel slip More accurate compared

More information

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory

Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory Yixin Luo, Sriram Govindan, Bikash Sharma, Mark Santaniello, Justin Meza, Aman Kansal,

More information

Resource Usage of Windows Computer Laboratories

Resource Usage of Windows Computer Laboratories Resource Usage of Windows Computer Laboratories Patricio Domingues Paulo Marques Luis Silva ESTG Leiria Portugal Univ. Coimbra Portugal Univ. Coimbra Portugal patricio@estg.ipleiria.pt pmarques@dei.uc.pt

More information

Pimp My Data Grid. Brian Oliver Senior Principal Solutions Architect <Insert Picture Here>

Pimp My Data Grid. Brian Oliver Senior Principal Solutions Architect <Insert Picture Here> Pimp My Data Grid Brian Oliver Senior Principal Solutions Architect (brian.oliver@oracle.com) Oracle Coherence Oracle Fusion Middleware Agenda An Architectural Challenge Enter the

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

ZooKeeper Atomic Broadcast

ZooKeeper Atomic Broadcast ZooKeeper Atomic Broadcast The heart of the ZooKeeper coordination service Benjamin Reed, Flavio Junqueira Yahoo! Research ZooKeeper Service Transforms a request into an idempotent transaction Request

More information

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant Big and Fast Anti-Caching in OLTP Systems Justin DeBrabant Online Transaction Processing transaction-oriented small footprint write-intensive 2 A bit of history 3 OLTP Through the Years relational model

More information

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona In the Presentation Practical approach to deal with some of the common MySQL Issues 2 Assumptions You re looking

More information

Direct Anonymous Attestation

Direct Anonymous Attestation Direct Anonymous Attestation Revisited Jan Camenisch IBM Research Zurich Joint work with Ernie Brickell, Liqun Chen, Manu Drivers, Anja Lehmann. jca@zurich.ibm.com, @JanCamenisch, ibm.biz/jancamenisch

More information

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation

QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation * Universität Karlsruhe (TH) Technical University of Catalonia (UPC) Barcelona Supercomputing Center (BSC) Samuel

More information

Understanding Availability

Understanding Availability Understanding Availability Ranjita Bhagwan, Stefan Savage and Geoffrey M. Voelker Department of Computer Science and Engineering University of California, San Diego Abstract This paper addresses a simple,

More information

CDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability

CDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability CDA 5140 Software Fault-tolerance - so far have looked at reliability as hardware reliability - however, reliability of the overall system is actually a product of the hardware, software, and human reliability

More information

Dynamics 365. for Finance and Operations, Enterprise edition (onpremises) system requirements

Dynamics 365. for Finance and Operations, Enterprise edition (onpremises) system requirements Dynamics 365 ignite for Finance and Operations, Enterprise edition (onpremises) system requirements This document describes the various system requirements for Microsoft Dynamics 365 for Finance and Operations,

More information

Resource Estimation for Objectory Projects

Resource Estimation for Objectory Projects Resource Estimation for Objectory Projects Gustav Karner Objective Systems SF AB Torshamnsgatan 39, Box 1128 164 22 Kista email: gustav@os.se September 17, 1993 Abstract In order to estimate the resources

More information

ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS

ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS An Undergraduate Research Scholars Thesis by KATHERINE CHRISTINE STUCKMAN Submitted to Honors and Undergraduate Research Texas A&M University in partial

More information

Model-Driven Geo-Elasticity In Database Clouds

Model-Driven Geo-Elasticity In Database Clouds Model-Driven Geo-Elasticity In Database Clouds Tian Guo, Prashant Shenoy College of Information and Computer Sciences University of Massachusetts, Amherst This work is supported by NSF grant 1345300, 1229059

More information

TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT

TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang, Bianca Schroeder {nosayba, ioan, gamvrosi, hwang, bianca}@cs.toronto.edu

More information

Recurrent Neural Network (RNN) Industrial AI Lab.

Recurrent Neural Network (RNN) Industrial AI Lab. Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series

More information

Demands on task recommendation in crowdsourcing platforms the worker s perspective

Demands on task recommendation in crowdsourcing platforms the worker s perspective Demands on task recommendation in crowdsourcing platforms the worker s perspective Survey Design Documentation for RecSys 15 CrowdRec Submission 1. Overall Survey Design The survey shown on the following

More information

Passive NFS Tracing of and Research Workloads. Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST April 1, 2003

Passive NFS Tracing of  and Research Workloads. Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST April 1, 2003 Passive NFS Tracing of Email and Research Workloads Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST 2003 - April 1, 2003 Talk Outline Motivation Tracing Methodology Trace Summary New Findings

More information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

More information

Coriolis: Scalable VM Clustering in Clouds

Coriolis: Scalable VM Clustering in Clouds 1 / 21 Coriolis: Scalable VM Clustering in Clouds Daniel Campello 1 Carlos Crespo 1 Akshat Verma 2 RajuRangaswami 1 Praveen Jayachandran 2 1 School of Computing and Information Sciences

More information

High Performance Computing Course Notes HPC Fundamentals

High Performance Computing Course Notes HPC Fundamentals High Performance Computing Course Notes 2008-2009 2009 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs

More information

IBM InfoSphere Data Replication s Change Data Capture (CDC) for DB2 LUW databases (Version ) Performance Evaluation and Analysis

IBM InfoSphere Data Replication s Change Data Capture (CDC) for DB2 LUW databases (Version ) Performance Evaluation and Analysis Page 1 IBM InfoSphere Data Replication s Change Data Capture (CDC) for DB2 LUW databases (Version 10.2.1) Performance Evaluation and Analysis 2014 Prasa Urithirakodeeswaran Page 2 Contents Introduction...

More information

Active Clustering and Ranking

Active Clustering and Ranking Active Clustering and Ranking Rob Nowak, University of Wisconsin-Madison IMA Workshop on "High-Dimensional Phenomena" (9/26-30, 2011) Gautam Dasarathy Brian Eriksson (Madison/Boston) Kevin Jamieson Aarti

More information