Characterizing Result Errors in Internet Desktop Grids
1 Characterizing Result Errors in Internet Desktop Grids D. Kondo 1, F. Araujo 2, P. Malecot 1, P. Domingues 2, L. Silva 2, G. Fedak 1, F. Cappello 1 1 INRIA, France 2 University of Coimbra, Portugal
2 Desktop Grids Astronomy Math Biology LIP IBM AFM Total: ~50 applications using ~1.1 PetaFLOPS from ~1 million active resources
7 Background In large-scale desktop grids involving volunteered, anonymous (and thereby potentially untrusted, insecure) resources, errors are inevitable
Software/Hardware Stack | Potential Source of Error
Application | Modify application results
Middleware | Revise and recompile middleware
OS | Viruses
Hardware (Disk, CPU, Memory, Network) | Disk crash, overclocking and overheating of CPU
8 Motivation A number of application-level mechanisms for tolerating errors exist [Sarmenta, Lo] The effectiveness of these mechanisms depends on how errors occur in real systems Yet errors in real systems are poorly characterized
9 Goal Characterize error rates in a real system Frequency Stationarity Correlation Evaluate error tolerance mechanisms in light of this characterization
10 Outline Background Terminology Related Work Method Error Characterization Summary and Future Work
14 Background Terminology Workers download workunits from the server (workunit download) and upload a (correct or erroneous) result to the server (result upload)
15 Related Work Error Tolerance Mechanisms [Sarmenta01, Zhao01, Taufer05] Majority voting Spot-checking with blacklisting Credibility-based methods
16 Majority Voting [Sarmenta01] Send 2m-1 instances of the same workunit to multiple workers, and then compare the results Majority vote is complete after receiving m identical results
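The voting rule on this slide can be illustrated with a minimal sketch. The function name `majority_vote` and the sample result strings are hypothetical and not from the original slides; the sketch only shows the stopping rule (accept a value once m identical copies have arrived).

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def majority_vote(results: Iterable[Hashable], m: int) -> Optional[Hashable]:
    """Return the first result value received m times, or None if none reaches m."""
    counts = Counter()
    for value in results:
        counts[value] += 1
        if counts[value] == m:
            return value  # vote complete: m identical results received
    return None


# With m = 2, at most 2m - 1 = 3 instances of the workunit are sent out.
# The result strings below are made-up placeholders.
print(majority_vote(["0xA1", "0xFF", "0xA1"], m=2))
```

Returning `None` when no value reaches m leaves the decision about issuing further instances to the caller.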
17 Majority Voting [Sarmenta01]
ε: Fraction of results that will be erroneous
ϕ: Probability that a worker (from the set of erroneous and nonerroneous hosts) returns an erroneous result
m: Number of identical results before a vote is considered to be complete
$$\varepsilon_{\mathrm{majv}}(\phi, m) = \sum_{j=m}^{2m-1} \binom{2m-1}{j} \phi^{j} (1-\phi)^{2m-1-j}$$
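As a rough illustration, the majority-voting error bound can be evaluated numerically. The function name `eps_majv` and the sample value ϕ = 0.01 are chosen here for illustration only.

```python
from math import comb


def eps_majv(phi: float, m: int) -> float:
    """Probability that at least m of 2m-1 independent results are erroneous."""
    n = 2 * m - 1
    return sum(comb(n, j) * phi**j * (1 - phi) ** (n - j) for j in range(m, n + 1))


# Illustrative worker error probability (not a measured value).
for m in (1, 2, 3):
    print(m, eps_majv(0.01, m))
```

For m = 1 the expression reduces to ϕ itself, which is a quick sanity check on the summation limits.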
18 Issues Model assumes error rates are not correlated among hosts If error rate is high (>1%), much redundancy required to achieve low error bounds
19 Spot-Checking [Sarmenta01] Distribute workunit with known correct result randomly to workers Compare worker's result to known correct result If there is a difference, blacklist that worker
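The compare-and-blacklist step might look like the following sketch. The function name `spot_check`, the worker identifiers, and the result values are hypothetical; how spot-check workunits are selected and distributed is not modeled.

```python
def spot_check(worker_id: str, returned: str, expected: str, blacklist: set) -> bool:
    """Compare a worker's result with the known correct one; blacklist on mismatch."""
    if returned != expected:
        blacklist.add(worker_id)
        return False
    return True


blacklist = set()
spot_check("worker-1", returned="42", expected="42", blacklist=blacklist)
spot_check("worker-2", returned="41", expected="42", blacklist=blacklist)
print(sorted(blacklist))
```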
20 Spot-Checking [Sarmenta01]
ε: Fraction of results that will be erroneous
q: Frequency of spot-checking
n: Number of workunits to be computed by each worker
f: Fraction of hosts that commit at least one error
s: Error rate per erroneous host
$$\varepsilon_{\mathrm{scbl}}(q, n, f, s) = \frac{s f (1-qs)^{n}}{(1-f) + f (1-qs)^{n}}$$
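A small sketch of the spot-checking bound follows. The function name `eps_scbl` is chosen here; the parameter values q = 0.10, f = 0.35, s = 0.003 are the ones quoted later in the talk, and the values of n are arbitrary sample points.

```python
def eps_scbl(q: float, n: int, f: float, s: float) -> float:
    """Error rate under spot-checking with blacklisting, per the formula above."""
    survive = (1 - q * s) ** n  # chance an erroneous host passes every check over n workunits
    return s * f * survive / ((1 - f) + f * survive)


# n values are arbitrary sample points, not figures from the slides.
for n in (0, 1000, 5000, 20000):
    print(n, eps_scbl(q=0.10, n=n, f=0.35, s=0.003))
```

At n = 0 the expression reduces to s·f, the error rate with no spot-checking at all, and it decreases as n grows.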
21 Issues Assumes blacklisting is efficient and effective Assumes consistency of error rates over time If error rates are low, then the number (n) of workunits to be computed per worker must be very high
22 Credibility-Based System [Sarmenta01] Define credibility of an entity as the conditional probability of its correctness given its history of past (spot-)checks Workers build (or lose) credibility as they pass or fail (spot-)checks Compute credibility of result based on worker credibility Issue: assumes the error rate per host is consistent over time
23 Methodology XtremLab: BOINC-based project for characterizing Internet desktop grids Application continuously computes floating-point and integer operations Validator conducts syntactical and semantic checks of results Gathered data from about 600 hosts between April and July 2006
24 Observations and Assumptions Most errors manifest themselves as scrambled or truncated output Likely due to I/O errors Detected errors would have caused a result error in a real application E.g. I/O error corresponds to a corrupt write of checkpoint file
26 Error Rates in Entire Platform [Figure. x-axis: Fraction of workunits with errors. Annotations: mean: 0.00221, max: 0.09813] Errors are widespread: ~35% of hosts are erroneous
27 Implications Working example: 10 batches, 100 workunits each
ε_overall ≤ 0.01, so need ε_result ≤ 1×10^-5
To get ε_result ≤ 1×10^-5:
Majority vote: need majority vote (m) of 2
Spot-checking: number of workunits (n) per worker > 5300 (with q=0.10, f=0.35, s=0.003)
Blacklisting all erroneous hosts is most likely not efficient
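The majority-voting part of this working example can be checked with a short sketch. Two assumptions are made here that the slides do not state: the per-result target is obtained by dividing ε_overall by the total number of workunits (a union bound), and the per-worker error probability is taken as ϕ = f·s using the f and s values quoted on the slide. The helper `eps_majv` restates the majority-voting formula so the block is self-contained.

```python
from math import comb


def eps_majv(phi: float, m: int) -> float:
    """Probability that at least m of 2m-1 independent results are erroneous."""
    n = 2 * m - 1
    return sum(comb(n, j) * phi**j * (1 - phi) ** (n - j) for j in range(m, n + 1))


batches, workunits_per_batch = 10, 100
eps_overall = 0.01
# ASSUMPTION: union bound over all workunits gives the per-result target.
eps_result = eps_overall / (batches * workunits_per_batch)

f, s = 0.35, 0.003
phi = f * s  # ASSUMPTION: probability that a randomly chosen worker errs

m = 1
while eps_majv(phi, m) > eps_result:
    m += 1
print("smallest m meeting the target:", m)
```

Under these assumptions the loop stops at m = 2, which agrees with the value given on the slide.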
31 Cumulative Error Rates and Effect on Throughput [Figure: cumulative fraction of errors and cumulative fraction of valid throughput versus fraction of sorted erroneous hosts] Error rates skewed. Top 10% produce 70% of errors Blacklisting all erroneous hosts not efficient. Would reduce throughput by 40%
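A cumulative error curve of this kind can be computed as in the sketch below. The function name `cumulative_error_share` and the per-host error counts are synthetic and chosen only so that the first of ten hosts accounts for 70% of errors; they are not the measured data.

```python
from typing import List, Sequence


def cumulative_error_share(error_counts: Sequence[int]) -> List[float]:
    """Sort hosts by error count (descending) and return cumulative error fractions."""
    ordered = sorted(error_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no errors recorded")
    shares, running = [], 0
    for count in ordered:
        running += count
        shares.append(running / total)
    return shares


# Synthetic per-host error counts for ten erroneous hosts (illustrative only).
counts = [5, 70, 2, 4, 3, 5, 2, 4, 3, 2]
shares = cumulative_error_share(counts)
print("share of errors from the top 10% of hosts:", shares[0])
```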
33 Spot-Checking with Blacklisting Revisited [Figure: error rate versus number of workunits per worker required (n)] Spot-checking acts as low-pass filter, reducing error rates to 2×10^-4
35 Majority Voting Revisited [Figure: error rate versus number of identical results required (m)] Error rate decreases exponentially, quickly falling below 1×10^-5
36 Implications To get ε_result down to 2×10^-4: spot-checking is a possibility, with most benefit when n is in [0, 1000] To get ε_result < 2×10^-4: use majority voting, as ε_result decreases exponentially with m
37 Error Rate Stationarity A process is stationary if its statistical properties do not change over time Determine how stationary mean of host error rate (s) is over time Determine change in mean error rates over 96-hour periods for each host
38 Statistics for Host Error Rates over 96-hour periods Statistic: µ, σ, σ/µ Host Group: All erroneous, Top 10% erroneous, Bottom 90% erroneous Only about 10% of the error rates were within 25% of the mean
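The per-period statistics (µ, σ, σ/µ) could be computed along the lines of this sketch. The input layout (pairs of hours since the start of the trace and an error flag), the function names, and the sample data are all assumptions made for illustration; the study's actual data format is not given in the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Iterable, List, Tuple

PERIOD_HOURS = 96


def period_error_rates(results: Iterable[Tuple[float, bool]]) -> List[float]:
    """results: (hours_since_start, is_error) pairs for one host.
    Returns the error rate in each 96-hour period that contains results."""
    totals, errors = defaultdict(int), defaultdict(int)
    for hours, is_error in results:
        period = int(hours // PERIOD_HOURS)
        totals[period] += 1
        errors[period] += int(is_error)
    return [errors[p] / totals[p] for p in sorted(totals)]


def summarize(rates: List[float]) -> Tuple[float, float, float]:
    """Return (mu, sigma, sigma/mu) for a list of per-period error rates."""
    mu, sigma = mean(rates), pstdev(rates)
    return mu, sigma, (sigma / mu if mu else float("nan"))


# Synthetic host: 1 error in 10 results in the first period, 3 in 10 in the second.
data = [(i, i == 0) for i in range(10)] + [(96 + i, i < 3) for i in range(10)]
rates = period_error_rates(data)
print(rates, summarize(rates))
```

A large σ/µ for a host indicates that its error rate in one period is a poor predictor of its rate in the next.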
39 Implications Spot-checking and credibility-based systems may have limited effectiveness Both depend on the consistency of error rates over time Host with low error rate could build high credibility, and then triple its error rates
40 Correlation of Error Rates Determine independence of error on one host with that on another Independence: P(A and B) = P(A)*P(B) Determine the empirical joint probability, P(A and B), that any two hosts have an error simultaneously Compute the theoretical probability, P(A)*P(B), of two hosts having an error simultaneously If error rates are not positively correlated: P(A)*P(B) - P(A and B) ≥ 0, i.e. theoretical - empirical ≥ 0
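The pairwise test described here can be sketched as follows. The representation of each host as a list of 0/1 error flags per time slot is an assumption (the slides do not say how "simultaneously" was defined), and the host names and data are synthetic.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def independence_gaps(error_flags: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """For every host pair return P(A)*P(B) - P(A and B), i.e. theoretical minus empirical.
    error_flags maps a host to 0/1 flags, one per time slot (all lists the same length)."""
    gaps = {}
    for a, b in combinations(sorted(error_flags), 2):
        xa, xb = error_flags[a], error_flags[b]
        slots = len(xa)
        p_a, p_b = sum(xa) / slots, sum(xb) / slots
        p_ab = sum(1 for i, j in zip(xa, xb) if i and j) / slots
        gaps[(a, b)] = p_a * p_b - p_ab
    return gaps


# Synthetic flags: h1 and h3 err in the same slot, h2 errs in a different one.
flags = {"h1": [1, 0, 0, 0], "h2": [0, 1, 0, 0], "h3": [1, 0, 0, 0]}
gaps = independence_gaps(flags)
for pair, gap in gaps.items():
    print(pair, gap, "not positively correlated" if gap >= 0 else "positively correlated")
```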
42 Pairwise Host Error Rates [Figure: cumulative fraction versus diff of theoretical and empirical pairwise error rates. Annotations: Fraction < 0: 0.01443, Fraction > 0: 0.98557] Most host errors not positively correlated. Implication: majority voting likely effective in real systems
43 Summary of Characterization Results A significant fraction of hosts (about 35%) will commit at least one error over time The mean error rate over all hosts (0.0022) is quite low A large fraction of errors (0.70) results from a small fraction of hosts (0.10) Error rates over time vary greatly (by as much as 3.48 times) Error rates between two hosts often seem uncorrelated (most hosts do not have positively correlated errors)
44 Summary of Implications If one can afford redundancy or one needs an error rate to be less than 2×10^-4, then majority voting should be considered If one can afford an error rate greater than 2×10^-4 and can make batches relatively long, spot-checking with blacklisting should be considered Fluctuations in error rates over time may limit the effectiveness of spot-checking and credibility-based systems
45 Future Work Use of synthetic application Important to have application regularity (I/O, computation) Not that different from real desktop grid applications (cannot be obtrusive) Compute-intensive, small-memory footprint, light periodic I/O for application-level checkpoints Characterize and run real desktop grid applications Profile applications Execute workunits representative of each profile
46 Thank you
More informationMySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona
MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona In the Presentation Practical approach to deal with some of the common MySQL Issues 2 Assumptions You re looking
More informationDirect Anonymous Attestation
Direct Anonymous Attestation Revisited Jan Camenisch IBM Research Zurich Joint work with Ernie Brickell, Liqun Chen, Manu Drivers, Anja Lehmann. jca@zurich.ibm.com, @JanCamenisch, ibm.biz/jancamenisch
More informationQoS-aware resource allocation and load-balancing in enterprise Grids using online simulation
QoS-aware resource allocation and load-balancing in enterprise Grids using online simulation * Universität Karlsruhe (TH) Technical University of Catalonia (UPC) Barcelona Supercomputing Center (BSC) Samuel
More informationUnderstanding Availability
Understanding Availability Ranjita Bhagwan, Stefan Savage and Geoffrey M. Voelker Department of Computer Science and Engineering University of California, San Diego Abstract This paper addresses a simple,
More informationCDA 5140 Software Fault-tolerance. - however, reliability of the overall system is actually a product of the hardware, software, and human reliability
CDA 5140 Software Fault-tolerance - so far have looked at reliability as hardware reliability - however, reliability of the overall system is actually a product of the hardware, software, and human reliability
More informationDynamics 365. for Finance and Operations, Enterprise edition (onpremises) system requirements
Dynamics 365 ignite for Finance and Operations, Enterprise edition (onpremises) system requirements This document describes the various system requirements for Microsoft Dynamics 365 for Finance and Operations,
More informationResource Estimation for Objectory Projects
Resource Estimation for Objectory Projects Gustav Karner Objective Systems SF AB Torshamnsgatan 39, Box 1128 164 22 Kista email: gustav@os.se September 17, 1993 Abstract In order to estimate the resources
More informationITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS
ITERATIVE COLLISION RESOLUTION IN WIRELESS NETWORKS An Undergraduate Research Scholars Thesis by KATHERINE CHRISTINE STUCKMAN Submitted to Honors and Undergraduate Research Texas A&M University in partial
More informationModel-Driven Geo-Elasticity In Database Clouds
Model-Driven Geo-Elasticity In Database Clouds Tian Guo, Prashant Shenoy College of Information and Computer Sciences University of Massachusetts, Amherst This work is supported by NSF grant 1345300, 1229059
More informationTEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT
TEMPERATURE MANAGEMENT IN DATA CENTERS: WHY SOME (MIGHT) LIKE IT HOT Nosayba El-Sayed, Ioan Stefanovici, George Amvrosiadis, Andy A. Hwang, Bianca Schroeder {nosayba, ioan, gamvrosi, hwang, bianca}@cs.toronto.edu
More informationRecurrent Neural Network (RNN) Industrial AI Lab.
Recurrent Neural Network (RNN) Industrial AI Lab. For example (Deterministic) Time Series Data Closed- form Linear difference equation (LDE) and initial condition High order LDEs 2 (Stochastic) Time Series
More informationDemands on task recommendation in crowdsourcing platforms the worker s perspective
Demands on task recommendation in crowdsourcing platforms the worker s perspective Survey Design Documentation for RecSys 15 CrowdRec Submission 1. Overall Survey Design The survey shown on the following
More informationPassive NFS Tracing of and Research Workloads. Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST April 1, 2003
Passive NFS Tracing of Email and Research Workloads Daniel Ellard, Jonathan Ledlie, Pia Malkani, Margo Seltzer FAST 2003 - April 1, 2003 Talk Outline Motivation Tracing Methodology Trace Summary New Findings
More informationNgram Search Engine with Patterns Combining Token, POS, Chunk and NE Information
Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department
More informationCoriolis: Scalable VM Clustering in Clouds
1 / 21 Coriolis: Scalable VM Clustering in Clouds Daniel Campello 1 Carlos Crespo 1 Akshat Verma 2 RajuRangaswami 1 Praveen Jayachandran 2 1 School of Computing and Information Sciences
More informationHigh Performance Computing Course Notes HPC Fundamentals
High Performance Computing Course Notes 2008-2009 2009 HPC Fundamentals Introduction What is High Performance Computing (HPC)? Difficult to define - it s a moving target. Later 1980s, a supercomputer performs
More informationIBM InfoSphere Data Replication s Change Data Capture (CDC) for DB2 LUW databases (Version ) Performance Evaluation and Analysis
Page 1 IBM InfoSphere Data Replication s Change Data Capture (CDC) for DB2 LUW databases (Version 10.2.1) Performance Evaluation and Analysis 2014 Prasa Urithirakodeeswaran Page 2 Contents Introduction...
More informationActive Clustering and Ranking
Active Clustering and Ranking Rob Nowak, University of Wisconsin-Madison IMA Workshop on "High-Dimensional Phenomena" (9/26-30, 2011) Gautam Dasarathy Brian Eriksson (Madison/Boston) Kevin Jamieson Aarti
More information