Clustering and Reclustering HEP Data in Object Databases
Koen Holtman
CERN EP division, CH-1211 Geneva 23, Switzerland

We formulate principles for the clustering of data, applicable both to sequential HEP applications and to farming HEP applications with a high degree of concurrency. We make the case for the reclustering of HEP data on the basis of performance measurements, and briefly discuss a prototype automatic reclustering system.

1 Introduction

As part of the CMS contribution to the RD45 [1] collaboration, database clustering and reclustering have been under investigation for about 1.5 years. The clustering of objects in an object database is the mapping of objects to locations on physical storage media like disk farms and tapes. The performance of the database, and of the physics application on top of it, depends crucially on having a good match between the object clustering and the database access operations performed by the physics application.

The principles for clustering discussed in this paper are based on, and illustrated with, a set of performance measurements. The performance measurements shown in this paper were all performed on a Sun Ultra Enterprise server, running SunOS Release 5.5.1, with hard disks in a SPARCstorage array (no striping, no other RAID-type processing; 2.1-GB 7200-rpm fast-wide SCSI-2 Seagate ST32550W disks). These disks can be considered typical for the high end of the 1994 commodity disk market. All performance results were cross-validated on at least one other hardware/OS configuration, and most results on at least two other configurations. The object database system used was always Objectivity/DB version 4 [2].

2 HEP data clustering basics

Most I/O intensive physics analysis systems, no matter what the implementation method, and no matter whether tape or disk based, use the following simple principles to optimise performance.

1. Divide the set of all events into fairly large chunks (in most current systems a chunk is a run or a part of a physics stream [3]).
2. Implement farming (both for disks and CPUs) at the chunk level.
3. Make sure that (sub)jobs always iterate through the events in a chunk in the same order.
4. Cluster the event data in a chunk in the iteration order.

Though object databases make it perhaps easier than ever to build physics analysis systems which do not follow the principles above, we believe that these principles are currently still the most viable basis for designing a performant production system. Principle 1 above, dividing the event set into chunks, involves coarse-grained clustering decisions: strategies like dividing events into physics streams [3] are often used here, and newer strategies are a topic of active research [4]. At the chunk level, reducing tape mounts is a very important goal, and an important constraint is that it is not feasible to recluster data often.
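As an illustration, the four principles above can be sketched in code. This is a hypothetical sketch, not the paper's implementation: the chunk size, the `process` callback and the `submit` scheduler are invented placeholders.

```python
CHUNK_SIZE = 10_000  # events per chunk; an assumed value

def make_chunks(event_ids):
    """Principle 1: divide the event set into fairly large chunks."""
    return [event_ids[i:i + CHUNK_SIZE]
            for i in range(0, len(event_ids), CHUNK_SIZE)]

def run_subjob(chunk, process):
    """Principles 3 and 4: every (sub)job walks the chunk in the same
    fixed order, which is also the order the data is clustered in,
    so reads on disk are sequential."""
    for event_id in chunk:
        process(event_id)

def farm(event_ids, process, submit):
    """Principle 2: farming is done at the chunk level; `submit`
    hands each chunk to a worker (disk + CPU) as one unit."""
    for chunk in make_chunks(event_ids):
        submit(run_subjob, chunk, process)
```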
Most of this paper deals with the refinement of principle 4 above, that is, with clustering and reclustering decisions at the sub-chunk level. At this level, an important goal is to achieve near-sequential reading on the disk or disk farm, and frequent reclustering is feasible as a strategy for optimising performance and reducing disk space occupancy. The clustering techniques discussed in this paper can be used with both the Objectivity/DB [2] and Versant [5] object databases, though Objectivity offers far more direct and convenient support for them. Note that clustering and reclustering are not problems which are specific to commercial object databases. At the core, the clustering problem is one of reducing disk seeks, tape seeks and tape mounts, and this problem exists equally well in physics analysis systems not based on object databases, even though other systems may use different terminology to describe the problem.

3 Type-based clustering

The most obvious way to refine principle 4 above is to cluster data as shown in Fig. 1. For each event in the chunk, the event data is split into several objects of different types; for example, one type can hold all data for a single subdetector. Then, these objects are grouped by type into collections (in Fig. 1: detector X, detector Y, detector Z, reconstructed P's, and event summary tags). Inside each collection, the objects for the different events are clustered in the iteration order. This way, a job which only needs one type of data per event automatically performs sequential reading over a single collection, which exactly contains the needed data, yielding the maximum achievable I/O performance.

Figure 1: Clustering of a chunk into collections

Figure 2: Reading two collections: logical pattern (left) and preferred physical pattern (right)

There are some performance pitfalls, however, for jobs which need to read two or more collections from the same disk or disk array.
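The grouping by type described above can be sketched as follows. This is a minimal illustration, not the paper's code: the type names and the per-event data layout are invented.

```python
# Invented type names, standing in for the collections of Fig. 1.
TYPES = ["detector_x", "detector_y", "detector_z", "reco", "summary_tag"]

def cluster_chunk(events):
    """Split per-event data by type, and store each type's objects
    contiguously in iteration order: one list per collection."""
    return {t: [event[t] for event in events] for t in TYPES}

def read_one_type(collections, type_name):
    """A job needing one type scans a single contiguous collection."""
    for obj in collections[type_name]:  # sequential read
        yield obj
```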
A job reading two collections has a logical object reading pattern as shown on the left in Fig. 2. To achieve near-sequential throughput for such a job, the logical pattern needs to be transformed into the physical pattern on the right in Fig. 2. We found that this transformation was not performed by Objectivity/DB, the database on which we implemented our test system, nor by the operating system (we tested both SunOS and HP-UX), nor by the disk hardware, for various commodity brands. The result was a significant performance degradation, especially when reading more than two collections; see the solid line in Fig. 3.

Figure 3: One client reading multiple collections of 8 KB objects from a single disk (read performance in MB/s versus number of collections, with no read-ahead, 160 KB read-ahead, and 800 KB read-ahead)

We eliminated the performance degradation by extending the collection indexing/iteration class to read ahead objects into the database cache. This extension could be made without affecting the end-user physics analysis code. Measurements (see Fig. 3) showed that when 800 KB worth of objects were read ahead for each collection, the I/O throughput approached that of sequential reading (3.9 MB/s).

Keeping all collections on different disks would of course be an alternative to the read-ahead optimisation. That approach would, however, create a load balancing problem: for optimal performance one has to make sure that all disks are kept busy, even for jobs which only read one or a few collections. The problem can be solved to some extent by mapping collections in different chunks to disks as shown in Fig. 4. This will produce load balancing for any number of collections, assuming that the subjobs running in parallel on each chunk are about equally heavy. A problem with this solution is that it requires a higher degree of connectedness between all disks and all CPUs. We therefore prefer to use the read-ahead optimisation: by devoting a modest amount of memory (given current RAM prices) to read-ahead buffering, we can keep the objects for one event together on the same disk, which gives us greater fault-tolerance and decreases the dependence on subjob scheduling.

Figure 4: Load-balancing arrangement as an alternative to using a read-ahead optimisation:
  Disk 1: chunk 1 type X, chunk 2 type Z, chunk 3 type Y
  Disk 2: chunk 1 type Y, chunk 2 type X, chunk 3 type Z
  Disk 3: chunk 1 type Z, chunk 2 type Y, chunk 3 type X

4 Random database access

For small objects, a good clustering is more important than for large objects. This is illustrated by Fig. 5, which plots the ratio between the speed of sequential reading and that of random reading for different object sizes. Fig. 5 shows a curve for 1994 disks and one for disks in the year 2005, based on an analysis of hard disk technology trends [6].
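The shape of such curves follows from a simple model: a random read pays a full seek plus rotational delay per object, while sequential reading pays only the transfer time. A sketch of the model, with assumed disk parameters rather than the paper's measured ones:

```python
def seq_over_random(obj_bytes, media_mbps=10.0, access_ms=12.0):
    """Ratio of sequential to random read speed for a given object
    size. media_mbps (streaming rate) and access_ms (seek plus
    rotational delay) are illustrative values, not measurements."""
    transfer_s = obj_bytes / (media_mbps * 1e6)  # time to stream one object
    return (access_ms / 1e3 + transfer_s) / transfer_s
```

With these parameters the ratio is large for 1 KB objects but only a small factor for 64 KB objects, reproducing the qualitative behaviour of Fig. 5; faster media rates with unchanged access times shift the whole curve upward, which is the trend behind the 2005 estimate.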
The performance of sequential reading is the performance of the best possible clustering arrangement; that of random reading is the performance of the worst possible clustering arrangement. Fig. 5 therefore also plots the worst-case performance loss in the case of bad clustering. We see that currently, for objects larger than 64 KB, clustering is not that important: the performance loss for bad clustering is never more than a factor 2.5. The 2005 curve shows, however, that the importance of good clustering will increase in future.

Figure 5: Performance ratio between sequential and random reading, for average object sizes from 1 KB to 64 KB (curves for 1994 disks and for 2005 disks, estimated)

5 Selective reading

Physics analysis jobs usually don't read all objects in a collection: they only iterate through a subset of the collection, corresponding to those events satisfying a certain cut predicate. We call this iteration through a subset selective reading. The selectivity is the percentage of objects in the collection which is needed by the job. In tests we found that, as the selectivity decreases from 100%, the throughput of selective reading drops rapidly, only to level out at the throughput of random reading. This is shown, for a collection of 8 KB objects with an 8 KB database page size, in Fig. 6. In other tests we found that the curve in Fig. 6 does not change much as a function of the page size.

Figure 6: Selective reading of 8 KB objects (sequential: 5.4 MB/s, random: 0.7 MB/s)

The curve in Fig. 6 has two distinct parts. In the part covering selectivity values from 100% down to roughly 15%, the decrease in throughput exactly mirrors the decrease in selectivity. If we had sequentially read all objects, and then thrown away the unneeded ones, the job would have taken the same time. Thus, in this part of the curve, selective reading is useless as an optimisation device if the job is disk-bound. However, selective reading will decrease the load on the CPU, the cache, and (if applicable) the network connection to the remote disk. This reduction in load depends largely on the selectivity on database pages, not on objects. See [7] for a discussion of page selectivity.

In the part of the curve between 15% and 0%, selective reading is faster than sequential reading and then throwing data away. On the other hand, it is not faster than random reading. We found that the boundary between the two parts of the curve, which is located at 15% in Fig. 6, depends on the average object size; the boundary is visualised, as a function of object size and selectivity, in Fig. 7.

Figure 7: Selective reading performance boundary, plotted against average object size (1 KB to 64 KB) and selectivity: on one side of the boundary selective reading is only as fast as sequential reading, on the other side it is faster

From Fig. 7 we can conclude that for collections of small objects, selective reading may not be worth the extra indexing complexity over sequential reading and throwing the unneeded data away.
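The location of such a boundary can be estimated with a back-of-the-envelope cost model: reading a fraction s of the objects randomly beats a full sequential scan roughly when s times the cost of one random read is below the cost of streaming one object. The device numbers below are assumptions for illustration, not the paper's measurements.

```python
def selective_beats_sequential(selectivity, obj_bytes,
                               seq_mbps=5.0, seek_ms=10.0):
    """True if reading only the selected objects (one seek each) is
    faster than sequentially reading everything and discarding the
    unneeded objects. selectivity is a fraction in [0, 1]."""
    transfer_ms = obj_bytes / (seq_mbps * 1e6) * 1e3  # stream one object
    random_ms = seek_ms + transfer_ms                 # one random read
    return selectivity * random_ms < transfer_ms
```

In this model the break-even selectivity is transfer_time / (seek_time + transfer_time), which grows with the object size: for small objects selective reading almost never pays off, matching the conclusion drawn from Fig. 7.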
A corollary of this is that one could pack several small logical objects into larger physical objects without a loss of performance, even for high selectivities. For collections of large objects, a selective reading mechanism can be useful as a means of ensuring that the performance never drops below that of random reading.

6 Reclustering

To avoid the performance degradation associated with an increase in selectivity which we discussed above, reclustering could be used. Reclustering, the re-arranging of the objects in the database, is an expensive operation, but just letting the performance drop with increasing selectivity can easily be more costly, especially for collections in which objects are small. The simplest form of reclustering is to copy only those objects which are actually wanted in a particular analysis effort to a new collection at the start of the effort. The creation of a data summary tape or an ntuple file are examples of this simple form of reclustering.

Much more advanced forms of reclustering are feasible in a system based on an object database. Automatic reclustering, in which the system reacts to changing access patterns without any user hints beforehand, is feasible whenever there are sequences of jobs which access the same event set. We have prototyped an automatic reclustering system ([8], [9]) which

- performs reclustering transparently to the user code,
- can optimise clustering for four different analysis efforts at the same time, and
- keeps the use of storage space within bounds by avoiding the duplication of data.

We refer the reader to [8] for a discussion of the architecture of our automatic reclustering system. Fig. 8 illustrates its performance under a simple physics analysis scenario in which 40 subsequent jobs are run, with a new cut predicate being added after every 10 jobs.

Figure 8: Performance of 40 subsequent jobs without (left) and with (right) automatic reclustering. Each pair of bars represents a job: the black bar represents the number of objects accessed, the grey bar is the (wall-clock) job run time. The right panel also marks the batch reclustering operations.

Reclustering is an important research topic in the object database community (see [8] for some references). However, this research is directed at typical object database workloads like CAD workloads. Physics analysis workloads are highly atypical: they are mostly read-only, transactions routinely access millions of objects, and, most importantly, the workloads lend themselves to streaming-type optimisations.
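The simplest form of reclustering described above, copying only the wanted objects into a new collection, can be sketched in a few lines. This is a minimal illustration; the event layout and the cut are invented.

```python
def recluster(collection, cut):
    """Copy only the objects passing `cut` into a new collection,
    preserving the iteration order, so that subsequent jobs in the
    same analysis effort read the new collection sequentially."""
    return [obj for obj in collection if cut(obj)]

# Usage: keep only the high-pt events for the current analysis effort.
events = [{"pt": pt} for pt in (5.0, 20.0, 3.0, 50.0)]
summary = recluster(events, lambda e: e["pt"] > 10.0)
```

The trade-off discussed in the text applies directly: the copy costs one scan of the old collection, but every later job reads a smaller, contiguous collection instead of paying the selective-reading penalty.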
It is conceivable that vendors will bundle general-purpose automatic reclustering systems with future versions of object database products, but we do not expect that such products will be able to provide efficient reclustering for physics workloads. As far as reclustering is concerned, physics analysis is too atypical to be provided for by the market. Therefore, we conclude that the HEP community will have to develop its own reclustering systems.

7 Conclusions

We have discussed principles for the clustering and reclustering of HEP data. The performance graphs in this paper can be used to decide, for a given physics analysis scenario, whether certain clustering techniques can be ignored without too much loss of performance, whether they need to be considered, or whether they are indispensable.

We have shown performance measurements mainly for the single-client single-disk case. In additional performance tests ([6], [10]) we have verified that the techniques described above are also applicable to a system with disk and processor farming. Specifically, if a client is optimised to access the database with a good clustering efficiency, then it is possible to run many such clients concurrently, all accessing the same disk farm, without any significant performance degradation. Furthermore, the operating system will ensure that each client gets an equal share of the available disk resources. For a detailed discussion of the scalability of farming configurations with hundreds of clients, we refer the reader to [10].

References

[1] RD45, A Persistent Storage Manager for HEP.
[2] Objectivity/DB.
[3] D. Baden et al., Joint DØ/CDF/CD Run II Data Management Needs Assessment, CDF/DOC/COMP UPG/PUBLIC/4100, DØ Note 3197, March 1997.
[4] Grand Challenge Application on HENP Data.
[5] The Versant object database.
[6] K. Holtman, Prototyping of CMS Storage Management, CMS Note.
[7] Using an object database and mass storage system for physics analysis, CERN/LHCC 97-9, The RD45 collaboration, April 1997.
[8] K. Holtman, P. van der Stok, I. Willers, Automatic Reclustering of Objects in Very Large Databases for High Energy Physics, Proc. of IDEAS'98, Cardiff, UK, IEEE 1998.
[9] Reclustering Object Store Library for LHC++.
[10] K. Holtman, J. Bunn, Scalability to Hundreds of Clients in HEP Object Databases, Proc. of CHEP'98, Chicago, USA.
More informationAutomated Storage Tiering on Infortrend s ESVA Storage Systems
Automated Storage Tiering on Infortrend s ESVA Storage Systems White paper Abstract This white paper introduces automated storage tiering on Infortrend s ESVA storage arrays. Storage tiering can generate
More informationDatabase Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill
Lecture Handout Database Management System Lecture No. 34 Reading Material Database Management Systems, 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill Modern Database Management, Fred McFadden,
More informationPlot SIZE. How will execution time grow with SIZE? Actual Data. int array[size]; int A = 0;
How will execution time grow with SIZE? int array[size]; int A = ; for (int i = ; i < ; i++) { for (int j = ; j < SIZE ; j++) { A += array[j]; } TIME } Plot SIZE Actual Data 45 4 5 5 Series 5 5 4 6 8 Memory
More informationCSE 4/521 Introduction to Operating Systems. Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018
CSE 4/521 Introduction to Operating Systems Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018 Overview Objective: To discuss how the disk is managed for a
More informationGoogle is Really Different.
COMP 790-088 -- Distributed File Systems Google File System 7 Google is Really Different. Huge Datacenters in 5+ Worldwide Locations Datacenters house multiple server clusters Coming soon to Lenior, NC
More informationCMS Note Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland
Available on CMS information server CMS NOTE 2001/037 The Compact Muon Solenoid Experiment CMS Note Mailing address: CMS CERN, CH-1211 GENEVA 23, Switzerland July 16, 2001 CMS Data Grid System Overview
More informationChapter 5. Large and Fast: Exploiting Memory Hierarchy
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Principle of Locality Programs access a small proportion of their address space at any time Temporal locality Items accessed recently are likely to
More informationOracle EXAM - 1Z Oracle Exadata Database Machine Administration, Software Release 11.x Exam. Buy Full Product
Oracle EXAM - 1Z0-027 Oracle Exadata Database Machine Administration, Software Release 11.x Exam Buy Full Product http://www.examskey.com/1z0-027.html Examskey Oracle 1Z0-027 exam demo product is here
More informationTechnical papers Web caches
Technical papers Web caches Web caches What is a web cache? In their simplest form, web caches store temporary copies of web objects. They are designed primarily to improve the accessibility and availability
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW
More informationStoring Data: Disks and Files
Storing Data: Disks and Files Chapter 7 (2 nd edition) Chapter 9 (3 rd edition) Yea, from the table of my memory I ll wipe away all trivial fond records. -- Shakespeare, Hamlet Database Management Systems,
More informationINTRODUCTION TO THE ANAPHE/LHC++ SOFTWARE SUITE
INTRODUCTION TO THE ANAPHE/LHC++ SOFTWARE SUITE Andreas Pfeiffer CERN, Geneva, Switzerland Abstract The Anaphe/LHC++ project is an ongoing effort to provide an Object-Oriented software environment for
More informationData preservation for the HERA experiments at DESY using dcache technology
Journal of Physics: Conference Series PAPER OPEN ACCESS Data preservation for the HERA experiments at DESY using dcache technology To cite this article: Dirk Krücker et al 2015 J. Phys.: Conf. Ser. 66
More informationChapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition
Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Objectives To describe the physical structure of secondary storage devices and its effects on the uses of the devices To explain the
More informationIBM and HP 6-Gbps SAS RAID Controller Performance
IBM and HP 6-Gbps SAS RAID Controller Performance Evaluation report prepared under contract with IBM Corporation Introduction With increasing demands on storage in popular application servers, the server
More informationGoogle File System. Arun Sundaram Operating Systems
Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)
More informationOperating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017
Operating Systems Lecture 7.2 - File system implementation Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Design FAT or indexed allocation? UFS, FFS & Ext2 Journaling with Ext3
More informationMladen Stefanov F48235 R.A.I.D
R.A.I.D Data is the most valuable asset of any business today. Lost data, in most cases, means lost business. Even if you backup regularly, you need a fail-safe way to ensure that your data is protected
More informationSession: Hardware Topic: Disks. Daniel Chang. COP 3502 Introduction to Computer Science. Lecture. Copyright August 2004, Daniel Chang
Lecture Session: Hardware Topic: Disks Daniel Chang Basic Components CPU I/O Devices RAM Operating System Disks Considered I/O devices Used to hold data and programs before they are loaded to memory and
More information!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?
Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!
More informationUsing Synology SSD Technology to Enhance System Performance Synology Inc.
Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD
More informationClient Server & Distributed System. A Basic Introduction
Client Server & Distributed System A Basic Introduction 1 Client Server Architecture A network architecture in which each computer or process on the network is either a client or a server. Source: http://webopedia.lycos.com
More informationPROOF-Condor integration for ATLAS
PROOF-Condor integration for ATLAS G. Ganis,, J. Iwaszkiewicz, F. Rademakers CERN / PH-SFT M. Livny, B. Mellado, Neng Xu,, Sau Lan Wu University Of Wisconsin Condor Week, Madison, 29 Apr 2 May 2008 Outline
More informationCSCI-GA Database Systems Lecture 8: Physical Schema: Storage
CSCI-GA.2433-001 Database Systems Lecture 8: Physical Schema: Storage Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com View 1 View 2 View 3 Conceptual Schema Physical Schema 1. Create a
More informationChapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY
Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored
More informationComputer Science 146. Computer Architecture
Computer Science 46 Computer Architecture Spring 24 Harvard University Instructor: Prof dbrooks@eecsharvardedu Lecture 22: More I/O Computer Science 46 Lecture Outline HW5 and Project Questions? Storage
More informationLecture 2: Memory Systems
Lecture 2: Memory Systems Basic components Memory hierarchy Cache memory Virtual Memory Zebo Peng, IDA, LiTH Many Different Technologies Zebo Peng, IDA, LiTH 2 Internal and External Memories CPU Date transfer
More informationAdvanced Database Systems
Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web
More informationCondusiv s V-locity VM Accelerates Exchange 2010 over 60% on Virtual Machines without Additional Hardware
openbench Labs Executive Briefing: March 13, 2013 Condusiv s V-locity VM Accelerates Exchange 2010 over 60% on Virtual Machines without Additional Hardware Optimizing I/O for Increased Throughput and Reduced
More information! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like
Cloud background Google File System! Warehouse scale systems " 10K-100K nodes " 50MW (1 MW = 1,000 houses) " Power efficient! Located near cheap power! Passive cooling! Power Usage Effectiveness = Total
More informationFuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc
Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,
More informationNEC Express5800 A2040b 22TB Data Warehouse Fast Track. Reference Architecture with SW mirrored HGST FlashMAX III
NEC Express5800 A2040b 22TB Data Warehouse Fast Track Reference Architecture with SW mirrored HGST FlashMAX III Based on Microsoft SQL Server 2014 Data Warehouse Fast Track (DWFT) Reference Architecture
More informationQemacrs 9+> Presented at the HPCN 95 Conference, Milan, May Experience of running PIAF on the CS-2 at CERN CERN-CN cam ub1za1z1es, GENEVA
GQ Qemacrs 9+> cum-comput1nc AND NETWORKS D1v1s1oN CN! 95/6 March 1995 OCR Output cam ub1za1z1es, GENEVA CERN-CN-95-06 Experience of running PIAF on the CS-2 at CERN Timo Hakulinen 8: Pons Rademakers Presented
More informationPowerVault MD3 SSD Cache Overview
PowerVault MD3 SSD Cache Overview A Dell Technical White Paper Dell Storage Engineering October 2015 A Dell Technical White Paper TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS
More information