NEXPReS WP8 Progress Meeting

Size: px
Start display at page:

Download "NEXPReS WP8 Progress Meeting"

Transcription

1 NEXPReS WP8 Progress Meeting Preparations for Multi Site Testing Ari Mujunen Aalto University Metsähovi Radio Observatory VLBI2SKA Workshop in Aveiro, Portugal, Research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/ ) under grant agreement n RI This presentation reflects only the author's views. The European Union is not liable for any use that may be made of the information contained therein.

2 Partner Focus Areas AALTO Coordination, basic technologies ASTRON Long-term archival & reprocessing INAF Global/local allocation/deallocation schemes JIVE Augmenting correlation capabilities /w buffering OSO Trial-site performance & applicability testing PSNC Role & trials of computing center buffering UMAN Trial-site performance & applicability testing 2 (44)

3 WP8 Objective Determine the best practical mix of solutions What kind of storage Where located & packaged Connected in which ways How storage is allocated/deallocated and accessed Which will serve the needs of evolving (>1Gbps) VLBI data acquisition and processing 3 (44)

4 WP8 Schedule 4 (44)

5 D8.1 Interface design document of storage element API Delivered in December 2010 Main results: One buffer one computer, many disks Control API through SSH Simple SSH-protected CLI in/out, potentially opening up UDP/TCP ports in public 10GE Simple (UDP-based/Tsunami-like) transfer protocol Separation of data and metadata Though no sw cleanup of native data formats of data sources (Mk4 timecode, VDIF headers, UDP packet numbers,...) Data chunks in memory and on individual disks 5 (44)

6 D8.2: Hardware design document for simultaneous I/O storage elements For the highest capacity and bandwidth at the lowest cost, 3.5 hard disks are still (for a while) the best match To fill one 10GE port we need disks We presented two alternative designs 6 (44)

7 Current Issues MS801: Hardware decision for subsequent prototyping and integration tests D8.2 alternative solution with server motherboard and CPUs selected D8.3: Design document of storage element allocation methods Due ~now... Should describe how to manage buffer space across stations Could describe how SSH keys (or WP5 sshbroker?) are used to allow cross-access 7 (44)

8 Multi Station Tests are Scheduled for Summer 2011 Striving to deliver 21-Sep-2011 as scheduled Although it is a milestone, not a deliverable... Mh, On, Jb JIVE D8.4: Design document of transparent local/remote application programming interface for storage elements MS802: Review of prototype test results, decision of the architecture and scope of subsequent integration tests 8 (44)

9 Platform Selection 9 (44)

10 Platform Selection in D8.2 Two alternatives: Very basic COTS chassis with customization COTS chassis (more expensive) 10 (44)

11 Lowest-Cost Solution by Customized Chassis Suggested a basic 4U rack enclosure Chieftec UNC-410F-B 11 (44)

12 Customized Chassis Was a Great Idea, but... Wiring needs more work than predicted 12 (44)

13 Adding ~+30% to Cost: 2*Intel Xeon QP More disks (36) 2*1400W Uses 420W Neat, /w 3yr warranty 13 (44)

14 Final Recommendation as MS801 We ended up using D8.2 alternative solution and we also chose a (potentially?) more powerful motherboard/cpu from the same vendor: This fulfills the recommendations in D8.2 of COTS implementation of 8Gbps capable 36 disk storage unit in a fairly standard 4U rack enclosure 72TB or 20 hours of 8Gbps Add a heavy-duty 2x4-core MB with 10GE and disk controllers With room for expansion with external disk controllers, cf. the recent Mark6 proposal 14 (44)

15 Parts List 15 (44)

16 36 Disks + 7 (low-profile) PCIe x8 slots Still only 4U 16 (44)

17 D8.2 Recommendation OK, too Though now recommending the SC disk case... Asus Crosshair IV Formula (+846TQ) 3 PCIe x8 16+6=22 disks 10GE No fans Extreme (+846A) 4 PCI x8 24(+6) disks 10GE Fans, Hydralogix 17 (44)

18 Pictures from Mh 18 (44)

19 Linux Environment, Software Algorithms 19 (44)

20 Linux Environment 64-bit Ubuntu/Debian server booted from network recommended 64-bit: 16GB (and up; >4GB) needed for latency buffers Ubuntu/Debian: familiarity within our community Server: avoiding extra services (X, net) in storage nodes Netboot: central management of (potentially) multiple storage nodes; maximize the number of data disks Netboot via the motherboard 1GE interfaces Local.vlbi VLAN for that while 10GE connected to Internet/LP 20 (44)

21 Netboot Server TFTP+NFS small server 21 (44)

22 Lessons Learned from EXPReS 6Gbps/20-disk RAID0 Total throughput bottlenecked in a single kernel thread flushing all data from block buffer cache to the single raid0 Tried avoiding the buffer cache with O_DIRECT, got abysmal performance in EXPReS trial tests Did not understand the importance of O_DIRECT writing large blocks in a single write() (2 64MB or more) Substituted two raid0s; still all CPU being consumed in copying data from user programs to buffer cache Giving each disk spindle its own process/thread doing O_DIRECT write()s with large enough block size seems fine With 22 disks attained >17Gbps using only 20% of CPUs 22 (44)

23 RAID Arrays? For high bandwidth and high capacity at the same time, one cannot really use other RAIDs but raid0 (striping) Raid5/Raid6 slow down in streaming writing, especially in degraded mode (with failed disks) False illusion of guard against disk failures Cannot afford losing half the disk space with raid10 But raid0 is vulnerable to single-disk failures Why use any RAIDs at all for astronomy data? Why use any form of redundancy for volatile, non-archival data? 23 (44)

24 Writing to a set of disks Divide incoming data stream into consecutive largish (50 500MB) chunks of memory buffer Write chunks into files on a set of stand-alone disks Local or even remote disks Disk handler processes/threads 24 (44)

25 Naming Files on Disks Prefer naming conventions which ls-list alphabetically in time/sequence order Determining file name list in correct order becomes as easy as 'ls /*/<experiment>/*'... Detecting missing files or holes For example: /d001/n11k2mh/ m5a /d001/n11k2mh/ m5a 25 (44)

26 Managing Local Disk Space Initially thought writing chunks in round-robin fashion The current idea is to select among idle disks the one with the largest amount of free space Round-robin (or longest idle time?) for disks with the same (largest) amount of free space Balances disk consumption automatically, allows different disk sizes & usage levels Also allows subsets of disks Tracking the number of idle disks & idle time gives an idea of how much headroom still left 26 (44)

27 Minimizing CPU Usage & Memory Bandwidth At 10Gbps rates, avoiding unnecessary memory copies becomes essential to avoid exhausting CPU & memory Current memory bandwidth is only 30 60Gbps in total The standard Linux disk r/w model makes one extra memcpy() between user mode and kernel buffer cache Must use O_DIRECT with large enough transfer sizes to allow disk drives to queue large enough DMA request / NCQ queues By default the network stack makes another memcpy() when adding/removing IP packet headers Can be circumvented with splice() or raw packet ring buffer 27 (44)

28 Networking CPU Overhead Planning to tolerate one memcpy() with the simplest UDP packet processing Though avoiding IP sw checksumming & fragmentation Retaining user headers (like packet #) Assuming UDP mostly in order ; attempting UDP receive at the assumed packet location and copying out-of-order packets back If zero-copy needed PacketShader raw ring buffer driver Though dictates using the latest Intel cards... mmap()ed PCAP raw ring buffer turned out to be one memcpy() plus regular IP stack processing (44)

29 10GE Cards Now recommending latest Intel cards PacketShader raw ring buffer driver compatible 29 (44)

30 Reading from a set of disks Initiate preloading of memory buffers from a set of disks Once all disks in a set have had a chance to preload, consuming the streaming data can commence at full rate 30 (44)

31 Simultaneous Read and Write Two problems, can be solved with interleaving: HDD seek time, slows down using more than one spot of disk Even without, double data streaming bandwidth required throughout the internal data paths Duration of one sequential r/w of the order of ms, >>10ms, one disk rotation 31 (44)

32 Simultaneous Read and Write Avoiding hard disk 8 12ms seek times Must stream in either read or write direction for much longer than seek time (or full disk rotation time, approx ms) E.g. read streaming for about 120ms gives the data from 10 rotations, then wastes the time of only one rotation, 90% efficiency So nothing really fancy is required but: Usually simultaneous read and write will halve the total bandwidth If the bottleneck at one subsystem can be removed (e.g. hard disks SSDs), another bottleneck will surface in another subsystem Memory and/or disk controller (PCIe) bandwidth limits are the most probable bottleneck candidates 32 (44)

33 Error Recovery If one disk handler does not complete in allotted time, it can hand over the memory block to another hander A spare disk Or even to another disk in the current set, provided there is enough slack time in the set, so it can eventually catch up Disk handler processes/threads Testing/recovery of failed disk Spare 33 (44)

34 Read Error Recovery Failure to read a chunk file from one disk causes the loss of a well-defined (0.1 1sec) time interval in the data stream In worst case (a total disk failure) the loss repeats at n*(0.1 1s) intervals Can adjust read retry strategy according to whether the reader/consumer can wait or it need quasi-realtime data 34 (44)

35 Turning Recordings into Archives If some time after a recording is made it is found that It needs to be retained/archived for a longer period of time And there is spare disk capacity available And there is some idle time for the data disk set available Then redundancy (aka raid5/raid6) can be calculated afterwards (file based) and the ECC information stored onto the spare disks If later a disk of the disk set fails, it can be replaced and its content files restored by recovering from the ECC files 35 (44)

36 Multi Site Test Setup 36 (44)

37 Test Cases We divide the test into two cases: Receiving data to the local buffer Transferring data from local buffer to remote buffer/processing Later, if this works, we can test doing thee two simultaneously == simultaneous buffering Possibly even multiple concurrent reads? To support multiple (slower) reads by several SFX instances Divide main memory roughly in half for write / read Memory chunk == file size, half of memory divided by # of disks Divide read memory chunks to concurrent readers, minimum of 2 3 chunks per reader 37 (44)

38 Boundary Conditions Although we talked about transparent access in DoW, Transparent remote writing over the network makes very little sense As it is effectively real-time e-vlbi, defeating the very purpose of buffering, namely independence of network performance Transparent remote reading looks fine, esp. with sw correlators Thus most of the buffer disk space will be located at stations, close to data sources Fits the practical economic model, too: nobody is going to buy disk space needed by a given station except the station itself... Transparent teleporting of data from one buffer to another rare (contingency?) 38 (44)

39 More Boundary Conditions Our available data sources: Test 10GE Linux PC: dummy data up to 9Gbps Could devise something to generate artificial noise data for testing? ibobs: 8Gbps UDP packet stream (1Gsps/8-bit) Not readily correlatable (nor super-interesting, too many bits/sample) Though the only real data source readily available Jivemk5 software: numbered UDP packets up to 512/896Mbps via the std Mk5 1GE Or up to 1024Mbps via an additional Mk5 10Ge card... Unlikely that JIVE can have lots of disks in a buffer machine already during summer But could use JIVE Mk5s to access station buffers (Mh, On, Jb) and feed the data to Either SFX or the Mk4 correlator 39 (44)

40 Summer Test Scenario Due to boundary conditions, combining high data rate and buffered correlation seems unlikely this summer Thus recording 22GHz UDP packet stream from Jivemk5 locally and correlating that with SFX (with UDT or even TCP remote access) seems like the only viable real correlation option Could spice that up with recording more scans while SFX is reading the previous scan(s)... Even better if there could be multiple SFX instances running concurrently, from different locations, processing several past scans Any way to show high-rate performance? >1Gbps remote reads Mh<->Jb<->On? 40 (44)

41 What Do I Have to Do Next? Test partners need a similar test system 36, 24, or 22 disks + 10GE Install Linux to the test system Preferably with a small ( leftover PC/HP Microserver) TFTP+NFS netboot server Check suitable schedule for 10GE test traffic with your operator (or WP6?) Agree test schedule with partners 41 (44)

42 Tasks 1 AALTO Create UDP stream writer + UDT(/TCP) reader sw plus Debianbased installation for OSO, UMAN JIVE Remote UDT(/TCP) reader component for SFX OSO, UMAN Acquire test systems (36/24/22/6? disks) 42 (44)

43 Tasks 2 PSNC Role & trials of computing center buffering Could we run SFX from there, accessing station buffers over the network? Could that be combined with a contingency transfer of some observation from Mh/On/Jb to PSNC buffer disks? INAF Global/local allocation/deallocation schemes The deliverable D8.3 ASTRON Long-term archival & reprocessing? Start scheduled for (44)

44 The Inconvenient Truths :-) About networked data streaming: A given station cannot really sustain recording bandwidth larger than their network connectivity unless given an unlimited disk buffer or long enough breaks between recordings. For the VLBI case: A single slow (or high-latency, like shipping) connection in a given e-vlbi network will force others (or some buffering party) to buffer most of the VLBI data of the whole network, if not all. 44 (44)

WP7. Mark Kettenis April 18, Joint Institute for VLBI in Europe

WP7. Mark Kettenis April 18, Joint Institute for VLBI in Europe WP7 Mark Kettenis kettenis@jive.nl Joint Institute for VLBI in Europe April 18, 2011 Mark Kettenis kettenis@jive.nl (JIVE) NEXPReS WP7 April 18, 2011 1 / 16 Goals Create an automated, distributed correlator

More information

CSE 380 Computer Operating Systems

CSE 380 Computer Operating Systems CSE 380 Computer Operating Systems Instructor: Insup Lee University of Pennsylvania Fall 2003 Lecture Note on Disk I/O 1 I/O Devices Storage devices Floppy, Magnetic disk, Magnetic tape, CD-ROM, DVD User

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

Storage. Hwansoo Han

Storage. Hwansoo Han Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics

More information

I/O CANNOT BE IGNORED

I/O CANNOT BE IGNORED LECTURE 13 I/O I/O CANNOT BE IGNORED Assume a program requires 100 seconds, 90 seconds for main memory, 10 seconds for I/O. Assume main memory access improves by ~10% per year and I/O remains the same.

More information

CSE380 - Operating Systems. Communicating with Devices

CSE380 - Operating Systems. Communicating with Devices CSE380 - Operating Systems Notes for Lecture 15-11/4/04 Matt Blaze (some examples by Insup Lee) Communicating with Devices Modern architectures support convenient communication with devices memory mapped

More information

CS5460: Operating Systems Lecture 20: File System Reliability

CS5460: Operating Systems Lecture 20: File System Reliability CS5460: Operating Systems Lecture 20: File System Reliability File System Optimizations Modern Historic Technique Disk buffer cache Aggregated disk I/O Prefetching Disk head scheduling Disk interleaving

More information

PC-based data acquisition II

PC-based data acquisition II FYS3240 PC-based instrumentation and microcontrollers PC-based data acquisition II Data streaming to a storage device Spring 2015 Lecture 9 Bekkeng, 29.1.2015 Data streaming Data written to or read from

More information

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Operating Systems Lecture 7.2 - File system implementation Adrien Krähenbühl Master of Computer Science PUF - Hồ Chí Minh 2016/2017 Design FAT or indexed allocation? UFS, FFS & Ext2 Journaling with Ext3

More information

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili

Virtual Memory. Reading. Sections 5.4, 5.5, 5.6, 5.8, 5.10 (2) Lecture notes from MKP and S. Yalamanchili Virtual Memory Lecture notes from MKP and S. Yalamanchili Sections 5.4, 5.5, 5.6, 5.8, 5.10 Reading (2) 1 The Memory Hierarchy ALU registers Cache Memory Memory Memory Managed by the compiler Memory Managed

More information

Mass-Storage Structure

Mass-Storage Structure Operating Systems (Fall/Winter 2018) Mass-Storage Structure Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review On-disk structure

More information

I/O, Disks, and RAID Yi Shi Fall Xi an Jiaotong University

I/O, Disks, and RAID Yi Shi Fall Xi an Jiaotong University I/O, Disks, and RAID Yi Shi Fall 2017 Xi an Jiaotong University Goals for Today Disks How does a computer system permanently store data? RAID How to make storage both efficient and reliable? 2 What does

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices

More information

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space

More information

Specifying Storage Servers for IP security applications

Specifying Storage Servers for IP security applications Specifying Storage Servers for IP security applications The migration of security systems from analogue to digital IP based solutions has created a large demand for storage servers high performance PCs

More information

Presented by: Nafiseh Mahmoudi Spring 2017

Presented by: Nafiseh Mahmoudi Spring 2017 Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS May 2011

MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS May 2011 MASSACHUSETTS INSTITUTE OF TECHNOLOGY HAYSTACK OBSERVATORY WESTFORD, MASSACHUSETTS 01886 27 May 2011 Telephone: 781-981-5400 Fax: 781-981-0590 TO: VLBI2010/mm-VLBI development group FROM: Alan R. Whitney,

More information

Storage Systems. Storage Systems

Storage Systems. Storage Systems Storage Systems Storage Systems We already know about four levels of storage: Registers Cache Memory Disk But we've been a little vague on how these devices are interconnected In this unit, we study Input/output

More information

Analysis of high capacity storage systems for e-vlbi

Analysis of high capacity storage systems for e-vlbi Analysis of high capacity storage systems for e-vlbi Matteo Stagni - Francesco Bedosti - Mauro Nanni May 21, 212 IRA 458/12 Abstract The objective of the analysis is to verify if the storage systems now

More information

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER Aspera FASP Data Transfer at 80 Gbps Elimina8ng tradi8onal bo

More information

Chapter 10: Mass-Storage Systems

Chapter 10: Mass-Storage Systems Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space

More information

Mass-Storage Structure

Mass-Storage Structure CS 4410 Operating Systems Mass-Storage Structure Summer 2011 Cornell University 1 Today How is data saved in the hard disk? Magnetic disk Disk speed parameters Disk Scheduling RAID Structure 2 Secondary

More information

Operating Systems 2010/2011

Operating Systems 2010/2011 Operating Systems 2010/2011 Input/Output Systems part 2 (ch13, ch12) Shudong Chen 1 Recap Discuss the principles of I/O hardware and its complexity Explore the structure of an operating system s I/O subsystem

More information

Dell EMC SCv3020 7,000 Mailbox Exchange 2016 Resiliency Storage Solution using 7.2K drives

Dell EMC SCv3020 7,000 Mailbox Exchange 2016 Resiliency Storage Solution using 7.2K drives Dell EMC SCv3020 7,000 Mailbox Exchange 2016 Resiliency Storage Solution using 7.2K drives Microsoft ESRP 4.0 Abstract This document describes the Dell EMC SCv3020 storage solution for Microsoft Exchange

More information

Guifré Molera, J. Ritakari & J. Wagner Metsähovi Radio Observatory, TKK - Finland

Guifré Molera, J. Ritakari & J. Wagner Metsähovi Radio Observatory, TKK - Finland European 4-Gbps VLBI and evlbi Click to edit Master subtitle style Guifré Molera, J. Ritakari & J. Wagner Observatory, TKK - Finland 1 Outline Introduction Goals for 4 Gbps Testing Results Product Conclusions

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung December 2003 ACM symposium on Operating systems principles Publisher: ACM Nov. 26, 2008 OUTLINE INTRODUCTION DESIGN OVERVIEW

More information

I/O CANNOT BE IGNORED

I/O CANNOT BE IGNORED LECTURE 13 I/O I/O CANNOT BE IGNORED Assume a program requires 100 seconds, 90 seconds for main memory, 10 seconds for I/O. Assume main memory access improves by ~10% per year and I/O remains the same.

More information

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication CDS and Sky Tech Brief Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication Actifio recommends using Dedup-Async Replication (DAR) for RPO of 4 hours or more and using StreamSnap for

More information

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit)

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 21: Network Protocols (and 2 Phase Commit) CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring 2003 Lecture 21: Network Protocols (and 2 Phase Commit) 21.0 Main Point Protocol: agreement between two parties as to

More information

Mladen Stefanov F48235 R.A.I.D

Mladen Stefanov F48235 R.A.I.D R.A.I.D Data is the most valuable asset of any business today. Lost data, in most cases, means lost business. Even if you backup regularly, you need a fail-safe way to ensure that your data is protected

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection

More information

The Google File System (GFS)

The Google File System (GFS) 1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints

More information

CSCI-GA Operating Systems. I/O : Disk Scheduling and RAID. Hubertus Franke

CSCI-GA Operating Systems. I/O : Disk Scheduling and RAID. Hubertus Franke CSCI-GA.2250-001 Operating Systems I/O : Disk Scheduling and RAID Hubertus Franke frankeh@cs.nyu.edu Disks Scheduling Abstracted by OS as files A Conventional Hard Disk (Magnetic) Structure Hard Disk

More information

Mass-Storage. ICS332 - Fall 2017 Operating Systems. Henri Casanova

Mass-Storage. ICS332 - Fall 2017 Operating Systems. Henri Casanova Mass-Storage ICS332 - Fall 2017 Operating Systems Henri Casanova (henric@hawaii.edu) Magnetic Disks! Magnetic disks (a.k.a. hard drives ) are (still) the most common secondary storage devices today! They

More information

Chapter 6 - External Memory

Chapter 6 - External Memory Chapter 6 - External Memory Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 6 - External Memory 1 / 66 Table of Contents I 1 Motivation 2 Magnetic Disks Write Mechanism Read Mechanism

More information

Mark 6 Next-Generation VLBI Data System

Mark 6 Next-Generation VLBI Data System Mark 6 Next-Generation VLBI Data System, IVS 2012 General Meeting Proceedings, p.86 90 http://ivscc.gsfc.nasa.gov/publications/gm2012/whitney.pdf Mark 6 Next-Generation VLBI Data System Alan Whitney, David

More information

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568 FLAT DATACENTER STORAGE Paper-3 Presenter-Pratik Bhatt fx6568 FDS Main discussion points A cluster storage system Stores giant "blobs" - 128-bit ID, multi-megabyte content Clients and servers connected

More information

CSE 120. Overview. July 27, Day 8 Input/Output. Instructor: Neil Rhodes. Hardware. Hardware. Hardware

CSE 120. Overview. July 27, Day 8 Input/Output. Instructor: Neil Rhodes. Hardware. Hardware. Hardware CSE 120 July 27, 2006 Day 8 Input/Output Instructor: Neil Rhodes How hardware works Operating Systems Layer What the kernel does API What the programmer does Overview 2 Kinds Block devices: read/write

More information

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010 Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed

More information

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance

More information

CSE 124: Networked Services Lecture-16

CSE 124: Networked Services Lecture-16 Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID System Upgrade Teaches RAID In the growing computer industry we often find it difficult to keep track of the everyday changes in technology. At System Upgrade, Inc it is our goal and mission to provide

More information

Physical Representation of Files

Physical Representation of Files Physical Representation of Files A disk drive consists of a disk pack containing one or more platters stacked like phonograph records. Information is stored on both sides of the platter. Each platter is

More information

Iomega REV Drive Data Transfer Performance

Iomega REV Drive Data Transfer Performance Technical White Paper March 2004 Iomega REV Drive Data Transfer Performance Understanding Potential Transfer Rates and Factors Affecting Throughput Introduction Maximum Sustained Transfer Rate Burst Transfer

More information

V. Mass Storage Systems

V. Mass Storage Systems TDIU25: Operating Systems V. Mass Storage Systems SGG9: chapter 12 o Mass storage: Hard disks, structure, scheduling, RAID Copyright Notice: The lecture notes are mainly based on modifications of the slides

More information

The Google File System

The Google File System October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

Best Practices for Setting BIOS Parameters for Performance

Best Practices for Setting BIOS Parameters for Performance White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page

More information

I/O Management Software. Chapter 5

I/O Management Software. Chapter 5 I/O Management Software Chapter 5 1 Learning Outcomes An understanding of the structure of I/O related software, including interrupt handers. An appreciation of the issues surrounding long running interrupt

More information

Chapter 10: Mass-Storage Systems

Chapter 10: Mass-Storage Systems COP 4610: Introduction to Operating Systems (Spring 2016) Chapter 10: Mass-Storage Systems Zhi Wang Florida State University Content Overview of Mass Storage Structure Disk Structure Disk Scheduling Disk

More information

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 21: Reliable, High Performance Storage CSC 469H1F Fall 2006 Angela Demke Brown 1 Review We ve looked at fault tolerance via server replication Continue operating with up to f failures Recovery

More information

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic

More information

Network Design Considerations for Grid Computing

Network Design Considerations for Grid Computing Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung SOSP 2003 presented by Kun Suo Outline GFS Background, Concepts and Key words Example of GFS Operations Some optimizations in

More information

Benchmarking Enterprise SSDs

Benchmarking Enterprise SSDs Whitepaper March 2013 Benchmarking Enterprise SSDs When properly structured, benchmark tests enable IT professionals to compare solid-state drives (SSDs) under test with conventional hard disk drives (HDDs)

More information

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication

More information

File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018

File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018 File System Performance (and Abstractions) Kevin Webb Swarthmore College April 5, 2018 Today s Goals Supporting multiple file systems in one name space. Schedulers not just for CPUs, but disks too! Caching

More information

MPA (Marker PDU Aligned Framing for TCP)

MPA (Marker PDU Aligned Framing for TCP) MPA (Marker PDU Aligned Framing for TCP) draft-culley-iwarp-mpa-01 Paul R. Culley HP 11-18-2002 Marker (Protocol Data Unit) Aligned Framing, or MPA. 1 Motivation for MPA/DDP Enable Direct Data Placement

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Department of Computer Engineering University of California at Santa Cruz. File Systems. Hai Tao

Department of Computer Engineering University of California at Santa Cruz. File Systems. Hai Tao File Systems Hai Tao File System File system is used to store sources, objects, libraries and executables, numeric data, text, video, audio, etc. The file system provide access and control function for

More information

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel Chapter-6 SUBJECT:- Operating System TOPICS:- I/O Management Created by : - Sanjay Patel Disk Scheduling Algorithm 1) First-In-First-Out (FIFO) 2) Shortest Service Time First (SSTF) 3) SCAN 4) Circular-SCAN

More information

Lighting the Blue Touchpaper for UK e-science - Closing Conference of ESLEA Project The George Hotel, Edinburgh, UK March, 2007

Lighting the Blue Touchpaper for UK e-science - Closing Conference of ESLEA Project The George Hotel, Edinburgh, UK March, 2007 Working with 1 Gigabit Ethernet 1, The School of Physics and Astronomy, The University of Manchester, Manchester, M13 9PL UK E-mail: R.Hughes-Jones@manchester.ac.uk Stephen Kershaw The School of Physics

More information

File. File System Implementation. File Metadata. File System Implementation. Direct Memory Access Cont. Hardware background: Direct Memory Access

File. File System Implementation. File Metadata. File System Implementation. Direct Memory Access Cont. Hardware background: Direct Memory Access File File System Implementation Operating Systems Hebrew University Spring 2009 Sequence of bytes, with no structure as far as the operating system is concerned. The only operations are to read and write

More information

Test report of transparent local/remote application programming interface for storage elements

Test report of transparent local/remote application programming interface for storage elements NEXPReS is an Integrated Infrastructure Initiative (I3), funded under the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement n RI-261525. Test report of transparent

More information

Open-Channel SSDs Offer the Flexibility Required by Hyperscale Infrastructure Matias Bjørling CNEX Labs

Open-Channel SSDs Offer the Flexibility Required by Hyperscale Infrastructure Matias Bjørling CNEX Labs Open-Channel SSDs Offer the Flexibility Required by Hyperscale Infrastructure Matias Bjørling CNEX Labs 1 Public and Private Cloud Providers 2 Workloads and Applications Multi-Tenancy Databases Instance

More information

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8. Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

Disks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse]

Disks and RAID. CS 4410 Operating Systems. [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse] Disks and RAID CS 4410 Operating Systems [R. Agarwal, L. Alvisi, A. Bracy, E. Sirer, R. Van Renesse] Storage Devices Magnetic disks Storage that rarely becomes corrupted Large capacity at low cost Block

More information

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security

Input/Output. Today. Next. Principles of I/O hardware & software I/O software layers Disks. Protection & Security Input/Output Today Principles of I/O hardware & software I/O software layers Disks Next Protection & Security Operating Systems and I/O Two key operating system goals Control I/O devices Provide a simple,

More information

Multiprocessor Systems. COMP s1

Multiprocessor Systems. COMP s1 Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve

More information

Definition of RAID Levels

Definition of RAID Levels RAID The basic idea of RAID (Redundant Array of Independent Disks) is to combine multiple inexpensive disk drives into an array of disk drives to obtain performance, capacity and reliability that exceeds

More information

Database Systems. November 2, 2011 Lecture #7. topobo (mit)

Database Systems. November 2, 2011 Lecture #7. topobo (mit) Database Systems November 2, 2011 Lecture #7 1 topobo (mit) 1 Announcement Assignment #2 due today Assignment #3 out today & due on 11/16. Midterm exam in class next week. Cover Chapters 1, 2,

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices Where Are We? COS 318: Operating Systems Storage Devices Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) u Covered: l Management of CPU

More information

VX3000-E Unified Network Storage

VX3000-E Unified Network Storage Datasheet VX3000-E Unified Network Storage Overview VX3000-E storage, with high performance, high reliability, high available, high density, high scalability and high usability, is a new-generation unified

More information

GFS: The Google File System

GFS: The Google File System GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one

More information

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK [r.tasker@dl.ac.uk] DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459

More information

STORAGE SYSTEMS. Operating Systems 2015 Spring by Euiseong Seo

STORAGE SYSTEMS. Operating Systems 2015 Spring by Euiseong Seo STORAGE SYSTEMS Operating Systems 2015 Spring by Euiseong Seo Today s Topics HDDs (Hard Disk Drives) Disk scheduling policies Linux I/O schedulers Secondary Storage Anything that is outside of primary

More information

JMR ELECTRONICS INC. WHITE PAPER

JMR ELECTRONICS INC. WHITE PAPER THE NEED FOR SPEED: USING PCI EXPRESS ATTACHED STORAGE FOREWORD The highest performance, expandable, directly attached storage can be achieved at low cost by moving the server or work station s PCI bus

More information

Vess A2000 Series. NVR Storage Appliance. Milestone Surveillance Solution. Version PROMISE Technology, Inc. All Rights Reserved.

Vess A2000 Series. NVR Storage Appliance. Milestone Surveillance Solution. Version PROMISE Technology, Inc. All Rights Reserved. Vess A2000 Series NVR Storage Appliance Milestone Surveillance Solution Version 1.0 2014 PROMISE Technology, Inc. All Rights Reserved. Contents Introduction 1 Overview 1 Purpose 2 Scope 2 Audience 2 Components

More information

Ch 11: Storage and File Structure

Ch 11: Storage and File Structure Ch 11: Storage and File Structure Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files Data-Dictionary Dictionary Storage

More information

Storage and File Structure. Classification of Physical Storage Media. Physical Storage Media. Physical Storage Media

Storage and File Structure. Classification of Physical Storage Media. Physical Storage Media. Physical Storage Media Storage and File Structure Classification of Physical Storage Media Overview of Physical Storage Media Magnetic Disks RAID Tertiary Storage Storage Access File Organization Organization of Records in Files

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 9: Mass Storage Structure Prof. Alan Mislove (amislove@ccs.neu.edu) Moving-head Disk Mechanism 2 Overview of Mass Storage Structure Magnetic

More information

Implementing a Statically Adaptive Software RAID System

Implementing a Statically Adaptive Software RAID System Implementing a Statically Adaptive Software RAID System Matt McCormick mattmcc@cs.wisc.edu Master s Project Report Computer Sciences Department University of Wisconsin Madison Abstract Current RAID systems

More information

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Outline of Today s Lecture. The Big Picture: Where are We Now? CPS104 Computer Organization and Programming Lecture 18: Input-Output Robert Wagner cps 104.1 RW Fall 2000 Outline of Today s Lecture The system Magnetic Disk Tape es DMA cps 104.2 RW Fall 2000 The Big

More information

Flavors of Memory supported by Linux, their use and benefit. Christoph Lameter, Ph.D,

Flavors of Memory supported by Linux, their use and benefit. Christoph Lameter, Ph.D, Flavors of Memory supported by Linux, their use and benefit Christoph Lameter, Ph.D, Twitter: @qant Flavors Of Memory The term computer memory is a simple term but there are numerous nuances

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

Red Hat Gluster Storage performance. Manoj Pillai and Ben England Performance Engineering June 25, 2015

Red Hat Gluster Storage performance. Manoj Pillai and Ben England Performance Engineering June 25, 2015 Red Hat Gluster Storage performance Manoj Pillai and Ben England Performance Engineering June 25, 2015 RDMA Erasure Coding NFS-Ganesha New or improved features (in last year) Snapshots SSD support Erasure

More information

I/O Management Software. Chapter 5

I/O Management Software. Chapter 5 I/O Management Software Chapter 5 1 Learning Outcomes An understanding of the structure of I/O related software, including interrupt handers. An appreciation of the issues surrounding long running interrupt

More information

Technical Note P/N REV A01 March 29, 2007

Technical Note P/N REV A01 March 29, 2007 EMC Symmetrix DMX-3 Best Practices Technical Note P/N 300-004-800 REV A01 March 29, 2007 This technical note contains information on these topics: Executive summary... 2 Introduction... 2 Tiered storage...

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

CSE 124: Networked Services Fall 2009 Lecture-19

CSE 124: Networked Services Fall 2009 Lecture-19 CSE 124: Networked Services Fall 2009 Lecture-19 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa09/cse124 Some of these slides are adapted from various sources/individuals including but

More information

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER Moving the world s data at maximum speed ASPERA HIGH-SPEED FILE TRANSFER 80 GBIT/S OVER IP USING DPDK Performance, Code, and Architecture Charles Shiflett Developer of next-generation

More information

The Lion of storage systems

The Lion of storage systems The Lion of storage systems Rakuten. Inc, Yosuke Hara Mar 21, 2013 1 The Lion of storage systems http://www.leofs.org LeoFS v0.14.0 was released! 2 Table of Contents 1. Motivation 2. Overview & Inside

More information

ASN Configuration Best Practices

ASN Configuration Best Practices ASN Configuration Best Practices Managed machine Generally used CPUs and RAM amounts are enough for the managed machine: CPU still allows us to read and write data faster than real IO subsystem allows.

More information

Take control of storage performance

Take control of storage performance Take control of storage performance Transition From Speed To Management SSD + RAID 2008-2011 Reduce time to market Inherent bottlenecks Re-architect for better performance NVMe, SCSI Express Reads & Writes

More information

UC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara

UC Santa Barbara. Operating Systems. Christopher Kruegel Department of Computer Science UC Santa Barbara Operating Systems Christopher Kruegel Department of Computer Science http://www.cs.ucsb.edu/~chris/ Input and Output Input/Output Devices The OS is responsible for managing I/O devices Issue requests Manage

More information

CSE 451: Operating Systems Winter Redundant Arrays of Inexpensive Disks (RAID) and OS structure. Gary Kimura

CSE 451: Operating Systems Winter Redundant Arrays of Inexpensive Disks (RAID) and OS structure. Gary Kimura CSE 451: Operating Systems Winter 2013 Redundant Arrays of Inexpensive Disks (RAID) and OS structure Gary Kimura The challenge Disk transfer rates are improving, but much less fast than CPU performance

More information