CONTINUOUS MEDIA PLACEMENT AND SCHEDULING IN HETEROGENEOUS DISK STORAGE SYSTEMS. Roger Zimmermann


CONTINUOUS MEDIA PLACEMENT AND SCHEDULING IN HETEROGENEOUS DISK STORAGE SYSTEMS
by Roger Zimmermann
A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)
December 1998
Copyright 1998 Roger Zimmermann


Dedication

To my parents Marlies and René Zimmermann Rätzer, and to Qing Nan (Adele) Chen. Without their love and support, this would not have been achieved.

Acknowledgments

My friends and labmates at the University of Southern California have been the single best aspect of my graduate school life. I am deeply thankful to my advisor, Shahram Ghandeharizadeh, for making me a part of his tightly-knit research group. He not only provoked my interest in continuous media and storage systems initially, but also spent countless hours guiding me through the challenges that followed. I am also grateful to the other members of my dissertation committee, Dennis McLeod and Tomlinson M. Holman, for their valuable insights and comments toward the improvement of this work.

I have thoroughly enjoyed my personal and professional association with the members of the database group, including Jaber Al-Marri, Ali E. Dashti, Martha L. Escobar-Molano, Dongho Kim, Seon Ho Kim, Mohammad Reza Koladouzan, David Ta-Wei Li, Reza Rejaie, Cyrus Shahabi, Weifeng Shi, James Stone, Shimeng Wang, and Weifang Xie.

When I first joined the database lab, a parallel, object-oriented database engine (the Omega project) was being implemented. Cyrus Shahabi patiently explained to me the substantial amount of Omega code and its inner workings. As part of this process he also introduced me to the intricacies of the lab infrastructure. David Ta-Wei Li and Dongho Kim were the main forces behind the administration of the hardware and software in the lab. I benefited tremendously from their insight into systems, electronics, and Unix programming.

I was most fortunate to be involved in the Mitra project from the very beginning. Mitra, a continuous media server prototype, has provided me with a rich environment to learn, conjecture, and verify new ideas. Initially, our research focused on the real-time capabilities of storage and file systems. Special thanks go to Douglas J. Ierardi for his theoretical work on hierarchical file systems. His insights have been a very stimulating and educational experience.
I thoroughly enjoyed my collaboration with James Stone on all aspects of disk modeling and SCSI programming. Also instrumental in the Mitra efforts were Seon Ho Kim, David Ta-Wei Li, Reza Rejaie, and Weifeng Shi, the members who built the prototype. I am especially thankful for the many interesting and inspirational discussions that we had in this group during the past years. I am also grateful to my

officemates who became some of my closest friends. I will always cherish the memorable moments I had with Ali E. Dashti, Seon Ho Kim, Cyrus Shahabi, and Weifeng Shi.

Thanks go to Hewlett-Packard for their generous support of the research at the USC database laboratory with an unrestricted cash/equipment gift, and to the National Science Foundation for their financial support¹. The Panasonic Information and Networking Technologies Laboratory provided additional resources.

The staff of the Computer Science Department has helped me in many ways over the past years. I would like to thank Hilda Mauro, M. Vilma Lorenzana-Walsh, and Amy Yung. A very special thank you goes to Julieta DeLa Paz for her tremendous help and never-ending patience when I had to order equipment, many times under tight deadlines. Her friendly nature and competent, efficient work style made for an unbeatable combination.

Although distant, my family at home is the root of the strength and conviction that gets me through trying times. My parents, Marlies and René Zimmermann Rätzer, have always been supportive of my decisions, and they instilled in me the strength to pursue my goals. Gratitude also goes to my two sisters Doris and Irène and their families, who helped me to put it all in perspective and still enjoy life during my Ph.D. quest.

¹ Grants IRI and IRI (NYI award).

Contents

Dedication
Acknowledgments
List Of Tables
List Of Figures
Abstract

1 Introduction
    Organization
2 Previous Work
    Introduction
    Heterogeneous Storage Systems
    Multi-Zones Disk Drives
    Summary
3 Fundamentals of CM Display and Magnetic Disk Drives
    Continuous Display Overview
    Target Environment
    Modern Disk Drives
    Disk Drive Modeling
    Low-Level SCSI Programming Interface
    Validation
    Summary
4 Three Non-partitioning Techniques
    Disk Grouping
    Staggered Grouping
    Disk Merging
    Storage Space Considerations
    Configuration Planner for Disk Merging
    Operation
    Search Space Size
    4.4.3 Step Size p
    Planner Extension to p 0 < [...]
    Validation
    Multi-Zone Disk Drives
    Summary
5 Evaluation
    Tools and Methods
    An Analytical Comparison
    An Experimental Comparison
    Simulation Infrastructure
    A Partitioned versus a Non-Partitioned System
    Summary
6 Fault Tolerance
    Introduction
    Target Architecture
    Overview of High Availability Techniques and Related Work
    Mirror-based Techniques
    Parity-based Techniques
    Issues Specific to Continuous Media Servers
    High Availability for Heterogeneous Storage Systems
    Mirror-based Techniques
    Parity-based Techniques
    Basic Reliability Modeling
    Non-overlapping Parity
    Summary
7 Conclusions and Future Work
    Summary
    Directions for Future Research
Appendix A Disk Characteristics
Appendix B Hardness Results
    B.1 Introduction
    B.2 Q1 of the Fragment Combination Problem
    B.3 Q2 of the Fragment Combination Problem
Reference List

List Of Tables

1.1 A selection of commercially available continuous-media servers
Parameters for three commercial disk drives
Estimated parameters for disk drives of the near future
Seek profile modeling parameters
List of parameters used repeatedly in this thesis and their respective definitions
Disk Grouping configuration example with three disk drive types
Sample logical disk sizes for a Disk Merging system based on three disk drive types
Planner Stage 1 output
Planner Stage 2 output
Two sample storage subsystem configurations with three or two different disk drive types employed
Sample configuration planner search space sizes
Planner Stage 1 output for p 0 = [...]
Parameters for the simulation and implementation
Planner output used for validation
Experimental parameters for a partitioned and a non-partitioned system
Access distribution among subservers for partitioning ([...] configuration)
Single disk reliability for three commercial disk drives
List of terms used repeatedly in this chapter and their respective definitions
A.1 Zoning information of a Seagate Hawk 1LP disk drive
A.2 Zoning information of a Seagate Barracuda 4LP disk drive
A.3 Zoning information of a Seagate Cheetah 4LP disk drive
B.1 Matrix to visualize in how many ways fragments can be divided
B.2 Precomputed multisets for F(f i,1)
B.3 Algorithm execution times for three disk types

List Of Figures

1.1 Capacity improvement
(Media) data rate improvement
Seek, rotation, and access times improvement
Cost per megabyte decline
Storage requirement for a ninety minute digital video clip
Striping
Continuous display of multiple objects from a single disk
Storage subsystem architecture
Disk drive internals
SCSI bus scalability
Example measured and modeled seek profile
Overhead as a function of the retrieval block size for three disk drive models
Sample code fragment to translate logical into physical addresses on a disk
Disk Grouping
Mechanical positioning overhead for three different disk models
Staggered Grouping
Algorithm for dynamic computation of peak memory requirement
Memory requirement for Staggered Grouping
Disk Merging
Sample system configurations for a Disk Merging system based on three disk drive types
Configuration planner structure
The two nested loops that define the configuration planner search space
Configuration planner search space with three disk types
Unused storage space with three disk types
Cost per stream with three disk types
Configuration planner search space with two disk types
Unused storage space with two disk types
Cost per stream with two disk types
Disk Merging with a logical disk size larger than the smallest physical disk type
Verification results for 20% load
Verification results for 40% load
Verification results for 60% load
4.20 Verification results for 80% load
Cost per stream ([...] configuration)
Startup latency range ([...] configuration)
Cost per stream ([...] configuration)
Startup latency range ([...] configuration)
Cost per stream ([...] configuration)
Startup latency range ([...] configuration)
Example cost breakdown into memory and disk components ([...])
Clip assignment and access distribution for the partitioned system
Average startup latency for a partitioned system
Average startup latency for a Disk Merging system
Server utilization at 70% load for a partitioned system
Server utilization at 70% load for a Disk Merging system
Server utilization at 80% load for a partitioned system
Server utilization at 80% load for a Disk Merging system
Server utilization at 90% load for a partitioned system
Server utilization at 90% load for a Disk Merging system
Multi-node CM server architecture
Classification of improved reliability techniques for CM servers
Cyclic and striped retrieval
Example of doubly striping
Example of Disk Merging with a four node, six disk server
Load distribution example with mirroring after a disk failure
Load distribution example with mirroring after a node failure
Example of parity group assignment for Disk Merging
Load distribution example with parity-based fault tolerance after a node failure
Markov model for a heterogeneous disk array
Simplified Markov model for a heterogeneous disk array
Variable bit-rate media approximation
A.1 Seagate Hawk 1LP seek profile
A.2 Seagate Hawk 1LP disk transfer rate of different zones
A.3 Seagate Barracuda 4LP seek profile
A.4 Seagate Barracuda 4LP disk transfer rate of different zones
A.5 Seagate Cheetah 4LP seek profile
A.6 Seagate Cheetah 4LP disk transfer rate of different zones
B.1 Two possible configurations with Disk Merging
B.2 Algorithm to check if a complete set of logical disks can be formed without dividing block fragments
B.3 Simple graphical solution to Q2 of the Fragment Combination Problem

Abstract

A number of recent technological trends have made data intensive applications such as continuous media (audio and video) servers a reality. These servers store and retrieve a large volume of data using magnetic disks. Servers consisting of heterogeneous disk drives have become a fact of life for several reasons. First, disks are mechanical devices that might fail. The failed disks are almost always replaced with new models. Second, the current technological trend for these devices is one of annual increase in both performance and storage capacity. Older disk models are discontinued because they cannot compete with the newer ones in the commercial arena. With a heterogeneous disk subsystem, the system should support continuous display while managing resources intelligently in order to maximize their utilization.

This dissertation describes a taxonomy of techniques that ensure a continuous display of objects using a heterogeneous disk subsystem. This taxonomy consists of: (a) strategies that partition resources into homogeneous groups of disks and manage each independently, and (b) techniques that treat all disks uniformly, termed non-partitioning techniques. We introduce and evaluate three non-partitioning techniques: Disk Merging, Disk Grouping, and Staggered Grouping. Our results demonstrate that Disk Merging is the most flexible scheme while providing a competitive, low cost per simultaneous display. Finally, using an open simulation model, we compare Disk Merging with a partitioning technique. The obtained results confirm the superiority of Disk Merging.

Chapter 1 Introduction

The reasonable man adapts himself to the world. The unreasonable man tries to adapt the world to himself. Therefore all progress depends on the unreasonable man.
George Bernard Shaw

During the past few years it has become technically feasible to implement continuous media (CM) storage servers because of the increase in available computing power and advances in networking and data storage technologies. The exponential improvements in solid state technology (i.e., processors and memory)¹ as well as increased bandwidth and storage capacities of modern magnetic disk drives have allowed even personal computers to support audio and video clips. Table 1.1 shows a selection of commercially available CM server implementations [SMI96, SNI96, SCI95, ncu97].

Many applications that traditionally were the domain of analog video are evolving to utilize digital video. For example, terrestrial broadcasters in the U.S. will start to transmit some of their programs in digital form by the end of the year. Satellite broadcast networks, such as DirecTV™, were designed from the ground up with a completely digital infrastructure [BGR94]. The proliferation of digital audio and video has been facilitated by the wide acceptance of standards for compression and file formats, such as MPEG-2² [Gal91]. Consumer electronics are also adopting these standards in products such as the digital versatile disk (DVD³) and digital VHS⁴ (D-VHS). Furthermore, increased network capacity for local area networks (LAN) and advanced streaming protocols (e.g., RTSP⁵) allow remote viewing of video clips. In the future, the Internet may become a primary carrier of continuous media. These advances have produced interest in new applications such as digital libraries, video-on-demand or movie-on-demand services, distance training and learning, etc.

¹ Popularly indicated by Moore's Law (the observation that the logic density of silicon integrated circuits has closely followed the curve (bits per square inch) = $2^{(t - 1962)}$, where t is time in years; that is, the amount of information storable on a given amount of silicon has roughly doubled every year since the technology was invented).
² The Motion Picture Expert Group (MPEG) has standardized several video and audio compression formats.
³ DVD is a standard for optical discs that features the same form-factor as CD-ROMs but holds up to 4.7 GB of data (8.5 GB with a dual-layer option). They can store video (MPEG-2), audio, or computer data (DVD-ROM). A writable version is planned for the near future [DVD96].
⁴ Video Home System: a half-inch video cassette format.
⁵ The Real Time Streaming Protocol is an Internet Engineering Task Force (IETF) proposed standard for control of streaming media on the Internet.

Continuous media far exceed the resource demands of traditional data types and require massive amounts of space and bandwidth for their storage and display. The storage media of choice for such objects are usually magnetic disk drives because of their high performance and moderate cost. At the time of this writing, other storage technologies are either slow (magneto-optical discs), provide limited random access (tapes), or offer limited write capabilities (CD-ROM, DVD).

The capacity and speed of magnetic disk drives have improved steadily over the last few decades. Table 1.2 shows the characteristics of three successive generations of disk drives from a commercial manufacturer. The storage capacity has roughly doubled every three years (a rate of approximately 26% per year) [PGK88]. More recently, the rate has accelerated to approximately 50% annually (see Figure 1.1). The disk transfer rate (i.e., the bandwidth) follows a similar trend with an annual improvement of approximately 40% (see Figure 1.2). The technological advances to reduce the seek time have been more moderate, at a rate of five to ten percent per year (see Figure 1.3) [Gro97b, PGK88, RW94b]. At the same time the cost per megabyte has been declining steadily (see Figure 1.4). Table 1.3 shows the characteristics of hypothetical disks extrapolated from these technological trends.

To achieve the high bandwidth and massive storage required for multi-user CM servers, disk drives are commonly combined into disk arrays to support many simultaneous I/O requests. Such a large-scale storage system is subject to two factors that might introduce heterogeneity into its disk array. First, disks are mechanical devices that might fail. Because the technological trend for these devices is one of annual increase in both performance and storage capacity, it is usually more cost-effective to replace a failed disk with a new model. Moreover, in the fast-paced disk industry, the corresponding model may be

Vendor              Product              Max. no. of users      Max. bandwidth
Starlight™          StarWorks-200M™      [...] 1.5 Mb/s         200 Mb/s
Sun™                MediaCenter™ 1000E   [...] 1.5 Mb/s         400 Mb/s
Storage Concepts™   VIDEOPLEX™           [...] 1.5 Mb/s         480 Mb/s (a)
nCUBE™              MediaCUBE™ [...]     [...] Mb/s             400 Mb/s
nCUBE™              MediaCUBE™ [...]     [...] Mb/s             1,650 Mb/s
nCUBE™              MediaCUBE™ [...]     [...] Mb/s             30,000 Mb/s
(a) The VIDEOPLEX system does not transmit digital data over a network but uses analog VHS signals instead.
Table 1.1: A selection of commercially available continuous-media servers.

Model                        ST31200WD     ST32171WD       ST34501WD
Series                       Hawk 1LP      Barracuda 4LP   Cheetah 4LP
Manufacturer                 Seagate Technology™, Inc.
Capacity C                   [...] GB      [...] GB        [...] GB
Avg. transfer rate R_D       3.47 MB/s     7.96 MB/s       [...] MB/s
Spindle speed                5,400 rpm     7,200 rpm       10,033 rpm
Avg. rotational latency      5.56 msec     4.17 msec       2.99 msec
Worst case seek time         21 msec       19 msec         16 msec
Surfaces                     [...]
Cylinders #cyl               [...]
Number of Zones Z            [...]
Sector size                  512 bytes     512 bytes       512 bytes
Sectors per Track ST         [...]
Sector ratio ST_Z0 / ST_ZN   106 [...]
Introduction year            [...]
Table 1.2: Parameters for three commercial disk drives.

Model (a)                    D1998         D1999           D2000
Introduction year            [...]
Capacity C                   17 GB         23 GB           37 GB
Avg. transfer rate R_D       18 MB/s       25 MB/s         36 MB/s
Spindle speed                12,000 rpm    12,000 rpm      15,000 rpm
Avg. rotational latency      2.50 msec     2.50 msec       2.00 msec
Worst case seek time         13 msec       11 msec         9 msec
(a) These fictitious model names are used to reference the data sets of this table for the purpose of this thesis.
Table 1.3: Estimated parameters for disk drives of the near future based on the projections from Figures 1.1 through [...]

Figure 1.1: Capacity improvement [Pat93]. (Plot of capacity [MB / disk] against production year.)

Figure 1.2: (Media) data rate improvement [Gro97c]. (Plot of data rate [MB / second] against production year.)

Figure 1.3: Seek, rotation, and access times improvement [Gro97b]. (Plot of access time, rotational latency, and seek time [msec] against production year.)

Figure 1.4: Cost/megabyte decline [Gro97a]. (Plot of cost [$ / MB] against production year.)
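The annual improvement rates quoted in the text and plotted in Figures 1.1 through 1.4 are compound rates. As a minimal sketch (the function names are illustrative and do not come from the dissertation), the following Python fragment converts a doubling period into an annual rate and projects a value forward under a constant rate, which is one simple way to extrapolate such trends. The 16 msec starting value and the 10% rate in the second example are assumptions chosen for illustration.

```python
def annual_rate_from_doubling(years_to_double: float) -> float:
    """Compound annual growth rate implied by doubling every N years."""
    return 2.0 ** (1.0 / years_to_double) - 1.0


def project(value: float, annual_rate: float, years: float) -> float:
    """Value after `years` of compound change (a negative rate is a decline)."""
    return value * (1.0 + annual_rate) ** years


# Doubling every three years corresponds to roughly 26% per year.
capacity_rate = annual_rate_from_doubling(3.0)
print(f"{capacity_rate:.1%}")  # prints 26.0%

# Assumed example: a 16 msec seek time that improves by 10% per year.
seek_after_3_years = project(16.0, -0.10, 3)
print(f"{seek_after_3_years:.2f} msec")
```

The first result matches the "doubled every three years, approximately 26% per year" statement in the text; the second only illustrates how a five to ten percent annual seek-time improvement compounds.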

Figure 1.5: Storage requirements for a ninety minute video clip digitized in different industry standard encoding formats. (Bar chart of size [GB] for MPEG-2 at 3 Mb/s, DV 4:1:1 at 31 Mb/s, DV 4:2:2 at 50 Mb/s, Digi Beta at 90 Mb/s, D1 at 270 Mb/s, and HDTV at 1.2 Gb/s.)

unavailable because it was discontinued by the manufacturer. Second, if the disk array needs to be expanded due to increased demand for either bandwidth or storage capacity, it is usually most cost-effective to add current-generation disks. For these reasons heterogeneous storage systems are a common occurrence. Hence, a CM server should manage these heterogeneous resources intelligently in order to maximize their utilization and to guarantee jitter-free video and audio displays.

Before we explore the current techniques in support of CM servers with heterogeneous storage, we briefly introduce the requirements for the display of continuous media. First, unlike traditional data types, such as records, text and even still images, continuous media objects are usually large in size. For example, a two hour MPEG-2 encoded movie requires approximately 4 gigabytes (GB) of storage (at a display rate of 3-15 megabits per second (Mb/s)). Figure 1.5 compares the space requirements for a ninety minute video clip encoded in different industry standard digital formats. Second, their isochronous nature requires timely, real-time retrieval of data blocks at a pre-specified rate. For example, the NTSC⁶ video standard requires that 30 video frames per second are displayed to a viewer. If data blocks are delayed in their delivery to the decoding station then the display might suffer from disruptions and delays.

⁶ National Television System Committee.

Digital continuous media streams can be encoded using either constant bit-rate (CBR) or variable bit-rate (VBR) schemes. As the name implies, the consumption rate of a CBR media stream is constant over time, while VBR streams use variable rates to achieve maximal compression. In this dissertation we will assume CBR media to provide a focused discussion. The extensions of our techniques to support VBR constitute one of our future research directions and are described in Chapter 7.

Figure 1.6: Striping of object X in a storage system with four disks (Figure 1.6a: data placement; Figure 1.6b: data retrieval). Fixed-size data blocks are placed and retrieved in a round-robin manner. During the display time of a block (termed a time period) multiple block retrievals (each in its own time slot) are scheduled for each disk drive on behalf of different streams.

To support the continuous display of a video object, for example X, from a disk array based storage server, X is commonly striped into n equi-sized blocks: X_0, X_1, ..., X_{n-1} (see Figure 1.6a) [Pol91, TPBG93, CL93, BGMJ94, NY94a]. Both the display time of a block and its transfer time from the disk are a fixed function of the display requirements of an object and the transfer rate of the disk, respectively. Using this information, the system stages a block of X (say X_0) from the disk into main memory and initiates its display. It schedules the disk drive to read X_1 into memory prior to the completion of the display of X_0. This ensures a smooth transition between the two blocks in support of continuous display. This process is repeated until all blocks of X have been retrieved and displayed. The periodic nature of the data retrieval and display process gives rise to the definition of a time period (T_p): it denotes the time required to display one data block.
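The round-robin placement and the time period defined above can be illustrated with a short sketch. The function names, the 512 KiB block size, and the 4 Mb/s display rate are assumptions made for this example; the dissertation does not prescribe them here.

```python
def disk_for_block(block_index: int, num_disks: int) -> int:
    """Round-robin striping: block X_i is stored on disk (i mod D)."""
    return block_index % num_disks


def time_period_seconds(block_size_bytes: int, display_rate_bps: float) -> float:
    """Time period T_p: the time needed to display one block of a CBR object."""
    return block_size_bytes * 8 / display_rate_bps


# Eight blocks of object X striped across four disks.
placement = [disk_for_block(i, 4) for i in range(8)]
print(placement)  # [0, 1, 2, 3, 0, 1, 2, 3]

# Assumed values: 512 KiB blocks and a 4 Mb/s constant bit-rate stream.
t_p = time_period_seconds(512 * 1024, 4_000_000)
print(f"T_p = {t_p:.3f} s")  # T_p = 1.049 s
```

With these assumed values, each block sustains about one second of display, so the next block of the same stream must be in memory within that interval.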
Note that the display time of a block is in general significantly longer than its transfer time from the disk drive (assuming a compressed video object). Thus, the bandwidth of a disk can be

multiplexed among several displays. Effectively, each time period is partitioned into slots which are guaranteed to be long enough to handle the retrieval of a single media block (see Figure 1.6b) [LS93, VSR94, GZS+97]. With a multi-disk architecture, the data blocks are assigned to the disks in a round-robin manner to distribute the load imposed by a display evenly. With a homogeneous disk storage system, a display occupies one slot and migrates from one disk to the next at the end of each time period⁷. This paradigm is no longer appropriate for a heterogeneous storage system, because fast disks can accommodate more slots per time period. However, since streams move in a round-robin manner from disk to disk, only the number of slots supported by the slowest participating disk drive can be allocated. Otherwise, some of the streams would need to be abandoned during a transition from a fast to a slow disk. As a result, the fast disk drives will be idle for part of each time period, wasting bandwidth that could have been used to support additional displays.

Only a few multi-disk designs and implementations to display continuous media from heterogeneous storage systems have been reported in the literature. They can broadly be classified into two groups: (1) designs that partition a heterogeneous storage system into multiple, homogeneous subservers, and (2) designs that present a virtual homogeneous view on top of the physical heterogeneity. We will consider the two groups in turn.

By partitioning a heterogeneous storage system into a set of homogeneous subservers, each can be configured optimally, thus improving the disk bandwidth utilization. With this approach, each video object is striped only across the storage devices of the subserver on which it is placed. The access pattern to a collection of video or movie clips is usually quite skewed; for example, 20% of the clips may account for 80% of all retrieval requests [GKSZ97].
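The slot limitation described above for round-robin scheduling on heterogeneous disks can be sketched numerically. This is a hedged illustration, not the dissertation's disk model: it assumes a fixed per-request overhead equal to the worst-case seek time plus the average rotational latency from Table 1.2, 1 MB = 10^6 bytes, 512 KiB blocks, and a 4 Mb/s stream. All identifiers are hypothetical.

```python
import math

BLOCK_BYTES = 512 * 1024                      # assumed block size
TIME_PERIOD_S = BLOCK_BYTES * 8 / 4_000_000   # assumed 4 Mb/s CBR stream

# (transfer rate in bytes/s, fixed positioning overhead in seconds).
# Rates, seek times and rotational latencies follow Table 1.2; treating
# seek plus latency as a constant overhead is a simplifying assumption.
DISKS = {
    "Hawk 1LP": (3.47e6, 0.021 + 0.00556),
    "Barracuda 4LP": (7.96e6, 0.019 + 0.00417),
}


def slots_per_period(rate: float, overhead: float) -> int:
    """Number of block retrievals that fit into one time period."""
    service_time = overhead + BLOCK_BYTES / rate
    return math.floor(TIME_PERIOD_S / service_time)


slots = {name: slots_per_period(*params) for name, params in DISKS.items()}

# Streams rotate over all disks, so the slowest disk bounds every disk.
usable = min(slots.values())
for name, n in slots.items():
    print(f"{name}: {n} slots possible, {usable} usable, {n - usable} left idle")
```

Under these assumptions the faster drive could serve about twice as many retrievals per time period as the slower one, yet the round-robin movement of streams lets it use only the slower drive's slot count.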
⁷ All requests (i.e., streams) that accessed disk d_i to retrieve their current data block will advance to disk d_{(i+1) mod D}, where D is the total number of disk drives in the system, to retrieve the following block during the next round.

Because the subservers will differ in performance (bandwidth and/or storage capacity) it becomes important to select the appropriate server on which to place each video clip so that the imposed load is balanced evenly. One metric that can be employed to make such placement decisions is the bandwidth to space ratio (BSR) of each subserver [DS95]. However, should the access pattern change over time then a subserver may become a bottleneck. Load-balancing algorithms are commonly used to counter such effects but they have the disadvantage of being detective rather than preventive, i.e., they can only try to remedy the situation after an imbalance or a bottleneck has already occurred. Replication can help to reduce the likelihood of bottlenecks but the replicated objects waste valuable storage space and hence the resulting system may not be cost-effective.

The most relevant work has been proposed in a study that developed declustering algorithms to place data over all the disk drives present in a heterogeneous, multi-node storage system [TJH+94, CRS95]. In its simplest form, this technique is based on the bandwidth ratio of the different devices, leading to the concept of virtual disks. A variation of the technique considers capacity constraints as well. This scheme fails to address two issues. First, the presented algorithms assume a fixed bandwidth for each disk drive without considering the inevitable and variable mechanical positioning overhead. This may result in system configurations that are not necessarily optimal from a cost-effectiveness perspective. As demonstrated in Chapter 4, one may vary the block size to enhance the cost-effectiveness of a configuration. Second, large disk arrays that use striping across all their disks are especially vulnerable to failures. The risk of a disk failure increases proportionally with the number of drives. For example, if one disk has a mean time between failures (MTBF) of 500,000 hours (57 years) then, with a system that consists of 1,000 disks, a potential failure occurs every 500 hours (21 days); see Table 6.1 in Chapter 6. Because striping distributes each object across all disk drives, the failure of a single disk will affect all video clips, resulting in disruptions of service.

This dissertation investigates continuous display with heterogeneous disk storage systems and proposes cost-effective techniques that improve upon traditional approaches. Its contributions are three-fold:

1. It combines data placement algorithms for heterogeneous storage systems with detailed disk modeling and scheduling techniques to provide guaranteed continuous media display.
2. It describes a configuration planner that takes the requirements of a CM application and the characteristics of a heterogeneous storage system as inputs and produces a set of parameters that most cost-effectively configures the system.
3. It provides a framework of fault tolerance techniques applicable to heterogeneous storage systems.

We have evaluated the proposed techniques using analytical and simulation models to ensure their feasibility and performance. We have further verified the accuracy of the results by implementing them in the storage subsystem of Mitra (a CM server research prototype [GZS+97]).
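The failure-rate figures quoted earlier in this chapter (500,000 hours for one disk, 500 hours for 1,000 disks) follow from dividing the single-disk MTBF by the number of drives. The sketch below reproduces that arithmetic; it assumes identical disks that fail independently at a constant rate, and the function name is illustrative only.

```python
HOURS_PER_YEAR = 24 * 365


def array_mtbf_hours(disk_mtbf_hours: float, num_disks: int) -> float:
    """Mean time between failures anywhere in an array of identical,
    independently failing disks (constant failure rate assumed)."""
    return disk_mtbf_hours / num_disks


single_disk_mtbf = 500_000
print(f"{single_disk_mtbf / HOURS_PER_YEAR:.0f} years")      # 57 years

array_mtbf = array_mtbf_hours(single_disk_mtbf, 1_000)
print(f"{array_mtbf:.0f} hours, {array_mtbf / 24:.0f} days")  # 500 hours, 21 days
```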

1.1 Organization

The remainder of this thesis is organized as follows. Chapter 2 surveys previous and related work in the field of continuous media storage systems. Single disk, multi-disk, and heterogeneous storage systems are detailed. It is followed by a discussion of the fundamental principles of continuous media display in Chapter 3. Further contained in that chapter is an overview of the mechanical and electrical characteristics of modern magnetic disk drives and how the different aspects of such devices can effectively be modeled analytically. Chapter 4 introduces three non-partitioning techniques for heterogeneous CM storage systems: Disk Grouping, Staggered Grouping, and Disk Merging. It also features a configuration planner that finds the most cost-effective Disk Merging configuration for a given application. The first part of Chapter 5 provides an evaluation of the three techniques based on analytical results. In the second part one non-partitioning technique (Disk Merging) is compared to a simple partitioning scheme with a simulation model. A framework of fault tolerance techniques applicable to heterogeneous storage systems is proposed in Chapter 6. Conclusions and future work are outlined in Chapter 7. Some of the detailed parameters used for disk modeling are provided in Appendix A, and Appendix B contains hardness results.

Chapter 2 Previous Work

2.1 Introduction

A number of studies have investigated CM storage servers in recent years. They commonly focus on multi-disk architectures, because the number of CM streams that can be supported concurrently from a single disk is limited¹. To increase the number of streams, multiple disks (also referred to as disk arrays or disk farms) are commonly employed [TPBG93, LS93, LPS94, VSR94, Vin94, VRG95, GVK+95, HLL+95, ORS96b, MNO+96, GZS+97]. A possible multi-disk architecture based on the popular SCSI² I/O bus is illustrated in Figure 3.2.

¹ Disks with the highest available transfer bandwidth today (1998) can support approximately [...] MPEG-2 streams at 3.5 Mb/s.
² Small Computer System Interface.

Two main issues need to be addressed when designing the storage system of a high-performance CM server: data placement and scheduling algorithms. A common technique to place CM data across multiple disks is striping (or some variation thereof, for example staggered striping [BGMJ94]). Such round-robin data placement has been widely reported to balance the load imposed on individual disk drives evenly and, furthermore, to support the maximum achievable throughput (e.g., [TPBG93, LS93, HLL+95, VRG95, GZS+97]). However, striping data across devices with varying storage capacities poses new challenges.

The isochronous nature of CM streams requires that data blocks are retrieved in a periodic manner with real-time deadlines that cannot be missed for a smooth display. Numerous scheduling techniques have been developed, applicable at either the disk level to allow the multiplexing of the read/write heads of each disk on behalf of multiple displays (see Section [...] for more details), or at the stream level to provide the round-robin movement across the disk drives. But again, these techniques need to be fine-tuned and adapted for the context of heterogeneous devices.

2.2 Heterogeneous Storage Systems

Techniques designed for homogeneous storage systems assume a fixed number of slots per time period. In a heterogeneous storage environment fast (i.e., high bandwidth) disks will spend less time to complete a block transfer and accordingly accommodate more slots. However, since streams move in an ordered, round-robin manner from disk to disk, only the number of slots supported by the slowest participating disk drive can be allocated. As a result, the fast disk drives will be idle for a fraction of each time period, wasting part of their increased bandwidth that could have been used to support additional displays.

The utilization can be improved by partitioning a heterogeneous storage system into a set of homogeneous subservers, each configured optimally. Every video object is striped across the storage devices of the subserver on which it is placed (a striping group). However, balancing the load across subservers becomes a challenge. The access pattern to a collection of video or movie clips is usually quite skewed; for example, references to 20% of the clips may account for 80% of all retrieval requests [GKSZ97]. Individual subservers will differ in performance based upon the characteristics of the disk drives used (bandwidth and storage capacity). Therefore, it becomes important to select the appropriate server on which to place each video clip so that the load is balanced evenly. One metric that can be employed to make such placement decisions is the bandwidth to space ratio (BSR) of each subserver [DS95]. However, should the access pattern change over time then a subserver may become a bottleneck. For some applications the access pattern can change quite unpredictably. A video-on-demand server, for example, may be subject to long-term changes because of external events.
The release of a new movie may introduce some uncertainty as to how well it will be received and how quickly public interest will wane. Word-of-mouth may increase the popularity of a movie unexpectedly (a sleeper hit in Hollywood terms). Or the death of a well-known movie actor or actress may suddenly revitalize the public interest in movies that he or she participated in. In the short term, the access pattern during a twenty-four hour period may change due to a higher popularity of children-oriented programming in the afternoon and more requests for adult-oriented material in the evening.

Load-balancing algorithms are commonly used to counter such effects, but they have the disadvantage of being detective rather than preventive, i.e., they can only try to remedy an undesirable situation due to an imbalance once a bottleneck occurs. Furthermore, the overhead incurred by moving CM objects from one subserver to another may be significant. Recall that CM objects are usually large in size and that valuable bandwidth will be consumed during the transfer of data blocks from one subserver to another. Replication can help to reduce the likelihood of bottlenecks, but the replicated objects waste scarce storage space and the resulting system may not be cost-effective.

A technique that avoids these load-balancing issues altogether has been proposed in a study that developed declustering algorithms to place data across all the disk drives present in a heterogeneous, multi-node storage system [TJH+94, CRS95]. The study describes a distributed parallel data storage architecture that is based on several server workstations connected through a high-speed network. Each server maintains its own local secondary storage, which consists of a variable number of magnetic disk drives. Therefore, individual servers deliver different streaming performance to the network and the end users. As a sample application, a single-user, high-bandwidth terrain visualization is presented which retrieves data in parallel at a very high data rate ( Mb/s) from participating servers. The study observes that this architecture could be used for multiple, lower-bandwidth streams, but no details are presented.

The proposed technique is, in its simplest form, based on the bandwidth ratio of the different devices, and it introduces the concept of virtual disks. A second scheme considers capacity constraints as well. This approach fails to adequately address two issues. First, the presented algorithms assume a very simple disk model with a fixed bandwidth for each disk drive. The inevitable and variable mechanical positioning overheads (head seek times, rotational latency) are not considered.
This simplification may necessitate a very conservative estimate of the transfer rates and hence result in system configurations that are not necessarily optimal from a cost-effectiveness perspective. Second, large disk arrays that use striping across all their disks are especially vulnerable to failures. The risk of a disk failure increases proportionally with the number of drives. For example, if one disk has a mean time between failures (MTBF) of 500,000 hours (57 years), then in a system that consists of 1,000 disks a failure can be expected on average every 500 hours (21 days); see Table 6.1 in Chapter 6. Because striping distributes each object across all disk drives, the failure of a single disk will affect all stored video clips, resulting in service disruptions.
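
The failure arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes independent disk failures at a constant rate, so that the expected time between failures in an array of N identical disks is the single-disk MTBF divided by N. The function name `array_mtbf_hours` and the constants are illustrative and do not come from the dissertation.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def array_mtbf_hours(disk_mtbf_hours: float, num_disks: int) -> float:
    """Expected time between failures anywhere in an array of identical disks.

    Assumption (not from the source): failures are independent and occur at a
    constant rate, so the failure rates of the individual disks add up.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mtbf_hours / num_disks


single_disk_years = 500_000 / HOURS_PER_YEAR      # about 57 years
array_hours = array_mtbf_hours(500_000, 1_000)    # 500 hours
array_days = array_hours / HOURS_PER_DAY          # about 21 days
```

Under the same assumption, doubling the number of disks halves the expected time between failures, which is why striping every object across all disks makes large arrays fragile.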

2.3 Multi-Zone Disk Drives

A number of studies have concentrated on data placement and disk head scheduling in multi-zone disk drives (e.g., [HMM93, Bir95, GKSZ96, GIKZ96, TKKD96]). Most of these techniques are either orthogonal to the techniques in this dissertation or can be adapted to heterogeneous disk environments. Incorporating multi-zone schemes is attractive because they can increase a server's overall performance. For example, track-pairing enables a server to utilize close to the average transfer rate of a zoned disk, as compared with the rate of the innermost (slowest) zone [Bir95]. This directly results in a higher number of supported streams. Section 4.6 details how a paradigm that is based on logical tracks [HMM93] can be combined with each of the three non-partitioning techniques introduced later in this dissertation.

2.4 Summary

Most of the previous research in the field of continuous media storage servers has focused on homogeneous systems. However, these techniques are not suitable for heterogeneous storage environments. Two prior studies have investigated magnetic disk drive heterogeneity, using either multiple partitions of homogeneous subservers [DS95] or the concept of uniform, logical disks [TJH+94, CRS95]. The former suffers from load-balancing problems that are difficult to address, while the latter neglects disk modeling details and fault tolerance issues. Therefore, new solutions remain desirable.
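
The argument against applying homogeneous techniques unchanged can be made concrete with a small calculation. The sketch below assumes that the slot count of each disk per time period is already known (the numbers are invented for illustration) and that, as described in Section 2.2, every disk is limited to the slot count of the slowest disk. The names `usable_slots` and `idle_fractions` are hypothetical.

```python
def usable_slots(slots_per_disk):
    """Slots that can be allocated on every disk: the slowest disk sets the limit."""
    return min(slots_per_disk)


def idle_fractions(slots_per_disk):
    """Fraction of each disk's own slots left unused in every time period."""
    limit = usable_slots(slots_per_disk)
    return [1 - limit / slots for slots in slots_per_disk]


# Hypothetical array: two slow disks (8 slots each) and two fast disks (15 slots each).
slots = [8, 8, 15, 15]
limit = usable_slots(slots)                          # 8 slots usable per disk
idle = idle_fractions(slots)                         # fast disks leave 7 of 15 slots idle
wasted_total = 1 - limit * len(slots) / sum(slots)   # 14 of the 46 slots are never used
```

In this invented configuration almost a third of the aggregate slot capacity is lost, which is the inefficiency that partitioning into homogeneous subservers tries to avoid.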

Chapter 3 Fundamentals of CM Display and Magnetic Disk Drives

3.1 Continuous Display Overview

To support a continuous display of a video object, for example X, several studies have proposed to stripe X into n equi-sized blocks: $X_0, X_1, \ldots, X_{n-1}$ [Pol91, TPBG93, CL93, BGMJ94, NY94a]. Both the display time of a block and its transfer time from the disk are fixed functions of the display requirements of an object and the transfer rate of the disk, respectively. Using this information, the system stages a block of X (say $X_0$) from the disk into main memory and initiates its display. It schedules the disk drive to read $X_1$ into memory prior to completion of the display of $X_0$. This ensures a smooth transition between the two blocks in order to support a continuous display. This process is repeated until all blocks of X have been retrieved and displayed. The system needs to estimate the disk service time in order to stage a block into memory in a timely manner and thus avoid starvation of data, i.e., hiccups. Section describes techniques to model the characteristics of disk drives to obtain the necessary service time estimates.

The periodic nature of the data retrieval and display process gives rise to the definition of a time period ($T_p$): the time required to display a block, i.e., $T_p = B / R_C$, where $B$ denotes the size of each block $X_i$ and $R_C$ is the consumption rate of X. We make the simplifying assumption in this thesis that the objects that constitute the video server database belong to a single media type and require a fixed bandwidth for their display. This assumption can be relaxed and the proposed techniques extended by employing various variable bit-rate techniques as surveyed in [AMG98].

Most multi-disk designs utilize striping to assign data blocks of each CM file to individual disks. With striping, a file is broken into (fixed) striping units which are assigned to the disks in a round-robin manner [SGM86].
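
The time period and the round-robin assignment of blocks to disks can be sketched as follows. This is an illustrative example, not code from the dissertation: the helper names, the block size, and the number of disks are assumptions, and the 3.5 Mb/s consumption rate is the MPEG-2 rate mentioned in Chapter 2.

```python
def time_period_s(block_bytes, consumption_bytes_per_s):
    """T_p = B / R_C: the time it takes to display one block."""
    return block_bytes / consumption_bytes_per_s


def disk_for_block(block_index, num_disks, first_disk=0):
    """Round-robin placement: block X_i goes to disk (first_disk + i) mod D."""
    return (first_disk + block_index) % num_disks


def stripe(num_blocks, num_disks, first_disk=0):
    """Return, for each disk, the list of block indices stored on it."""
    layout = [[] for _ in range(num_disks)]
    for i in range(num_blocks):
        layout[disk_for_block(i, num_disks, first_disk)].append(i)
    return layout


R_C = 3_500_000 // 8          # 3.5 Mb/s expressed in bytes per second
B = 875_000                   # assumed block size in bytes
T_p = time_period_s(B, R_C)   # 2.0 seconds of display per block
layout = stripe(num_blocks=7, num_disks=3)   # [[0, 3, 6], [1, 4], [2, 5]]
```

Because consecutive blocks land on consecutive disks, a display that consumes one block per time period visits the disks in a fixed cyclic order.
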
There are two basic ways to retrieve striped data: (a) in parallel, to utilize the aggregate bandwidth of all the disks (this is typically done in RAID systems; see footnote 1), or (b) in a cyclic fashion, to reduce the buffer requirements (this method is sometimes referred to as simple striping or RAID level 0). Both scheduling techniques can also be combined in a hierarchical fashion by forming several clusters of disks [GK95]. Data retrieval proceeds in parallel within a cluster and in cycles across the clusters. When identical disks are used, all the above techniques feature perfect load balancing during data retrieval, and on average an equal amount of data is stored on every disk. Level 1 and above of RAID systems have only been analyzed for homogeneous disk arrays, since their performance depends critically on the slowest disk drive (parity information must be calculated from the data of all disks, and read or written for each I/O operation to complete [Fri96]).

Note that the display time of a block is in general significantly longer than its transfer time from the disk drive (assuming a compressed video object). Thus, the bandwidth of a disk drive can be multiplexed among several displays referencing different objects (see Figure 3.1). However, a magnetic disk drive is a mechanical device. Multiplexing it among several displays causes it to incur mechanical positioning delays. The source of these delays is described in Section , which provides an overview of the internal operation of disk drives. Such overhead is wasteful and it reduces the number of simultaneous displays supported by the system (footnote 2). Section details how advanced scheduling policies can minimize the impact of these wasteful operations.

[Figure 3.1: Continuous display of multiple objects (W, X, ..., Z) by multiplexing the bandwidth of a disk. Labels: Disk Activity, System Activity, Mechanical Positioning Delays, Time Period T_p.]

Footnote 1: Redundant Arrays of Inexpensive Disks [PGK88].
Footnote 2: The disk performs useful work when it transfers data.
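
The effect of positioning overhead on the number of displays that one disk can sustain can be illustrated with a deliberately simple model. The sketch assumes a fixed disk transfer rate and a constant worst-case positioning delay per block retrieval. Both are simplifications, and the parameter values and function names are invented for illustration rather than taken from the dissertation.

```python
import math


def block_service_time_s(block_bytes, disk_bytes_per_s, positioning_s):
    """Time the disk is busy for one block: positioning delay plus transfer."""
    return positioning_s + block_bytes / disk_bytes_per_s


def max_displays(block_bytes, consumption_bytes_per_s, disk_bytes_per_s, positioning_s):
    """Number of block retrievals that fit into one time period T_p = B / R_C."""
    time_period = block_bytes / consumption_bytes_per_s
    service = block_service_time_s(block_bytes, disk_bytes_per_s, positioning_s)
    return math.floor(time_period / service)


B = 875_000        # assumed block size in bytes
R_C = 437_500      # 3.5 Mb/s consumption rate in bytes per second
R_D = 8_750_000    # assumed disk transfer rate in bytes per second
OVERHEAD = 0.02    # assumed worst-case seek plus rotational delay in seconds

n = max_displays(B, R_C, R_D, OVERHEAD)   # 16 displays per time period
useful = n * (B / R_D) / (B / R_C)        # share of T_p spent transferring data
```

With these assumed values the disk spends about 80% of each time period transferring data. Raising the positioning delay to 0.06 seconds in the same model lowers the number of displays from 16 to 12, which is why scheduling policies that reduce positioning overhead matter.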

3.2 Target Environment

An illustration of our target hardware platform is provided in Figure 3.2. The system bus is high performance, with nanosecond latency and a transfer rate in excess of 100 MBytes per second once the bus arbitration overhead is considered. It connects all the major components of the computer system: the memory, the CPU (central processing unit), and any attached devices, such as display, network, and storage subsystems. Within the storage subsystem each individual device, e.g., a disk or a tape, is attached to the I/O bus, which in turn is connected to the system bus through a host adapter. The host adapter translates the I/O bus protocol into the system bus protocol, and it may improve the performance of the overall system by providing caching and by off-loading low-level functions from the main processor. The disk subsystem is central to this study and is detailed in the next section.

[Figure 3.2: Storage subsystem architecture. Labels: Memory; CPU; System Bus (e.g., PCI); Initiator(s): Host Adapter, SCSI ID 7 (the host adapter links the system bus with the I/O bus); Storage/Disk Subsystem; I/O Bus (e.g., SCSI); Target(s): Disks with SCSI ID 0, SCSI ID 1, ..., SCSI ID 15*. *Note: Narrow SCSI supports 8 devices, while wide SCSI supports 16 devices.]

Modern Disk Drives

The magnetic disk drive technology has benefited from more than two decades of research and development. It has evolved to provide a low latency (in the order of milliseconds) and a low cost per MByte of storage (a few cents per MByte at the time of this writing in 1998). It has become commonplace, with annual sales in excess of 30 billion dollars [ost94]. Magnetic disk drives are commonly used for a wide variety of storage purposes in almost every computer system. To facilitate their integration and compatibility with a wide range of host hardware and operating systems, the interface that they present to the rest of the system is well defined and hides many of the complexities of the actual internal operation. For example, the popular SCSI (Small Computer System Interface, see [ANS94, Ded94]) standard presents a magnetic disk drive to the host system as a linear vector of storage blocks (usually of size 512 bytes each). When an application requests the retrieval of one or several blocks, the data will be returned after some (usually short) time, but there is no explicit mechanism to inform the application exactly how long such an operation will take.

[Figure 3.3: Disk drive internals. Labels: Platter, Spindle, Read/Write Head, Arm, Arm Assembly (Actuator), Track, Cylinder, Sector.]

In many circumstances such a best-effort approach is reasonable because it simplifies program development by allowing the programmer to focus on the task at hand instead of the physical attributes of the disk drive. However, for a number of data-intensive applications, for example continuous media servers, exact timing information is crucial to satisfy the real-time constraints imposed by the requirement for a jitter-free delivery of audio and video streams. Fortunately, with a model that imitates the internal operation of a magnetic disk drive it is possible to predict service times at the level of accuracy that is needed to design and configure CM server storage systems.

Below, we will first give an overview of the internal operation of modern magnetic disk drives. Next, we will introduce a model that allows an estimation of the service time of a disk drive. This will form the basis for our introduction of techniques that provide CM services on top of heterogeneous storage systems.


More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

1 of 6 4/8/2011 4:08 PM Electronic Hardware Information, Guides and Tools search newsletter subscribe Home Utilities Downloads Links Info Ads by Google Raid Hard Drives Raid Raid Data Recovery SSD in Raid

More information

High Performance Computing Course Notes High Performance Storage

High Performance Computing Course Notes High Performance Storage High Performance Computing Course Notes 2008-2009 2009 High Performance Storage Storage devices Primary storage: register (1 CPU cycle, a few ns) Cache (10-200 cycles, 0.02-0.5us) Main memory Local main

More information

Chapter 12: Mass-Storage

Chapter 12: Mass-Storage Chapter 12: Mass-Storage Systems Chapter 12: Mass-Storage Systems Revised 2010. Tao Yang Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space Management

More information

Chapter 14: Mass-Storage Systems

Chapter 14: Mass-Storage Systems Chapter 14: Mass-Storage Systems Disk Structure Disk Scheduling Disk Management Swap-Space Management RAID Structure Disk Attachment Stable-Storage Implementation Tertiary Storage Devices Operating System

More information

Chapter 10: Mass-Storage Systems

Chapter 10: Mass-Storage Systems Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space

More information

MEMS-based Disk Buffer for Streaming Media Servers

MEMS-based Disk Buffer for Streaming Media Servers MEMS-based Disk Buffer for Streaming Media Servers Raju Rangaswami Zoran Dimitrijević Edward Chang Klaus E. Schauser University of California, Santa Barbara raju@cs.ucsb.edu zoran@cs.ucsb.edu echang@ece.ucsb.edu

More information

Today: Secondary Storage! Typical Disk Parameters!

Today: Secondary Storage! Typical Disk Parameters! Today: Secondary Storage! To read or write a disk block: Seek: (latency) position head over a track/cylinder. The seek time depends on how fast the hardware moves the arm. Rotational delay: (latency) time

More information

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition

Chapter 10: Mass-Storage Systems. Operating System Concepts 9 th Edition Chapter 10: Mass-Storage Systems Silberschatz, Galvin and Gagne 2013 Chapter 10: Mass-Storage Systems Overview of Mass Storage Structure Disk Structure Disk Attachment Disk Scheduling Disk Management Swap-Space

More information

CSE 153 Design of Operating Systems

CSE 153 Design of Operating Systems CSE 153 Design of Operating Systems Winter 2018 Lecture 20: File Systems (1) Disk drives OS Abstractions Applications Process File system Virtual memory Operating System CPU Hardware Disk RAM CSE 153 Lecture

More information

William Stallings Computer Organization and Architecture 6 th Edition. Chapter 6 External Memory

William Stallings Computer Organization and Architecture 6 th Edition. Chapter 6 External Memory William Stallings Computer Organization and Architecture 6 th Edition Chapter 6 External Memory Types of External Memory Magnetic Disk RAID Removable Optical CD-ROM CD-Recordable (CD-R) CD-R/W DVD Magnetic

More information

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Outline of Today s Lecture. The Big Picture: Where are We Now?

CPS104 Computer Organization and Programming Lecture 18: Input-Output. Outline of Today s Lecture. The Big Picture: Where are We Now? CPS104 Computer Organization and Programming Lecture 18: Input-Output Robert Wagner cps 104.1 RW Fall 2000 Outline of Today s Lecture The system Magnetic Disk Tape es DMA cps 104.2 RW Fall 2000 The Big

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 6 Input/Output Israel Koren ECE568/Koren Part.6. Motivation: Why Care About I/O? CPU Performance:

More information

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices

u Covered: l Management of CPU & concurrency l Management of main memory & virtual memory u Currently --- Management of I/O devices Where Are We? COS 318: Operating Systems Storage Devices Jaswinder Pal Singh Computer Science Department Princeton University (http://www.cs.princeton.edu/courses/cos318/) u Covered: l Management of CPU

More information

Magnetic Disk. Optical. Magnetic Tape. RAID Removable. CD-ROM CD-Recordable (CD-R) CD-R/W DVD

Magnetic Disk. Optical. Magnetic Tape. RAID Removable. CD-ROM CD-Recordable (CD-R) CD-R/W DVD External Memory Magnetic Disk RAID Removable Optical CD-ROM CD-Recordable (CD-R) CD-R/W DVD Magnetic Tape Disk substrate coated with magnetizable material (iron oxide rust) Substrate used to be aluminium

More information

CHAPTER 12: MASS-STORAGE SYSTEMS (A) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

CHAPTER 12: MASS-STORAGE SYSTEMS (A) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. CHAPTER 12: MASS-STORAGE SYSTEMS (A) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. Chapter 12: Mass-Storage Systems Overview of Mass-Storage Structure Disk Structure Disk Attachment Disk Scheduling

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices

More information

Module 13: Secondary-Storage

Module 13: Secondary-Storage Module 13: Secondary-Storage Disk Structure Disk Scheduling Disk Management Swap-Space Management Disk Reliability Stable-Storage Implementation Tertiary Storage Devices Operating System Issues Performance

More information

Data Storage - I: Memory Hierarchies & Disks. Contains slides from: Naci Akkök, Pål Halvorsen, Hector Garcia-Molina, Ketil Lund, Vera Goebel

Data Storage - I: Memory Hierarchies & Disks. Contains slides from: Naci Akkök, Pål Halvorsen, Hector Garcia-Molina, Ketil Lund, Vera Goebel Data Storage - I: Memory Hierarchies & Disks Contains slides from: Naci Akkök, Pål Halvorsen, Hector Garcia-Molina, Ketil Lund, Vera Goebel Overview Implementing a DBS is easy!!?? Memory hierarchies caches

More information

CMSC 424 Database design Lecture 12 Storage. Mihai Pop

CMSC 424 Database design Lecture 12 Storage. Mihai Pop CMSC 424 Database design Lecture 12 Storage Mihai Pop Administrative Office hours tomorrow @ 10 Midterms are in solutions for part C will be posted later this week Project partners I have an odd number

More information

Computer System Architecture

Computer System Architecture CSC 203 1.5 Computer System Architecture Department of Statistics and Computer Science University of Sri Jayewardenepura Secondary Memory 2 Technologies Magnetic storage Floppy, Zip disk, Hard drives,

More information

Input/Output Management

Input/Output Management Chapter 11 Input/Output Management This could be the messiest aspect of an operating system. There are just too much stuff involved, it is difficult to develop a uniform and consistent theory to cover

More information

Assessing performance in HP LeftHand SANs

Assessing performance in HP LeftHand SANs Assessing performance in HP LeftHand SANs HP LeftHand Starter, Virtualization, and Multi-Site SANs deliver reliable, scalable, and predictable performance White paper Introduction... 2 The advantages of

More information

5 Computer Organization

5 Computer Organization 5 Computer Organization 5.1 Foundations of Computer Science ã Cengage Learning Objectives After studying this chapter, the student should be able to: q List the three subsystems of a computer. q Describe

More information

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID

SYSTEM UPGRADE, INC Making Good Computers Better. System Upgrade Teaches RAID System Upgrade Teaches RAID In the growing computer industry we often find it difficult to keep track of the everyday changes in technology. At System Upgrade, Inc it is our goal and mission to provide

More information

CSE 451: Operating Systems Winter Redundant Arrays of Inexpensive Disks (RAID) and OS structure. Gary Kimura

CSE 451: Operating Systems Winter Redundant Arrays of Inexpensive Disks (RAID) and OS structure. Gary Kimura CSE 451: Operating Systems Winter 2013 Redundant Arrays of Inexpensive Disks (RAID) and OS structure Gary Kimura The challenge Disk transfer rates are improving, but much less fast than CPU performance

More information

Mass-Storage. ICS332 - Fall 2017 Operating Systems. Henri Casanova

Mass-Storage. ICS332 - Fall 2017 Operating Systems. Henri Casanova Mass-Storage ICS332 - Fall 2017 Operating Systems Henri Casanova (henric@hawaii.edu) Magnetic Disks! Magnetic disks (a.k.a. hard drives ) are (still) the most common secondary storage devices today! They

More information

Chapter 9: Peripheral Devices. By: Derek Hildreth Chad Davis

Chapter 9: Peripheral Devices. By: Derek Hildreth Chad Davis Chapter 9: Peripheral Devices By: Derek Hildreth Chad Davis Brigham Young University - Idaho CompE 324 Brother Fisher Introduction When discussing this chapter, it has been assumed that the reader has

More information

Disks. Storage Technology. Vera Goebel Thomas Plagemann. Department of Informatics University of Oslo

Disks. Storage Technology. Vera Goebel Thomas Plagemann. Department of Informatics University of Oslo Disks Vera Goebel Thomas Plagemann 2014 Department of Informatics University of Oslo Storage Technology [Source: http://www-03.ibm.com/ibm/history/exhibits/storage/storage_photo.html] 1 Filesystems & Disks

More information

Appendix D: Storage Systems

Appendix D: Storage Systems Appendix D: Storage Systems Instructor: Josep Torrellas CS433 Copyright Josep Torrellas 1999, 2001, 2002, 2013 1 Storage Systems : Disks Used for long term storage of files temporarily store parts of pgm

More information

Semiconductor Memory Types Microprocessor Design & Organisation HCA2102

Semiconductor Memory Types Microprocessor Design & Organisation HCA2102 Semiconductor Memory Types Microprocessor Design & Organisation HCA2102 Internal & External Memory Semiconductor Memory RAM Misnamed as all semiconductor memory is random access Read/Write Volatile Temporary

More information

The Server-Storage Performance Gap

The Server-Storage Performance Gap The Server-Storage Performance Gap How disk drive throughput and access time affect performance November 2010 2 Introduction In enterprise storage configurations and data centers, hard disk drives serve

More information

CISC 7310X. C11: Mass Storage. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 4/19/2018 CUNY Brooklyn College

CISC 7310X. C11: Mass Storage. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 4/19/2018 CUNY Brooklyn College CISC 7310X C11: Mass Storage Hui Chen Department of Computer & Information Science CUNY Brooklyn College 4/19/2018 CUNY Brooklyn College 1 Outline Review of memory hierarchy Mass storage devices Reliability

More information

Data Storage and Disk Structure

Data Storage and Disk Structure Data Storage and Disk Structure A Simple Implementation of DBMS One file per table Students(name, id, dept) in a file Students A meta symbol # to separate attributes Smith#123#CS Johnson#522#EE Database

More information

CS61C : Machine Structures

CS61C : Machine Structures inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures CS61C L40 I/O: Disks (1) Lecture 40 I/O : Disks 2004-12-03 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia I talk to robots Japan's growing

More information

Ref: Chap 12. Secondary Storage and I/O Systems. Applied Operating System Concepts 12.1

Ref: Chap 12. Secondary Storage and I/O Systems. Applied Operating System Concepts 12.1 Ref: Chap 12 Secondary Storage and I/O Systems Applied Operating System Concepts 12.1 Part 1 - Secondary Storage Secondary storage typically: is anything that is outside of primary memory does not permit

More information

Lecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections )

Lecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections ) Lecture 23: Storage Systems Topics: disk access, bus design, evaluation metrics, RAID (Sections 7.1-7.9) 1 Role of I/O Activities external to the CPU are typically orders of magnitude slower Example: while

More information

Advanced Database Systems

Advanced Database Systems Lecture II Storage Layer Kyumars Sheykh Esmaili Course s Syllabus Core Topics Storage Layer Query Processing and Optimization Transaction Management and Recovery Advanced Topics Cloud Computing and Web

More information
