BiHOP: A Bidirectional Highly Optimized Pipelining Technique for Large-Scale Multimedia Servers

Kien A. Hua, James Z. Wang, Simon Sheu
Department of Computer Science, University of Central Florida, Orlando, FL 32816-0362, U.S.A.
{kienhua, zwang, sheu}@cs.ucf.edu

Abstract

We present a technique, called Bidirectional Highly Optimized Pipelining (BiHOP), for managing disks as a buffer for the tertiary storage of a multimedia server. We implement a simulator to compare its performance to that of a recently proposed scheme called SEP. The results show that BiHOP performs significantly better. Its superior performance is attributed to a novel caching approach which caches every other data fragment of the multimedia file, rather than caching consecutive fragments as in traditional practice. This new approach allows us to use tiny staging buffers for pipelining, which can be implemented in memory to conserve disk bandwidth. Furthermore, the whole disk space can be dedicated to caching purposes to improve the hit ratio. Another important advantage of BiHOP is its ability to pipeline in either the forward or the reverse direction with the same efficiency. This unique feature, not possible with existing schemes, makes it natural for implementing VCR functions.

1. Introduction

The high storage cost of video files is a major concern for many potential multimedia applications. Even with the most sophisticated compression, video has a voracious appetite for storage. For example, a 30-second ad compressed with MPEG-2 running at 4 megabits per second requires 15 MBytes, and a 100-minute movie takes 3 GBytes. One way to reduce storage costs is to organize the storage subsystem as a hierarchy, in which magnetic disks are used as a cache for the tertiary storage devices (e.g., optical disk arrays). When a cache miss occurs in a hierarchical storage subsystem, the simplest way to deal with it is to materialize the whole object onto disk before sending it to the display station. This approach, however, results in unacceptable latencies.

A pipelining technique, called PIRATE, was proposed in [4] by Ghandeharizadeh and Shahabi to address this problem. In their scheme, a video file is divided into a sequence of slices Z_0, Z_1, ..., Z_n such that the time of displaying slice Z_i eclipses the time required to materialize Z_{i+1} (i.e., load it onto disk). This strategy ensures a continuous display while reducing the latency, because the system can initiate the display of an object as soon as a fraction of the object (i.e., Z_0) is disk resident. In this paper, Z_0 is referred to as the HEAD, and the following slices (i.e., Z_1, ..., Z_n) are collectively referred to as the TAIL of the object. A drawback of this scheme is that disk space large enough to contain the entire video file must be reserved before the pipelining mechanism can take place. Waiting for the availability of such a large disk space can lengthen the access latency, and the demand for the large buffer space also flushes out much potentially useful data.

To address the aforementioned issues, SEP (Space Efficient Pipelining), proposed by Wang, Hua and Young in [8], pipelines the slices in the TAIL through a staging buffer equal to the size of Z_1. As soon as the pipelining is completed, the space occupied by this buffer is immediately returned to the buffer pool. To further improve the performance, three additional features were used in [8]: buffer shrinking, space stealing and object pinning. Although we showed in [8] that SEP significantly improves the long latency times of PIRATE, its space requirement is still very significant.
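To give a feel for the space these schemes tie up per request, the short sketch below (our own illustration, not part of the paper) works through the introductory arithmetic and then estimates the PIRATE HEAD and the SEP staging buffer. It assumes the eclipse condition holds with equality and that there are many slices, so slice sizes form a geometric sequence with ratio PCR (the materialize-to-display rate ratio defined in Section 2); the PCR value of 0.6 is borrowed from the example in Section 2.2.

    # Rough sizing sketch (assumptions: equality in the eclipse condition,
    # many slices, PCR = 0.6 taken from the Section 2.2 example).
    MBIT = 1_000_000 / 8            # bytes per megabit

    rate = 4 * MBIT                 # MPEG-2 at 4 megabits per second
    ad = 30 * rate                  # 30-second ad
    movie = 100 * 60 * rate         # 100-minute movie
    print(f"ad: {ad/1e6:.0f} MB, movie: {movie/1e9:.1f} GB")   # ~15 MB, ~3 GB

    PCR = 0.6                       # materialize rate / display rate
    S = movie                       # object size in bytes
    head = S * (1 - PCR)            # geometric slices Z_{i+1} = PCR * Z_i summing to ~S
    z1 = PCR * head                 # second slice = SEP's staging buffer
    print(f"PIRATE HEAD: {head/1e6:.0f} MB, SEP staging buffer: {z1/1e6:.0f} MB")

Under these assumptions PIRATE reserves space for essentially the whole file per request, and SEP still reserves a staging buffer of hundreds of megabytes. Section 2.2 shows how BiHOP shrinks the per-request staging buffer to a few blocks, while the HEAD-sized portion becomes ordinary cached data (the D-fragments) rather than a per-request reservation.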
Another disadvantage of SEP is that pipelining can only be done in the forward direction. This drawback makes it unsuitable for implementing VCR functions. In this paper, we propose a different pipelining approach, called BiHOP (Bidirectional Highly Optimized Pipelining). Our design is motivated by the following two factors:

1. Reducing the pipelining cost: The admission cost for a request under SEP is still quite expensive. Even when the HEAD of an object is already disk resident, SEP needs to reserve a staging buffer about the size of the second slice of the object before pipelining can take place. We should be able to reduce this size to only a few disk blocks. If this goal is achieved, we will be able to free up a lot of disk space to support a larger number of concurrent users; the lower admission cost will also help to reduce the access latencies.

2. Supporting VCR functions: It is highly desirable to provide common VCR functions such as fast-forward and fast-reverse. These features are not well supported by PIRATE or SEP. We need to design a bidirectional pipelining strategy. Such a symmetric approach will allow the user to scan in either direction with the same efficiency. Preferably, the caching strategy should support fast-forward and fast-reverse without having to involve the tertiary devices, because their bandwidths are very limited.

Thus our ambition is to address both the performance and the functionality issues. Both of these objectives are achieved in BiHOP, which is bidirectional in functionality and optimal in space utilization.

The remainder of this paper is organized as follows. BiHOP and its benefits are discussed in Section 2. In Section 3, we describe our simulation model. The results of the performance study are examined in Section 4. In Section 5, we focus on the implementation of VCR functions. Finally, we give the conclusions and discuss our future research in Section 6.

2. BiHOP Approach

We describe the pipelining strategy and the data replacement policy used by BiHOP in the following subsections. Discussion of the VCR functions is deferred until Section 5.

2.1. Intelligent Pipelining

In BiHOP, we divide the whole object into two categories of fragments. One category is called the disk-resident fragment (or D-fragment) and the other is called the tertiary-device-resident fragment (or T-fragment). The D- and T-fragments interleave in the data file as illustrated in Figure 1. With this file organization, the pipelining is performed as follows. As the system displays the first D-fragment D_0, it materializes the next T-fragment T_1 from the tertiary devices. For i >= 1, it materializes T_{i+1} while displaying T_i and D_i. Obviously, to maintain a continuous display, the elapsed time of displaying D_0 should equal the elapsed time of materializing T_1, and the elapsed time of displaying T_i and D_i should equal the elapsed time of materializing T_{i+1}, for i >= 1. Mathematically, we can express these requirements as follows:

  Size(D_0) / R_D = Size(T_1) / R_M                                (1)
  (Size(T_i) + Size(D_i)) / R_D = Size(T_{i+1}) / R_M,   i >= 1     (2)

where R_M and R_D denote the materialize rate of an object from the tertiary devices to the disks and the transfer rate of data from the disks to a display station, respectively.

Figure 1. BiHOP pipelining technique. The file is laid out as D_0, T_1, D_1, T_2, D_2, ..., T_n, D_n; the D-fragments are cached on disk while the T-fragments are loaded on demand from the tertiary devices.

We make the sizes of the fragments more uniform by letting Size(T_i) = T and Size(D_i) = D for i >= 1, and Size(D_0) = D_0. Substituting these values into Equation (2), we have:

  T = PCR * (T + D)

where PCR (Production Consumption Rate) is defined as the ratio of R_M to R_D. The size S of the entire object can then be computed in terms of T and D as follows:

  S = D_0 + n * (T + D)

Let S_D and S_T represent the accumulative size of all the D-fragments and the accumulative size of all the T-fragments, respectively. S_D and S_T can be computed as follows:

  S_D = D_0 + n * D
  S_T = n * T

If n >> 1, we have the following approximation:

  S_D ≈ S * (1 - PCR)                                              (3)
  S_T ≈ S * PCR

The equations derived above serve as the foundation for the subsequent discussions.
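As a sanity check on these relations, the following short sketch (our own illustration) picks a PCR, solves T = PCR * (T + D) for the uniform fragment sizes, lays out an object as D_0, T_1, D_1, ..., T_n, D_n, and confirms that the accumulated D- and T-fragment sizes approach S * (1 - PCR) and S * PCR.

    def fragment_sizes(pcr, t=1.0):
        """Given PCR = R_M / R_D and a T-fragment size t, return (d0, t, d)
        satisfying Eq. (1)-(2): t = pcr * (t + d) and d0 = t / pcr."""
        d = t * (1 - pcr) / pcr
        d0 = t / pcr
        return d0, t, d

    pcr = 0.6
    d0, t, d = fragment_sizes(pcr)
    n = 10_000                       # number of (T, D) pairs following D_0
    S = d0 + n * (t + d)             # total object size
    S_D = d0 + n * d                 # accumulated D-fragments (disk resident)
    S_T = n * t                      # accumulated T-fragments (on tertiary)
    print(S_D / S, 1 - pcr)          # both ~0.4
    print(S_T / S, pcr)              # both ~0.6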
We note that, unlike the data fragments in PIRATE and SEP, which are monotonically decreasing in size, there are only two types of fragments in BiHOP. This regular design makes it possible to pipeline in either the forward or the reverse direction. We will discuss the implementation of VCR functions in detail later.

2.2. Space Optimization

With the new pipelining scheme, we must load all the D-fragments into the disk system before the display can start. We note that the size of S_D is equal to the size of the HEAD (i.e., Z_0) in PIRATE and SEP for a given file. In BiHOP, the D-fragments (disk-resident fragments) are kept in the disk buffer for as long as possible; we discuss the replacement policy in the next subsection. For the moment, let us focus on the space required by the staging buffers used to retrieve the T-fragments from the tertiary devices. There are two ways to implement the staging buffers:

Double Buffer: We maintain two buffers, one for reading and one for writing. While the data of fragment T_i is being transferred to the display station from one buffer (the consumption buffer), the tertiary device writes T_{i+1} into the other buffer (the production buffer). The two buffers switch roles when the current consumption buffer is exhausted. Obviously, the size of each buffer is T, and the total space required by this scheme is 2T.

Single Buffer: This approach uses a circular buffer shared by both the consumption and the production procedures, as illustrated in Figure 2. The space requirement for this approach is about the size of one T-fragment.

Figure 2. Circular staging buffer: data read from the tertiary device is loaded into in-memory buffers and forwarded to the display device, bypassing the magnetic disk buffer.

Whichever approach is used, we should minimize T to keep the size of the staging buffer minimal. Recall from Section 2.1 that T = PCR * (T + D). We can reduce the fraction PCR = R_M / R_D to its irreducible form p/q, such that p is relatively prime to q. Thus, the minimum size for the T-fragments is p blocks, where a block is an efficient unit for I/O operations. Accordingly, the size of the D-fragments, except the first one, should be q - p blocks, and the size of the first D-fragment is q blocks. For instance, if PCR = 0.6, we have p = 3 and q = 5. The sizes of the D-fragments and T-fragments, therefore, are 2 blocks and 3 blocks, respectively. This example is illustrated in Figure 1. Let a block be 4 KBytes. The size of the staging buffer is then 24 KBytes if double buffering is used, and about 16 KBytes if circular buffering is used. SEP would have required a staging buffer as large as 363 MBytes. PIRATE does not use a staging buffer; in this case, it would have required a disk space of 907 MBytes in order to retrieve the TAIL of the object. Obviously, the savings due to BiHOP are tremendous.

The tiny size of the staging buffers used in BiHOP offers many benefits. Since the size is so small, the staging buffer can be implemented in memory; this allows the pipelining to bypass the disk subsystem, leaving all the disk bandwidth to the replacement activities of the D-fragments. Since the staging buffers require no disk space, the disk space saved can be used to support more users and therefore improve the throughput of the system. Finally, a smaller staging buffer translates into a lower admission cost, so users experience better access latencies.

We note that one can consider using the technique proposed in [7] to manage the in-memory staging buffers. That scheme takes advantage of the fact that each staging buffer shrinks as its data are forwarded to the display station. Since the storage subsystem must multiplex its bandwidth to refresh these buffers in a round-robin fashion, the space released by the shrinking buffers can be given to the ones that are being refreshed. Because staging buffers take turns using the same memory space, a reduction in the memory requirement is possible. The performance study in [7] shows that up to 50% savings in memory space is achievable.
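Before moving on, here is a minimal, single-threaded illustration of the single-buffer (circular) option described above, sized at one T-fragment of p blocks. It is our own sketch of the idea, not the server's actual buffer manager; the class and block names are hypothetical.

    from collections import deque

    class CircularStagingBuffer:
        """Minimal ring buffer holding at most `capacity_blocks` blocks (illustrative only)."""
        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks
            self.blocks = deque()

        def produce(self, block):
            # Called as the tertiary device materializes the next T-fragment.
            if len(self.blocks) >= self.capacity:
                raise RuntimeError("producer outran consumer: buffer full")
            self.blocks.append(block)

        def consume(self):
            # Called as data is forwarded to the display station.
            return self.blocks.popleft() if self.blocks else None

    # T-fragment of p = 3 blocks (the PCR = 0.6 example): the buffer never needs
    # more than about one T-fragment of space because consumption keeps pace.
    buf = CircularStagingBuffer(capacity_blocks=3)
    for b in ("T1-blk0", "T1-blk1", "T1-blk2"):
        buf.produce(b)
    while (blk := buf.consume()) is not None:
        print("display", blk)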
2.3. Replacement Policy

As we have mentioned, pipelining is done through the memory system, bypassing the disk units. The disk buffer is used exclusively for caching the D-fragments. The replacement policy for the D-fragments is presented in Figure 3. The algorithm uses the following quantities: Size(X), the size of the requested object X in blocks; DiskResident(X), the size of the disk-resident portion of object X; the access frequency (HEAT) of an object; and the set of disk-resident objects not currently being displayed.

Algorithm RESERVE is based on Equation (3). For each video object requested, it computes the additional amount of disk space required to load the D-fragments not currently in the disk buffer.

Algorithm REPLACE(X):
    needed := RESERVE(X) - free disk space
    if (needed > 0) then
        repeat
            victim := the object with the lowest HEAT among the disk-resident
                      objects not currently being displayed
            if (victim is null) then
                return(failure)
            else if (DiskResident(victim) > needed) then
                DiskResident(victim) := DiskResident(victim) - needed
                free the last needed blocks of victim's space
                needed := 0
            else
                displace victim to make room for X
                remove victim from the candidate set
                needed := needed - DiskResident(victim)
        until (needed <= 0)
    allocate RESERVE(X) blocks of free disk space for object X
    return(success)

Algorithm RESERVE(X):
    if no D-fragment of X is disk resident then
        space := (1 - PCR) * Size(X)
    else
        space := (1 - PCR) * Size(X) - DiskResident(X)
    return(space)

Figure 3. BiHOP replacement algorithm.

Once this requirement has been determined, Algorithm REPLACE tries to satisfy it by using as much of the free disk space as possible. If there is not enough free disk space, it casts out as many objects as necessary to make room for the request. We note that the unit of replacement is a D-fragment, not a whole object; thus some disk-resident objects may have some, but not all, of their D-fragments in the buffer. We note that an LRU policy can be used to select victims for replacement. Alternatively, the access frequencies of video objects are usually known beforehand [2], and this information can also be used to select victims. Without loss of generality, the latter approach is used in the presentation of Algorithm REPLACE.
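The sketch below is a compact executable rendering of Figure 3 under our reading of the reconstructed pseudocode above; the dictionary fields, object values, and function names are ours.

    def reserve(obj, pcr):
        """Additional disk space (blocks) needed for obj's D-fragments, per Eq. (3)."""
        need = (1 - pcr) * obj["size"]
        return need - obj["disk_resident"] if obj["disk_resident"] > 0 else need

    def replace(request, objects, free_blocks, pcr):
        """Free enough disk space for `request`, evicting the coldest D-fragments first."""
        needed = reserve(request, pcr) - free_blocks
        while needed > 0:
            candidates = [o for o in objects
                          if o["disk_resident"] > 0 and not o["displaying"] and o is not request]
            if not candidates:
                return False                             # failure: nothing left to evict
            victim = min(candidates, key=lambda o: o["heat"])
            if victim["disk_resident"] > needed:
                victim["disk_resident"] -= needed        # trim only the trailing D-fragments
                needed = 0
            else:
                needed -= victim["disk_resident"]        # displace the whole object
                victim["disk_resident"] = 0
        return True                                      # success: space can be allocated

    movies = [{"size": 150_000, "disk_resident": 60_000, "heat": h, "displaying": False}
              for h in (5, 1, 9)]
    req = {"size": 120_000, "disk_resident": 0, "heat": 7, "displaying": False}
    print(replace(req, movies, free_blocks=10_000, pcr=0.8))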
3. Simulation Model

In the previous sections, we analyzed the advantages of BiHOP in terms of disk space utilization. Although this metric has the most direct impact on system performance, it is still worthwhile to investigate the ultimate performance metrics, namely access delay and system throughput. To do this, we use a simulation model, since an analytical treatment becomes too complex. The simulation environment is presented in the following; we examine the simulation results in the next section.

Our simulation model is similar to the one used in [8]. The Request Generator generates requests for multimedia objects and submits them to the Waiting Queue. The Scheduler examines the requests in the queue in a FCFS manner. When bandwidth becomes available to serve the pending request at the head of the queue, the Scheduler forwards the request to the Serving Unit, which then allocates a playback stream to serve it. The Serving Unit simulates a hierarchical storage system and the playback mechanism. The buffer manager was implemented using the replacement policy presented in Figure 3. We note that the requests arriving at the Waiting Queue can be viewed as coming from different users: our simulator allows multiple requests to be served simultaneously by different playback streams. This model is different from the single-user environment modeled in [4], which does not allow concurrent playback of several video files.

In terms of the workload, each user request is characterized by an inter-arrival time and a choice of object. User request inter-arrivals were modeled using a Poisson process. The access frequencies of objects in the database follow a Zipf-like distribution [5, 6, 9]. Let N be the total number of requests for a simulation run. The number of requests N_i for each object i is determined as follows:

  N_i = N * (1 / i^theta) / (sum for j = 1 to M of 1 / j^theta)

where M is the number of objects in the system and 0 <= theta <= 1 is the skew factor. A larger value of theta corresponds to a more skewed condition, i.e., some objects are accessed considerably more frequently than other objects. When theta = 0, the distribution is uniform, i.e., all the objects have the same access frequency. This Zipf-like distribution is similar to the distribution used in [2].

Each workload consists of 20,000 requests. A workload, called a job sequence, is generated for each skew condition. For each simulation run, the same sequence is used for both BiHOP and SEP. Thus, the Request Generator does not really generate requests on the fly. Instead, it keeps a database of these request sequences; for each simulation run, it scans the appropriate sequence and appends the next request to the Waiting Queue when the corresponding inter-arrival time is up. Without loss of generality, we assume that all client devices have the same display rate (i.e., R_D is constant). R_M is determined from the PCR. The default values for the system and workload parameters are given in Table 1. In our experiments, many of these parameters were also varied to perform various sensitivity analyses.

In this study, the system throughput is computed by dividing the number of requests in the job sequence (i.e., 20,000) by the total simulated time (i.e., the time it takes to serve the 20,000 requests). The average latency is computed as the mean of the 20,000 individual latencies. To avoid the buffer warm-up effect, we ran another short sequence of requests to fill up the disk buffer before the actual run takes place. The requests in the short sequence were randomly selected from the long sequence to ensure that the data initially cached in the buffer (to simulate the steady-state condition) were relevant and truthfully reflected the distribution of the requests in the workload (the long sequence).
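The workload generator just described can be summarized in a few lines. This is our own sketch of Poisson arrivals over the Zipf-like object mix, using the default parameters from Table 1; the function names and the random seed are ours.

    import random

    def zipf_weights(num_objects, theta):
        """Access probability of object i (1-based) proportional to 1 / i**theta."""
        w = [1.0 / (i ** theta) for i in range(1, num_objects + 1)]
        total = sum(w)
        return [x / total for x in w]

    def job_sequence(num_requests, num_objects, theta, rate_per_min, seed=1):
        """Poisson arrivals (exponential inter-arrival times) over a Zipf-like object mix."""
        rng = random.Random(seed)
        probs = zipf_weights(num_objects, theta)
        objs = range(1, num_objects + 1)
        t = 0.0
        for _ in range(num_requests):
            t += rng.expovariate(rate_per_min / 60.0)      # seconds between arrivals
            yield t, rng.choices(objs, weights=probs)[0]

    # Default workload from Table 1: 600 objects, theta = 0.7, 30 requests/minute.
    for arrival, obj in job_sequence(5, 600, 0.7, 30):
        print(f"t = {arrival:7.1f} s  request object {obj}")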

Table 1. Simulation parameters.

  Block size: 4 KBytes
  Disk space: 13,500,000 blocks
  Display rate R_D: 100 blocks/sec
  Materialize rate R_M: 80 blocks/sec (PCR = 0.8)
  Zipf factor: 0.7
  Requests per minute: 30 (average inter-arrival time is 2 sec)
  Number of objects: 600
  Minimum object size: 100,000 blocks
  Maximum object size: 200,000 blocks
  Number of requests: 20,000

4. Simulation Results

We present the simulation results in the following subsections.

4.1. Effect of Request Rate

The effect of the request rate on BiHOP and SEP is plotted in Figures 4(a) and (b). In this experiment, the size of the disk buffer was set at 15% of the database size. We gradually increased the request rate from 20 requests/minute to 50 requests/minute and observed how well the two schemes could sustain the faster request rates. Figure 4(a) shows that BiHOP consistently provides better average latency than SEP can. The savings range from 300% (under 20 requests per minute) to 1,000% (under 40 requests per minute). In terms of system throughput, although both schemes perform comparably under slow request rates (less than 30 requests per minute), only BiHOP can continue to extend its good performance beyond 30 requests/minute. When the request rate is 50 requests/minute, we see that BiHOP outperforms SEP by 54% in terms of system throughput. The improvement in system throughput is actually much more significant, because applications normally have requirements on the maximum access latency. Let us say that the required average access latency for some video-on-demand application is two minutes. Under this condition, SEP can handle no more than 20 requests/minute. This limits its performance to less than 1,200 services/hour. On the contrary, BiHOP can sustain request rates well beyond 40 requests/minute, which allows it to offer substantially better system throughput. For instance, if we let BiHOP operate at 40 requests/minute while SEP is constrained to 20 requests/minute, the difference in system throughput is more than double. Obviously, much better throughput is achievable by further increasing the request rate for BiHOP. We note that the dramatic improvement due to BiHOP observed here is consistent with the analytical results discussed previously.

4.2. Effect of Space Ratio

We define the space ratio as the ratio of the disk size to the database size. In this experiment, we investigate the effect of this ratio on the performance of the two disk-buffer designs. A good technique should be able to achieve good performance using a reasonably small buffer; in other words, we want to keep the space ratio as small as possible without compromising too much performance. The results of this study are plotted in Figures 4(c) and (d). We varied the space ratio from 10% to 30%. Figure 4(c) shows that BiHOP consistently outperforms SEP by a significant margin for practical buffer sizes (i.e., space ratios below 15%). For instance, the average latency of BiHOP is more than eight times better than that of SEP when the space ratio is 15%. In terms of system throughput, the performance difference is not significant under this workload, because the performance of BiHOP is unfairly constrained by the 30-requests/minute request rate. We decided not to run this experiment under a higher request rate, say 40, because the average latency of SEP would have been too high for most applications.
This issue was discussed in the last subsection.

4.3. Effect of Access Skew

Although movie-on-demand and many other multimedia applications are known to have a skew factor of around 0.7 (which is used in the above experiments), other applications can have very different access patterns (i.e., different skew factors). We investigate this effect on BiHOP and SEP in this subsection. The results of this study are shown in Figure 5. We varied the skew factor between 0.0 (a uniform pattern) and 1.0 (a severe skew condition). The figure shows that the performance of both BiHOP and SEP improves as we increase the skew factor. This behavior is due to the improvement in the temporal locality of reference, which causes the hit ratio of the disk buffer to improve. In comparison, we observe that BiHOP tremendously outperforms SEP in terms of average access latency. BiHOP is around 50% better than SEP in terms of system throughput when the workload is uniform. The differences in system throughput are not significant under the severe skew workload due to the same reasons explained in Section 4.2.

Figure 4. Performance comparison: (a) latency times for different request rates; (b) throughputs for different request rates; (c) latency times for different space ratios; (d) throughputs for different space ratios.

Figure 5. Skew effect on performance: (a) latency times for different Zipf factors; (b) throughputs for different Zipf factors.

5. Support for VCR Functions

Another important feature of the BiHOP approach is its efficient support for VCR functions, such as random access, fast-forward and fast-reverse. We discuss these features in the following subsections.

5.1. Random Access

Let us first examine PIRATE and SEP. Since both of these techniques cache the HEADs in the disk buffer, let us consider the case where the HEAD of the object being used, say X, is already disk resident. If the random access starts at some point in the HEAD, then the delay time for setting up the pipeline is equal to the time it takes to display the portion of the HEAD in front of the start point (see Figure 6). This is because the duration of playing the HEAD starting from the access point is not long enough to eclipse the time it takes to materialize the entire TAIL; the time difference is the playback time of the skipped portion. We therefore have to spend that amount of time (i.e., the delay) loading the leading blocks of the TAIL before the pipelining can start. Hence the delay is computed as follows:

  Delay = x * Size(X) / R_D    (4)

where x (0 <= x <= 1) denotes the fraction of the data file preceding the start point. For instance, x = 0.5 if one starts the playback at the middle of the file.

Figure 6. Random access in the HEAD under PIRATE and SEP.

Figure 7. Random access in the TAIL under PIRATE and SEP.

Now, let us consider the case of starting the playback at some random point in a non-HEAD fragment. As illustrated in Figure 7, the delay for setting up the pipeline is the time to materialize the leading portions Y and Z shown in the figure: enough data must be loaded up front so that the time to materialize the rest of the file is eclipsed by the time to play back the remaining data. The delay time can therefore be computed as:

  Delay = x * Size(X) / R_D,                        for 0 <= x <= 1 - PCR
  Delay = (1 - x) * Size(X) * (1/R_M - 1/R_D),      for 1 - PCR < x <= 1      (5)

Let T be the total playback time of the entire video file, i.e., T = Size(X) / R_D. Substituting T into Equation (5), we have:

  Delay = x * T,                                    for 0 <= x <= 1 - PCR
  Delay = (1 - x) * T * (1 - PCR) / PCR,            for 1 - PCR < x <= 1      (6)

If every point in the object is equally likely to be accessed as the starting point, the average delay for random access can be computed as follows:

  Average Delay = (1 - PCR) * T / 2    (7)

Assuming PCR = 0.6, the delay is about 20% of the playback time of the whole file. Thus, if a movie is 1 hour long, the average delay for random access will be 12 minutes. This is certainly not tolerable by most viewers.

Let us now turn our attention to BiHOP. If the starting point is in a T-fragment, the delay for setting up the pipeline is the time to materialize areas 1 and 2, as illustrated in Figure 8. We need to materialize area 1 because that data is not yet in the disk buffer. After area 1 has been materialized, the duration of playing this area and the next D-fragment is not long enough to eclipse the time to materialize the next T-fragment; the time difference is the time to materialize area 2. Let the size of area 1 be (1 - t) * T, where t is the portion of the T-fragment excluded from the playback. The size of area 2 can then be computed as follows. To ensure a continuous playback, the following relationship must hold:

  ((1 - t) * T + D) / R_D = (T - Area2) / R_M

which gives Area2 = t * T * PCR. Thus the delay time is:

  Delay = ((1 - t) * T + t * T * PCR) / R_M  <=  T / R_M    (8)

That is, the maximum delay time cannot be longer than the time to materialize a T-fragment, which is only a few blocks.

If the starting point is in a D-fragment, the delay time is the time to materialize the leading portion of the next T-fragment, as shown in Figure 9. Let d be the size of the portion of the D-fragment that remains to be played. To ensure a continuous playback, we must have:

  d / R_D = (T - Area) / R_M

so the portion to materialize is Area = T - d * PCR, and the delay time is:

  Delay = (T - d * PCR) / R_M  <=  T / R_M

Again, the maximum delay time cannot be longer than the time to materialize a T-fragment.

Figure 8. Starting the playback in some T-fragment.

Figure 9. Starting the playback in some D-fragment.

We have shown that the delay for BiHOP is only the time to materialize a few blocks, independent of where one wants to start the playback of a file. Such a tiny delay is generally unnoticeable. For instance, let us consider a 1-hour MPEG-2 video with PCR = 0.6 and a block size of 4 KBytes. The average delay, in this case, is less than 0.05 second. The average delay would have been 12 minutes if PIRATE or SEP were used.
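To make the comparison concrete, the sketch below evaluates the delay formulas as reconstructed above for the 1-hour, PCR = 0.6 example. The display rate of 125 blocks/sec (a 4 Mbps stream with 4 KByte blocks) is our own assumption; it differs from the Table 1 default, which applies to the simulations rather than this example.

    # Random-access setup delay: PIRATE/SEP (Eq. 6-7) vs. BiHOP (Eq. 8),
    # using the formulas as reconstructed above.
    PCR = 0.6
    T_play = 3600.0              # total playback time: 1-hour video, in seconds
    R_D = 125.0                  # display rate: 4 Mbps / 4 KB blocks = 125 blocks/s (assumed)
    R_M = PCR * R_D              # materialize rate

    def pirate_sep_delay(x):
        """Delay when playback starts at fraction x of the file (HEAD cached)."""
        if x <= 1 - PCR:
            return x * T_play                          # start in the HEAD
        return (1 - x) * T_play * (1 - PCR) / PCR      # start in the TAIL

    avg = sum(pirate_sep_delay(i / 1000) for i in range(1000)) / 1000
    print(f"PIRATE/SEP average delay: {avg/60:.1f} min")       # ~12 min, i.e., 20% of 1 hour

    # BiHOP: the worst case is materializing one full T-fragment of p = 3 blocks.
    p_blocks = 3
    print(f"BiHOP worst-case delay: {p_blocks / R_M:.3f} s")   # ~0.04 s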

5.2. Fast-Forward/Reverse

Although normal playback is the most important function for all multimedia applications, providing the user with VCR capabilities such as fast-forward and fast-reverse is also highly desirable. Several approaches have been proposed for implementing these special functions [3, 1]. The most straightforward technique is to retrieve and transmit the multimedia stream at a higher speed, say k times the normal playback rate. It is apparent that this simple scheme requires k times the system resources; obviously, none of PIRATE, SEP and BiHOP can avoid these extra costs if this approach is used. Alternatively, the Loss scheme in [3] or the Frame Skipping scheme in [1] can be used to reduce the requirement on system resources. The idea behind these methods is to skip forward or backward through the video file, showing one out of every several blocks. This strategy is not suitable for PIRATE or SEP due to their unidirectional nature. For instance, neither PIRATE nor SEP can efficiently support a fast-reverse soon after a random access: some of the data needed might not be in the buffer, and reverse pipelining is not an option for these techniques. We note that this sequence of two operations is very commonly used to search in a video file. Although fast-reverse during a normal play is possible with PIRATE, this function is provided at the cost of retaining all the data in the buffer until the end of the session. If normal play is resumed after a fast-forward, the sequence of operations has the same effect as a random access, and the intolerable delay discussed in the last subsection is inevitable for either PIRATE or SEP.

On the contrary, the methods presented in [3, 1] are natural for BiHOP, because it also skips through the file and caches in the disk buffer only every other fragment, i.e., skipping T-fragments and caching D-fragments. Interestingly, when all the D-fragments are disk resident, the symmetrical nature of BiHOP allows fast-forward and fast-reverse to be done without even involving the tertiary devices. If one needs to resume normal play after a fast-forward or fast-reverse, the delay is essentially unnoticeable, since this can be treated as a random access.

6. Conclusions and Future Studies

This study focuses on the disk space management of hierarchical storage in multimedia systems. We have proposed a novel technique called Bidirectional Highly Optimized Pipelining (BiHOP). Our simulation results indicate that BiHOP significantly outperforms a recently proposed technique called SEP. This result can be attributed to our caching technique, which caches every other data fragment of the multimedia file, rather than caching consecutive fragments as in traditional practice. This new approach allows us to use tiny staging buffers for pipelining. Their small sizes allow them to be implemented in the server memory to conserve disk bandwidth. The whole disk space, therefore, can be dedicated to caching purposes to improve the hit ratio. Another important benefit of BiHOP is its bidirectional nature. While the sizes of the data fragments are nonuniform in existing techniques, the symmetrical file organization (i.e., uniform pattern) of BiHOP allows it to pipeline in either the forward or the reverse direction with the same efficiency. This unique feature, not possible with other schemes, makes BiHOP natural for implementing VCR functions.

In this study, we focused our attention on disk space management. Other scarce resources include the transmission bandwidths between the different levels of the hierarchical storage; we are currently investigating techniques to make more efficient use of these resources. Admission control is another issue that needs to be studied more carefully. Our simulator does not currently implement an admission-control policy; instead, each request is automatically accepted.

References

[1] M. Chen, D. Kandlur, and P. S. Yu. Support for fully interactive playout in a disk-array-based video server. In Proc. of ACM Multimedia, pages 391-398, 1994.
[2] A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling policies for an on-demand video server with batching. In Proc. of ACM Multimedia, pages 15-23, October 1994.
[3] J. K. Dey-Sircar et al. Providing VCR capabilities in large-scale video servers. In Proc. of ACM Multimedia, pages 25-32, October 1994.
[4] S. Ghandeharizadeh and C. Shahabi. On multimedia repositories, personal computers, and hierarchical storage systems. In Proc. of ACM Multimedia, pages 407-416, October 1994.
[5] K. A. Hua, C. Lee, and C. M. Hua. Dynamic load balancing in multicomputer database systems using partition tuning. IEEE Trans. on Knowledge and Data Engineering, 7(6):968-983, December 1995.
[6] D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, Reading, Massachusetts, 1973.
[7] R. T. Ng and J. Yang. Maximizing buffer and disk utilizations for news-on-demand. In Proc. of the 20th VLDB Conference, Santiago, Chile, 1994.
[8] J. Z. Wang, K. A. Hua, and H. C. Young. SEP: a space efficient pipelining technique for managing disk buffers in multimedia servers. In Proc. of the IEEE Int'l Conf. on Multimedia Computing and Systems, Hiroshima, Japan, June 1996.
[9] G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, Mass., 1949.
This unique feature, not possible with other schemes, makes natural for implementing VCR functions. In this study, we focused our attention on disk space management. Other scarce resources include the transmission bandwidths between different levels of the hierarchical storage. We are currently investigating techniques to make more efficient use of these resources. Admission control is another issue needed to be studied more carefully. Our simulator does not currently implement an admission-control policy. Instead, each request is automatically accepted. References [] M. Chen, D. Kandlur, and P. S. Yu. Support for fully interactive playout in a disk-array-based video server. In Proc. of ACM Multimedia, pages 39 398, 994. [2] A. Dan, D. Sitaram, and P. Shahabuddin. Scheduling policies for an on-demand video server with batching. In Proc. of ACM Multimedia, pages 5 23, October 994. [3] J. K. Dey-Sircar et al. Providing VCR capabilities in largescale video servers. In Proc. of ACM Multimedia, pages 25 32, October 994. [4] S. Ghandeharizadehand C. Shahabi. On multimedia repositories, personal computers, and hierarcical storage systems. In Proc. of ACM Multimedia, pages 407 46, October 994. [5] K. A. Hua, C. Lee, and C. M. Hua. Dynamic load balancing in multicomputer database systems using partition tuning. IEEE Trans. on Knowledge and Data Engineering, 7(6):968 983, December 995. [6] D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison Wesley, Reading, Massachusetts, 973. [7] R. T. Ng and J. Yang. Maximizing buffer and disk utilizations for news-on-demand. In Proc. of the 20th VLDB Conference, Santiago, Chile, 994. [8] J. Z. Wang, K. A. Hua, and H. C. Young. : a space efficient pipelining technique for managing disk buffers in multimedia servers. In Proc. of the IEEE int l Conf. on Multimedia Computing and Systems, Hiroshima, Japan, June 996. [9] G. K. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley, Reading, Mass., 949.