
Scheduling and Caching Strategies for Correlated Data in Push-based Information Systems*

Etsuko Yajima†, Takahiro Hara‡, Masahiko Tsukamoto‡, Shojiro Nishio‡

†Sales Department, Tokyo Office, FM Osaka Co., Ltd.
yajima@fmosaka.net

‡Dept. of Information Systems Eng., Graduate School of Engineering, Osaka University
{hara,tuka,nishio}@ise.eng.osaka-u.ac.jp

ABSTRACT

Recently, there has been increasing interest in information systems that deliver data using broadcast in both wired and wireless environments. The strategy in which a server repeatedly broadcasts data to clients can result in a larger throughput, and various methods have been studied to reduce the average response time to data requests in such systems. In this paper, we propose a strategy for scheduling the broadcast program which takes into account the correlation among data items. This strategy puts data items with strong correlation side by side in the broadcast program in order to reduce the average response time. We also propose a caching strategy which extends a conventional caching strategy so that it can take advantage of correlation among broadcast data items for greater efficiency. Finally, we use simulation studies to evaluate the performance of our proposed strategies.

Keywords: data broadcast, data correlation, scheduling strategy, caching strategy

1 INTRODUCTION

Recently, there has been increasing interest in data delivery mechanisms in which a server repetitively broadcasts various data to clients using a broad bandwidth, called "push-based" data delivery mechanisms. As shown in Figure 1, in a system that uses a "push-based" mechanism, each client can access a piece of data by waiting for its data broadcast period. In contrast, when conventional "pull-based" mechanisms are used, servers deliver data by separately responding to every request message sent by each client.
One remarkable advantage of the "push-based" delivery mechanism is higher throughput for data access in distributed systems with a large number of clients, since the absence of communication contention among the clients requesting data means that they can efficiently share the bandwidth [6, 8, 12, 18].

[Figure 1: Information system based on message broadcast.]

* This research was supported in part by the Research for the Future Program of the Japan Society for the Promotion of Science under the project "Advanced Multimedia Content Processing" (Project No. JSPS-RFTF97P00501) and by Grant-in-Aid for Scientific Research No. 12480095 from the Japan Society for the Promotion of Science.

Information broadcast systems are good applications of "push-based" mechanisms. In these systems, the server's clients are typically portable computers, desktop computers, or electrical household appliances. The following are some scenarios in which information broadcast systems could be useful:

- In a shopping center, store directories could be broadcast, enabling customers with portable computers to view them to select which shops to enter.
- In a train station, up-to-date schedules could be broadcast. Passengers with portable computers could receive and store the information.
- At home, PCs could automatically receive digital data of various media (image, audio, text) through satellite communication, ground communication, or cable TV communication. The data could include news, online shopping, hit charts, commercial advertisements, sports, automatic software updates and video-on-demand. User-customizable machines would be capable of filtering the necessary information before storage.

Several strategies have been proposed to improve the performance of these information systems. These strategies are categorized into the following research fields:

1. Scheduling strategies at servers
This research field includes two notable research areas. One area concentrates on strategies to shorten the response time [1, 14, 16, 17, 22, 24, 25, 26, 29, 30]. If the global access probability differs for each data item, additional bandwidth should be allocated for data with a higher access probability. Hence, the broadcast period of the data becomes shorter, and therefore, the average response time is expected to be shorter. In relation to these strategies, a statistical estimation model of access frequencies in "push-based" systems has been proposed in [31]. The other area concentrates on strategies to save the power of mobile hosts [9, 19, 20, 21]. Since mobile hosts such as palmtop computers are not connected to a direct power source, power conservation is one of the most important issues in mobile computing environments. This research area aims to conserve the power of mobile hosts by indexing the broadcast program and hence reducing the monitoring time of broadcast data.

2. Caching strategies at clients
This research field aims to shorten the response time by caching parts of the broadcast data at the client [1, 2, 15, 28].

3. Combination of "push-based" and "pull-based" data delivery mechanisms
Only data with high access probabilities from clients are broadcast, while the rest are transferred using a "pull-based" data delivery mechanism [5, 21]. This strategy shortens the broadcast periods for data with high access probabilities; therefore, its response time is expected to be shorter. Moreover, the response time is also expected to be shorter for data with low access probabilities since in a pure "push-based" strategy, their broadcast periods are very long.

4. Dissemination of updates
In an application such as a weather service, the information from a server often changes. In this case, updated information should be disseminated to clients at some periodic rate.
Several strategies to disseminate information updates have been proposed [3, 5, 10, 13, 27].

Although correlation generally exists among broadcast data (e.g., clients request accesses to certain sets of data at the same time), these conventional strategies do not take it into account. When clients frequently access a set of correlated data, scheduling and caching strategies which take the correlation into account can reduce the response time for data accesses. In this paper, we propose new scheduling and caching strategies which do just that. We then evaluate the performance of our proposed strategies.

The following system environment is assumed:

- The system has a single server.
- Data is handled in clusters called data items. All data items are the same size.
- The server creates a broadcast program consisting of data items (IDs: $1, \ldots, M$). The program is repeatedly broadcast.
- Neither the data items nor the broadcast program is updated.
- All clients know the broadcast program precisely. This may be accomplished by broadcasting program information when the program is created.
- The access probabilities and the correlation of data items differ for each client, and each client knows its own access probabilities and correlation.

The remainder of the paper is organized as follows: In section 2, correlation among broadcast data items is described. New scheduling and caching strategies are proposed in sections 3 and 4, respectively. Simulation results are shown in section 5. Finally, in section 6, we summarize this paper.

2 CORRELATION AMONG DATA ITEMS

In a real environment, clients often access certain sets of data collectively. We define the term "correlation" as the probability that the client sequentially accesses a certain set of data items. As the correlation increases, so does the probability that a particular set of data items will be accessed together. The correlation differs for each set of data items. A client may access a set of correlated data items in two ways.
Firstly, the client submits multiple access requests at the same time. This includes the case where the intervals between successive requests are so much smaller than the broadcast period of one program cycle that they can be ignored. For example, if a server broadcasts HTML files and images of various home pages separately as data items, a client often issues simultaneous requests for the HTML file and the image files that form a single page.

Secondly, the client accesses a set of correlated data items by submitting access requests at some intervals. For example, if a server broadcasts various home pages as data items, a user (client) often refers to a certain page for a while and then refers to one of the pages linked from the first page. Moreover, if binary files of various tools are broadcast, a user often uses a word processor for a while in order to produce a document and then uses a drawing tool in order to create some figures in the document.

In this paper, we assume the first way, i.e., the client submits multiple access requests at the same time. Moreover, for the purpose of simplicity, we assume that each client issues access requests for two correlated data items at the same time.

In general, correlation among data items is complex: a graph whose vertices represent individual data items and whose edges link correlated data items is not a tree but a general network. Thus, in order to reduce the response time to access sets of correlated data items, it seems wise to employ scheduling and caching strategies which consider this complex correlation.

3 SCHEDULING STRATEGIES FOR CORRELATED DATA ITEMS

In this section, we first describe the conventional scheduling strategies. Then, we propose a new strategy that takes into account the correlation among data items.

3.1 Conventional Scheduling Strategies

Generally, the probability that a data item will be accessed depends on the client. It has been reported that each client issues data accesses to 20% of data items with 80% probability [1]. The access pattern of the global system is also generally skewed. Thus, in the conventional scheduling strategies, the server frequently broadcasts data items with high access probabilities.

In [1], it is mentioned that the broadcast order of data items does not influence the average response time when the broadcast period of each item is properly chosen and remains unchanged in every broadcast cycle. An exception to this case occurs when the clients have caches and use prefetching-based caching strategies. However, in [1], the correlation among data items is not taken into account. Because correlation among data items seems to be quite common, scheduling strategies which take that correlation into account are expected to shorten the average response time for data accesses.

3.2 CBS (Correlation-Based Scheduling) Strategy

In this subsection, we propose the CBS (Correlation-Based Scheduling) strategy, which takes into account the correlation among data items. For the purpose of simplicity, we assume that each data item is broadcast only once in a cycle of a program. In this case, the problem of scheduling a periodic broadcast program is equivalent to the problem of selecting a Hamiltonian circuit from a complete graph consisting of vertices representing each data item and ordering the data items along the circuit. Here, a Hamiltonian circuit means a circuit in which each vertex in the given graph appears exactly once.

Let $M$ denote the total number of data items that the server broadcasts in a program. Let $L$ denote the time required to broadcast a single data item, and let $D_{ij}$ denote the number of data items that are broadcast between items $i$ and $j$ in a program ($0 \le D_{ij} \le M-2$, and $D_{ij} = D_{ji}$) (see Figure 2).

[Figure 2: A broadcast program.]
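As a small illustration of the definition of $D_{ij}$, the following Python sketch counts the items broadcast between two items in one cycle. The function name `gap` and the list-based representation of a program are assumptions made for this example; they are not notation from the paper.

```python
def gap(program, i, j):
    """Return D_ij: the number of items broadcast between i and j in one cycle.

    `program` lists each item exactly once, in broadcast order (a hypothetical
    representation chosen for this sketch).
    """
    if i == j:
        raise ValueError("i and j must be different items")
    return abs(program.index(i) - program.index(j)) - 1


program = ["a", "b", "c", "d", "e", "f"]
M = len(program)

print(gap(program, "a", "b"))  # prints 0: the two items are neighbours
print(gap(program, "a", "f"))  # prints 4, i.e. M - 2: neighbours across the cycle boundary
```

Both extreme values, 0 and $M-2$, describe items that sit side by side once the program is viewed as a cycle.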
If a client does not have a cache, the average response time $\mathit{avg}_{ij}$ when the client requests two arbitrary items $i$ and $j$ at the same time is represented by the following equation:

$$
\begin{aligned}
\mathit{avg}_{ij} &= \frac{D_{ij}}{M} \cdot \left\{ \frac{D_{ij}}{2} + (M - D_{ij} - 1) \right\} L
 + \frac{M - D_{ij} - 2}{M} \cdot \left\{ \frac{M - D_{ij} - 2}{2} + (D_{ij} + 1) \right\} L
 + \frac{2}{M} \cdot \left( M - \frac{1}{2} \right) L \\
&= \frac{L}{M} \left\{ D_{ij} (M - D_{ij} - 2) \right\} + \frac{L (M^2 + 2M - 2)}{2M}
\end{aligned}
\tag{1}
$$

The first term of the middle expression in equation (1) represents the average response time to data access requests which are issued after $i$ is broadcast and before $j$ is broadcast in the cycle of the program. Similarly, the second term represents the average response time to requests which are issued after $j$ is broadcast and before $i$ is broadcast in the next cycle, and the third term represents the average response time to access requests which are issued while item $i$ or $j$ is being broadcast.

The right-hand side of equation (1) shows that $\mathit{avg}_{ij}$ is a function of $D_{ij}$ and that the first term has to be minimized in order to minimize $\mathit{avg}_{ij}$. Since $D_{ij}$ takes an integer value from 0 to $M-2$, the product $D_{ij}(M - D_{ij} - 2)$ is non-negative and vanishes only at the two endpoints, so $\mathit{avg}_{ij}$ becomes minimum when $D_{ij}$ is either 0 or $M-2$. In both cases, the items $i$ and $j$ are broadcast side by side in the program. However, as the number of data items which are correlated with other items increases, it becomes very difficult to compose the optimal program by deciding which data items should be broadcast side by side, since the number of items which can be broadcast side by side is limited. Thus, we propose the following heuristic strategy, called the CBS strategy, which composes a program by putting data items with stronger correlation side by side in the broadcast program.

[Figure 3: An example of executing the CBS strategy.]

The CBS strategy:

1. A complete graph consisting of vertices representing all data items and edges is created such that an edge between items $i$ and $j$ is weighted $W_{ij} = 1 - P_{ij}$.
Here, $P_{ij}$ denotes the probability that the clients request two items $i$ and $j$ at the same time, i.e., the correlation between $i$ and $j$ ($\sum_{i=1}^{M} \sum_{j=i+1}^{M} P_{ij} = 1$, and $P_{ij} = P_{ji}$).
2. A Hamiltonian circuit which gives the minimum total weight is selected. This is done by solving the traveling salesman problem (TSP) on the given graph.
3. Along the selected circuit, all items are put in a broadcast program.

Figure 3 shows an example of composing a program using the CBS strategy. First, the graph shown in the left part of Figure 3 is created for the items represented by the letters a to f. Then, a Hamiltonian circuit which gives the minimum total weight is selected. By putting the data items in the same order as the circuit, a broadcast program can be composed as in the right part of the figure.

Because the CBS strategy only considers the strength of correlation among data items which are put side by side, the program constructed by this strategy is not always optimal. Even so, the CBS strategy seems to greatly shorten the average response time.
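The two ideas of this subsection can be sketched in a few lines of Python. The first pair of functions evaluates both sides of equation (1) so that they can be compared numerically; the remaining functions walk through the three CBS steps on a toy matrix. All function names and the four-item matrix `p` are hypothetical. In particular, `cbs_greedy` uses a plain nearest-neighbour rule as a stand-in for step 2; it is an assumption of this sketch, not necessarily the TSP heuristic the authors used.

```python
from itertools import permutations


def avg_three_terms(d, m, slot):
    """Middle expression of equation (1): the three request-arrival cases."""
    after_i = d / m * (d / 2 + (m - d - 1)) * slot
    after_j = (m - d - 2) / m * ((m - d - 2) / 2 + (d + 1)) * slot
    during = 2 / m * (m - 0.5) * slot
    return after_i + after_j + during


def avg_closed_form(d, m, slot):
    """Right-hand side of equation (1)."""
    return slot / m * d * (m - d - 2) + slot * (m * m + 2 * m - 2) / (2 * m)


def circuit_weight(order, p):
    """Total weight of the circuit `order` with edge weights W_ij = 1 - P_ij."""
    m = len(order)
    return sum(1 - p[order[k]][order[(k + 1) % m]] for k in range(m))


def cbs_exact(p):
    """Steps 1-3 by exhaustive search; only feasible for a handful of items."""
    m = len(p)
    # Fix item 0 as the start: rotations of a circuit are the same program.
    candidates = ((0,) + rest for rest in permutations(range(1, m)))
    return list(min(candidates, key=lambda order: circuit_weight(order, p)))


def cbs_greedy(p):
    """Nearest-neighbour stand-in for step 2 (assumed heuristic, not optimal)."""
    order = [0]
    remaining = set(range(1, len(p)))
    while remaining:
        last = order[-1]
        # Follow the strongest correlation, i.e. the lightest edge 1 - P.
        nxt = max(sorted(remaining), key=lambda k: p[last][k])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy correlation matrix: items 0 and 2 are requested together half of the
# time, items 1 and 3 the other half (upper triangle sums to 1).
p = [
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.5],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
]

print(cbs_exact(p), circuit_weight(cbs_exact(p), p))
print(cbs_greedy(p), circuit_weight(cbs_greedy(p), p))
```

On this toy input both searches place each correlated pair side by side. The exhaustive search grows factorially with the number of items, which is why a heuristic is needed for realistic program sizes.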

Though the CBS strategy is much easier than the problem of composing the optimal program, the second step of this strategy is equivalent to the TSP. If the number of broadcast data items increases, the CBS strategy cannot be solved exactly in practice, since the TSP is an NP-hard problem. Algorithms which can solve the TSP heuristically and quickly have already been proposed and used in various fields of engineering [7, 11, 23]. Thus, we use a heuristic TSP algorithm when the number of broadcast data items is large.

4 CACHING STRATEGIES FOR CORRELATED DATA ITEMS

Since both the correlation among data items and the data items' access probabilities depend on each client, the average response time of some clients may be very large when only the global access probabilities and correlation are taken into consideration. In this case, if the client has a cache, a suitable caching strategy for each client can improve its performance. In this section, we first describe the PT strategy [4], which is a typical conventional caching strategy. Then, we propose a new strategy which extends the PT strategy to consider the correlation among data items.

4.1 Conventional Prefetching-Based Caching Strategy

Several caching strategies have been proposed so far to improve the average response time of information systems which use a "push-based" data delivery mechanism. These strategies can be classified into two categories: prefetching-based and non-prefetching-based, which are distinguished by the time when each data item is replaced. In prefetching-based strategies, data items which seem to be important are prefetched into a cache in advance. It is generally supposed that prefetching-based strategies are more suitable for "push-based" information systems, since these strategies can reduce the average response time almost without any additional cost. The PT strategy is a typical prefetching-based strategy, and it gives the best performance among the conventional strategies proposed so far.
Here is a brief explanation of the PT strategy.

The PT strategy:

1. Every time a new item is broadcast, a PT value is assigned to each item that is in the cache and to the item that is currently being broadcast. Here, let $P'_i$ denote the probability that item $i$ will be requested, $T_i$ denote the broadcast period of item $i$, $\tau_i$ denote the time that has passed since item $i$ was cached, and $\rho_i$ denote the time remaining until $i$ is broadcast next. The PT value $X_i$ which is assigned to item $i$ is expressed as follows:

$$
X_i = P'_i \cdot T_i - P'_i \cdot \tau_i \quad (= P'_i \cdot \rho_i) \tag{2}
$$

The PT value takes its maximum value $P'_i \cdot T_i$ upon insertion into the cache and its minimum value 0 when the item is broadcast next.
2. $X_j$, which is the PT value of the newly broadcast item $j$, is compared with the lowest PT value, $X_{min}$, in the cache.
3. If $X_j$ is larger than $X_{min}$, the item with the lowest PT value is removed from the cache and $j$ is inserted. Otherwise, no cache replacement occurs.

[Figure 4: Item sets $C$ and $Q_i$ for item e.]

The PT value, $X_i$, represents the average response time when a client accesses item $i$ if it is not stored in the cache. Therefore, for a given item set in the cache, the replacement based on the PT value is optimal at the moment when each item is broadcast. Although this replacement is not always optimal in the long term, this strategy gives the best performance out of all of the conventional strategies.

4.2 CBPT (Correlation-Based PT) Strategy

When the correlation among data items is considered, it is also effective to remove from the cache the item which requires the shortest response time. Based on this fact, we propose the following caching strategy, called the CBPT (Correlation-Based PT) strategy, which extends the PT strategy to consider the correlation among data items.

The CBPT strategy:

1. Every time a new item is broadcast, a CBPT value is assigned to each item that is in the cache and to the item that is currently being broadcast.
Here, let $\rho_i$ denote the time remaining until item $i$ is broadcast next and $P_{ij}$ denote the probability that items $i$ and $j$ will be requested at the same time, i.e., the correlation between $i$ and $j$. Moreover, let $C$ denote the set of items which are in the cache and $Q_i$ denote the set of data items which will be broadcast before item $i$ is broadcast next and which are not contained in $C$. The CBPT value of item $i$, $Y_i$, is represented by the following equation:

$$
Y_i = \rho_i \cdot \sum_{k \in C} P_{ik} + \sum_{k \in Q_i} (\rho_i - \rho_k) \cdot P_{ik} \tag{3}
$$

The first sum covers partner items that are already cached, for which the client waits $\rho_i$ for item $i$; the second covers partners that are broadcast before $i$, for which the additional wait is $\rho_i - \rho_k$.
2. $Y_j$, which is the CBPT value of the newly broadcast item $j$, is compared with the lowest CBPT value, $Y_{min}$, in the cache.
3. If $Y_j$ is larger than $Y_{min}$, the item with the lowest CBPT value is removed from the cache and $j$ is inserted into the cache. Otherwise, no cache replacement occurs.

Figure 4 shows the sets $C$ and $Q_i$ for computing the CBPT value of the item e when the item a is currently being broadcast in a program consisting of data items represented by the letters a to h and when the items b, e, and f are in the cache.

The CBPT value, $Y_i$, represents the average response time to a client's simultaneous requests for two items, $i$ and one other arbitrary item, assuming that the item $i$ is not stored in the cache. Therefore, for the given item set in the cache, the replacement based on the CBPT strategy is optimal at the moment when each item is broadcast. Similar to the PT strategy, although the replacement based on the CBPT strategy is not always optimal in the long term, this strategy seems to give a good performance for the same reasons.

If we let $P_{ii}$ denote the probability that a client requests only one item $i$ in equation (3), the CBPT strategy can still be used even if there is no correlation among the data items. The conventional PT strategy is a special case of the CBPT strategy where there is no correlation among data items.

5 PERFORMANCE EVALUATION

In this section, we show the performance results from simulations of the CBS and CBPT strategies. The following assumptions were made for the simulation experiments:

- There exist 120 data items (item identifiers: $1, \ldots, 120$) which are each broadcast once according to the broadcast program.
- It takes 10 units of time to broadcast one item in the program. Although this value proportionally affects the response time, it has relatively little effect on the simulation results, so we choose this value arbitrarily.
- The probability that a client issues simultaneous requests for two items is 0.1 at each unit time. This value does not affect the response time, since the system's access frequency does not affect our proposed scheduling and caching strategies. Hence, this value is also chosen arbitrarily.
- The probability that two items $i$ and $j$ ($i = 1, \ldots, 120$, $j = 1, \ldots, 120$) will be requested at the same time is expressed by the $(i, j)$ element, $P_{ij}$ ($P_{ij} = P_{ji}$), of the access probability matrix (a $120 \times 120$ matrix). For the purpose of simplicity, it is assumed that $P_{ii} = 0$ for each item $i$ ($i = 1, \ldots, 120$). The remaining elements are assigned either 0 or some positive value.

In the simulations, the following two access probability matrices are used.
MATRIX 1: All elements except for the diagonal ones ($P_{ii}$, $i = 1, \ldots, 120$) are assigned 0 or a constant value. The percentage of the elements which are not assigned 0 is a variable in the simulations in section 5.1.1 and section 5.2. As the percentage of the elements which are not assigned 0 gets lower, the correlation between each two items becomes stronger.

MATRIX 2: The data items are divided into several groups. The elements for every combination of two items in a group are assigned a positive value, and the rest of the elements are assigned 0. This means that the correlation exists only among data items which are in the same group. The positive value assigned to each element is selected randomly from three values: 'large', 'medium', and 'small'. The ratio 'large':'medium':'small' is 8:5:2. The number of groups is a variable in the simulation in section 5.1.2. As the number of groups gets larger, the correlation range becomes smaller and each correlation becomes more conspicuous.

[Figure 5: Average response time of the CBS strategy (Simulation 1). Vertical axis: Average Response Time; horizontal axis: Percent (%); curves: CBS, SCATTER.]

5.1 Evaluation of the CBS Strategy

In this subsection, we evaluate the interval between the responses to two access requests which are issued at the same time. Evaluating the interval between two responses is sufficient because the response time to the first access does not depend on the employed scheduling strategy. Here, we call this interval the 'response time' for simplicity. For comparison, we also evaluate the average response time of the random scheduling strategy which does not consider the correlation among data items.

5.1.1 Simulation 1

The average response times of the two strategies, the CBS strategy and the random scheduling strategy, using MATRIX 1 are compared, varying the ratio (%) of elements which are not 0 in the matrix. Figure 5 shows the result of this simulation.
The horizontal axis indicates the percentage of elements which are not 0. In the graph, the random scheduling strategy is shown as 'SCATTER'. The result shows that the CBS strategy gives shorter response times than the random scheduling strategy. Moreover, it also shows that the difference in response time between the two strategies increases as the percentage of elements which are not 0 gets smaller, i.e., as the correlation gets stronger.

5.1.2 Simulation 2

The average response times of the two strategies using MATRIX 2 are compared, varying the number of item groups. Figure 6 shows the result of this simulation. The horizontal axis indicates the number of groups. The result shows that the CBS strategy gives shorter response times than the random scheduling strategy. Moreover, it also shows that the difference in performance between the two strategies becomes larger as the number of groups becomes larger, i.e., as the correlation among data items becomes more conspicuous.
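The quantity compared in Simulations 1 and 2 can also be estimated without running a simulation. The Python sketch below is a hedged illustration, not the authors' simulator. It assumes that a response is counted at the start of the slot in which the item is broadcast and that requests arrive uniformly over the cycle; under those assumptions the expected interval between the two responses for a pair with gap $D_{ij}$ works out to $2(D_{ij}+1)(M-D_{ij}-1)L/M$. The function names and the six-item, two-group matrix (a miniature in the spirit of MATRIX 2, with equal weights instead of the 8:5:2 ratio) are hypothetical.

```python
def expected_interval(program, p, slot):
    """Expected time between the two responses to a simultaneous pair request.

    Assumes responses are counted at the start of an item's broadcast slot and
    requests arrive uniformly over the cycle (assumptions of this sketch).
    `program` is a permutation of the item IDs 0 .. M-1.
    """
    m = len(program)
    pos = {item: k for k, item in enumerate(program)}
    total = 0.0
    for i in range(m):
        for j in range(i + 1, m):
            d = abs(pos[i] - pos[j]) - 1  # gap D_ij between the two items
            total += p[i][j] * 2 * (d + 1) * (m - d - 1) * slot / m
    return total


def grouped_matrix(groups, m):
    """Equal correlation inside each group, none across groups (hypothetical)."""
    pairs = [(i, j) for g in groups for i in g for j in g if i < j]
    p = [[0.0] * m for _ in range(m)]
    for i, j in pairs:
        p[i][j] = p[j][i] = 1.0 / len(pairs)
    return p


p = grouped_matrix([(0, 1, 2), (3, 4, 5)], 6)
side_by_side = [0, 1, 2, 3, 4, 5]  # correlated items kept together
interleaved = [0, 3, 1, 4, 2, 5]   # correlated items spread apart

print(expected_interval(side_by_side, p, slot=10))
print(expected_interval(interleaved, p, slot=10))
```

With these toy inputs the side-by-side program yields the smaller expected interval, which is the qualitative effect reported for CBS versus SCATTER.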

Figure 6: Average response time of the CBS strategy (Simulation 2). [Plot: average response time versus the number of groups, for CBS and SCATTER.]

Figure 7: Average response time of the CBPT strategy. [Plot: average response time versus the percentage (%) of non-zero elements, for CBPT and PT.]

5.2 Evaluation of the CBPT Strategy

In this subsection, the average response time of our proposed CBPT strategy is evaluated in an environment where the client has a cache whose size is 40 percent of the size of one program cycle. Figure 7 shows the simulation result using MATRIX 1, varying the percentage of elements which are not 0. For comparison, the average response time of the PT strategy is also shown; to compute the PT value, each item $i$ is assigned the value $P_i' = \sum_{j=1}^{M} P_{ij}$ as its access probability. The horizontal axis indicates the percentage of elements which are not 0. The result shows that the CBPT strategy gives better performance than the PT strategy, which is considered to give the best performance among the conventional strategies. In particular, the difference in performance between the two strategies becomes larger as the correlation among data items gets stronger. Although in this paper access requests for two items are issued at the same time, the CBPT strategy is expected to give even better performance when access requests for more than two items are issued at the same time. This is because the CBPT strategy replaces the items in a cache based on the correlation among data items as well as the items' access probabilities.

6 CONCLUSION

In this paper, we considered "push-based" information systems in which each client issues access requests for two correlated data items at the same time, and discussed scheduling and caching of broadcast data items that take the correlation among data items into account.
First, we proposed a scheduling strategy which puts data items with strong correlation side by side in the broadcast program in order to reduce the average response time. Then, we proposed a caching strategy which extends the conventional PT strategy so that it can take advantage of correlation among broadcast data for greater efficiency. We also presented the results of simulations conducted to evaluate the effectiveness of the proposed strategies. The simulation results show that correlation among data items is a significant factor in system performance.

As a part of our future work, we are planning to extend the CBS strategy to consider the broadcast frequencies of data items, since in conventional scheduling strategies the server broadcasts data items with high access probabilities more frequently. So far, we have assumed an environment in which each client issues access requests for two data items at the same time. However, in a real environment, there are also situations in which the client accesses a set of correlated data items with some intervals between the accesses. We are now studying scheduling and caching strategies for such cases.

REFERENCES

[1] Acharya, S., Alonso, R., Franklin, M., and Zdonik, S.: Broadcast Disks: Data Management for Asymmetric Communication Environments, Proc. ACM SIGMOD Conference, pp.199-210 (1995)
[2] Acharya, S., Franklin, M., and Zdonik, S.: Dissemination-Based Data Delivery Using Broadcast Disks, IEEE Personal Communications, Vol.2, No.6, pp.50-60 (1995)
[3] Acharya, S., Franklin, M., and Zdonik, S.: Disseminating Updates on Broadcast Disks, Proc. Int'l Conf. on Very Large Data Bases (VLDB'96), pp.354-365 (1996)
[4] Acharya, S., Franklin, M., and Zdonik, S.: Prefetching from a Broadcast Disk, Proc. Int'l Conf. on Data Engineering (ICDE'96), pp.276-285 (1996)
[5] Acharya, S., Franklin, M., and Zdonik, S.: Balancing Push and Pull for Data Broadcast, Proc. ACM SIGMOD Conference, pp.183-194 (1997)
[6] Ammar, M.H. and Wong, J.W.: On the Optimality of Cyclic Transmissions in Teletext Systems, IEEE Transactions on Communications, Vol.35, No.1, pp.68-73 (1987)
[7] Arora, S.: Polynomial Time Approximation Schemes for Euclidean Traveling Salesman and Other Geometric Problems, Journal of the ACM, Vol.45, No.5, pp.753-782 (1998)
[8] Bowen, T.F., Gopal, G., Herman, G., Hickey, T., Lee, K.C., Mansfield, W.H., Raitz, J., and Weinrib, A.: The Datacycle Architecture, Communications of the ACM, Vol.35, No.12, pp.71-81 (1992)
[9] Chen, M.S., Yu, P.S., and Wu, K.L.: Indexed Sequential Data Broadcasting in Wireless Mobile Computing, Proc. Int'l Conf. on Distributed Computing Systems (ICDCS'97), pp.124-131 (1997)
[10] Dao, S. and Perry, B.: Information Dissemination in Hybrid Satellite/Terrestrial Networks, IEEE Data Engineering Bulletin, Vol.19, No.3, pp.12-18 (1996)
[11] Das, D., Kapoor, S., and Smid, M.: On the Complexity of Approximating Euclidean Traveling Salesman Tours and Minimum Spanning Trees, Algorithmica, Vol.19, pp.447-460 (1997)
[12] David, K.: Polychannel Systems for Mass Digital Communication, Communications of the ACM, Vol.33, No.2, pp.141-151 (1990)
[13] Franklin, M. and Zdonik, S.: Dissemination-Based Information Systems, Proc. IEEE Data Engineering Bulletin, Vol.19, No.3, pp.20-30 (1996)
[14] Gondhalekar, V., Jain, R., and Werth, J.: Scheduling on Airdisks: Efficient Access to Personalized Information Services via Periodic Wireless Data Broadcast, Technical Report CS-TR-96-25, Univ. Texas at Austin, Dept. of Comp. Sci. (1996)
[15] Grassi, V.: Prefetching Policies for Energy Saving and Latency Reduction in a Wireless Broadcast Data Delivery System, Proc. ACM Int'l Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems (ACM MSWIS 2000), pp.77-84 (2000)
[16] Hameed, S. and Vaidya, N.H.: Log-time Algorithms for Scheduling Single and Multiple Channel Data Broadcast, Proc. Int'l Conf. on Mobile Computing and Networking (MOBICOM 97), pp.90-99 (1997)
[17] Hameed, S. and Vaidya, N.H.: Efficient Algorithm for Scheduling Data Broadcast, ACM-Baltzer Journal of Wireless Networks, Vol.5, No.3, pp.183-193 (1999)
[18] Herman, G., Gopal, G., Lee, K., and Weinrib, A.: The Datacycle Architecture for Very High Throughput Database Systems, Proc. ACM SIGMOD Conference, pp.97-103 (1987)
[19] Imielinski, T., Viswanathan, S., and Badrinath, B.R.: Energy Efficient Indexing On Air, Proc. ACM SIGMOD Conference, pp.25-36 (1994)
[20] Imielinski, T. and Badrinath, B.R.: Mobile Wireless Computing, Communications of the ACM, Vol.37, No.10, pp.20-28 (1994)
[21] Imielinski, T., Viswanathan, S., and Badrinath, B.R.: Data on Air: Organization and Access, IEEE Transactions on Knowledge and Data Engineering, Vol.9, No.3, pp.353-372 (1997)
[22] Jain, R. and Werth, J.: Airdisks and AirRAID: Modeling and Scheduling Periodic Wireless Data Broadcast (Extended Abstract), DIMACS Tech. Report 95-11, Rutgers University (1995)
[23] Katayama, K. and Narihisa, H.: A New Iterated Local Search Algorithm Using Genetic Crossover for the Traveling Salesman Problem, Proc. ACM Symposium on Applied Computing (ACM SAC'99), pp.302-306 (1999)
[24] Lin, L. and Xingming, Z.: Heuristic MultiDisk Scheduling for Data Broadcasting, Proc. Int'l Workshop on Satellite-Based Information Services (WOSBIS'97), pp.1-5 (1997)
[25] Peng, W.C. and Chen, M.S.: Dynamic Generation of Data Broadcasting Programs for a Broadcast Disk Array in a Mobile Computing Environment, Proc. ACM Int'l Conf. on Information Knowledge Management (ACM CIKM 2000), pp.38-45 (2000)
[26] Prabhakara, K., Hua, K.A., and Oh, J.H.: Multi-Level Multi-Channel Air Cache Designs for Broadcasting in a Mobile Environment, Proc. Int'l Conf. on Data Engineering (ICDE 2000), pp.167-176 (2000)
[27] Stathatos, K., Roussopoulos, N., and Baras, J.: Adaptive Data Broadcast in Hybrid Networks, Proc. Int'l Conf. on Very Large Data Bases (VLDB'97), pp.326-335 (1997)
[28] Su, C.J. and Tassiulas, L.: Joint Broadcast Scheduling and User's Cache Management for Efficient Information Delivery, Proc. Int'l Conf. on Mobile Computing and Networking (MOBICOM'98), pp.33-42 (1998)
[29] Su, C.J., Tassiulas, L., and Tsotras, V.J.: Broadcast Scheduling for Information Distribution, ACM-Baltzer Journal of Wireless Networks, Vol.5, No.2, pp.137-147 (1999)
[30] Vaidya, N.H. and Hameed, S.: Improved Algorithms for Scheduling Data Broadcast, Tech. Report 96-029, Comp. Sc. Dept., Texas A&M University (1997)
[31] Yu, J.X., Sakata, T., and Tan, K.L.: Statistical Estimation of Access Frequencies in Data Broadcasting Environments, ACM-Baltzer Journal of Wireless Networks, Vol.6, No.2, pp.89-98 (2000)