Allocation of copies of a file in an information network


by R. G. CASEY
IBM Research Laboratory
San Jose, California

INTRODUCTION

We consider a mathematical model of an information network of n nodes, some of which contain copies of a given data file. Within this network, every node is able to communicate with every other node over communication links (a process which may entail routing through intermediate nodes). In particular, we are concerned with transactions with the multiply-located file. Such transactions fall into one of two classes: (1) query traffic between a node and the file, and (2) update traffic. An update message is assumed to be transmitted to every copy of the file, whereas a query is communicated only to a single copy.

We proceed to demonstrate, within a simple linear cost model for the network, several properties of the optimal assignment of files to nodes. One set of results expresses bounds on the number of copies of the file that should be included in the network, as a function of the relative volume of query and update traffic. Secondly, a test useful in determining the optimum configuration is derived.

The problem of allocating resources in a network first arose within the context of determining the most economical location for manufacturing plants and warehouses [1,2,3]. In some studies, the model of the network is made in such a way that an efficient algorithm yielding the optimal allocation is not readily obtainable, and heuristic or approximate techniques are tried instead. Frazer [4], for example, has developed a method for allocating production facilities when the cost of deploying a resource at a given node consists of a fixed overhead expenditure plus an amount proportional to the total demand on the facility.

These manufacturing models are conceptually different from the file allocation problem, when updating traffic is considered in the latter.
Since all copies of a file are modified by each update message, the total volume of data transmitted in the network is not independent of the allocation policy, as is the total volume of goods shipped in a manufacturing environment, but rather increases with the number of file copies allocated. Yet, as will be shown, the models are mathematically equivalent to the file model analyzed here.

Chu [5] has investigated a linear programming solution to the file allocation problem. His model includes storage costs and queuing delays, but the number of copies of each file in the system is assumed to be known. Whitney [6] has formulated a similar model, and applied it to the task of designing network topology, as well as that of allocating copies of the file.

THE MODEL

The fixed cost (mainly for storage) of locating a copy of the file at the kth node ($k = 1, 2, \ldots, n$) will be denoted by an amount $\sigma_k$, measured in, say, dollars per month. The symbols $\lambda_j$ and $\psi_j$ will be used to represent the volume of query traffic and of update traffic, respectively, emanating from node j ($j = 1, 2, \ldots, n$). The quantities $d_{jk}$ and $d'_{jk}$ are the costs of a unit of communication from node j to node k for a query and an update transaction, respectively. Thus, $d_{jk}$ and $d'_{jk}$ might be measured in dollars per megabit transmitted, and $\lambda_j$ and $\psi_j$ expressed in megabits per month. The possibility of a difference in cost rates is included in the model on the premise that in many applications updates can be accumulated and transmitted either via a cheaper medium (e.g., by mailing magnetic tapes), or at a time when communications rates are lower (e.g., at night using switched lines).* Queries are assumed to be entered on-line, at a higher communication rate.

* The author is indebted to Dr. W. D. Frazer of IBM Research for pointing out this generalization.

Spring Joint Computer Conference, 1972. From the collection of the Computer History Museum (www.computerhistory.org)

If the kth node contains a copy of the file, then communications from the jth node produce a cost term of $\lambda_j d_{jk}$ for query transactions and an amount $\psi_j d'_{jk}$ for update transactions. Actually, in much of the discussion that follows, and in the experiments carried out, we assume equal cost rates for inquiry and update. In practice, the cost of shipping information may not be linear with the amount sent; the assumption is nonetheless of theoretical interest in order to obtain a first-order approximation to a rather complex monotonically increasing function.

We are concerned with the problem of determining at which nodes of the network copies of the file shall reside. We shall assume that this allocation is to be done in such a way as to minimize the total cost of communication between users and files. In general, the cost of querying is reduced as we increase the number of file nodes in the network (users near a new file node find their querying more economical; the other users are no worse off). On the other hand, storage costs and the cost of updating go up as new copies of the file are introduced, since every copy must be updated. In the limiting cases, if no updating is done, and if storage costs are low, a copy of the file should be kept at every node (and network communication not used), whereas if only updating is done, and no querying, a single copy of the file should be maintained at some optimal node of the network.

By an assignment of the file we shall mean a choice of nodes at which to locate the file. Let I denote a set of node indexes representing a given assignment. The total communication cost resulting from this choice of file locations is a sum over individual user nodes. In the case of query traffic, we assume that the user accesses that copy of the file which minimizes his communication cost. His updates, of course, must be transmitted to all the file nodes. The general expression for total cost is therefore:

$$C(I) = \sum_{j=1}^{n} \Big[ \sum_{k \in I} \psi_j d'_{jk} + \lambda_j \min_{k \in I} d_{jk} \Big] + \sum_{k \in I} \sigma_k$$

The problem of file allocation in this context is to choose the index set, I, so as to minimize C(I).
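As a concrete reading of the total cost expression, the following Python sketch evaluates C(I) for a given assignment. It is an illustration, not the author's program: the function name `total_cost`, the 0-based node indexes, the zero storage costs, and the choice to treat the empty assignment as infinitely costly are all assumptions made here. The traffic volumes and unit costs are those of the five-node example of Figure 2.

```python
import math

# Five-node example data (Figure 2): query traffic, update traffic, unit costs.
LAM = [24, 24, 24, 24, 24]
PSI = [2, 3, 4, 6, 8]
D = [
    [0, 6, 12, 9, 6],
    [6, 0, 6, 12, 9],
    [12, 6, 0, 6, 12],
    [9, 12, 6, 0, 6],
    [6, 9, 12, 6, 0],
]
SIGMA = [0, 0, 0, 0, 0]  # assumption: storage costs are not listed in Figure 2


def total_cost(nodes, lam, psi, d, sigma, d_upd=None):
    """Return C(I) for the assignment `nodes` (a set of 0-based node indexes)."""
    if not nodes:
        return math.inf  # assumption: no copy anywhere is treated as infeasible
    if d_upd is None:
        d_upd = d  # equal cost rates for queries and updates
    users = range(len(lam))
    # Every update is sent to every copy of the file.
    update = sum(psi[j] * d_upd[j][k] for j in users for k in nodes)
    # Every query goes to the cheapest copy for that user.
    query = sum(lam[j] * min(d[j][k] for k in nodes) for j in users)
    storage = sum(sigma[k] for k in nodes)
    return update + query + storage


if __name__ == "__main__":
    for assignment in ({4}, {0}, {0, 3}):
        print(sorted(assignment), total_cost(assignment, LAM, PSI, D, SIGMA))
```

Passing a separate `d_upd` matrix covers the case where updates travel at a different rate from queries.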
This cost function can be written in the form

$$C(I) = \sum_{k \in I} U_k + \sum_{j=1}^{n} G_j(I)$$

where

$$U_k = \sigma_k + \sum_{j=1}^{n} \psi_j d'_{jk}, \qquad G_j(I) = \lambda_j \min_{k \in I} d_{jk},$$

with $\sigma_k$ the fixed cost of keeping a copy at node k.

With C(I) expressed in this form the problem of minimizing cost is seen to be exactly the problem investigated by Efroymson and Ray [1], Feldman, et al. [2], Frazer [4], and others. The update costs of the file model are analogous to the fixed costs of the plant location model, while query costs are analogous to the transportation costs.

Previous researchers tried mixed integer-linear programming techniques for obtaining solutions to the minimization problem. Generally, the true minimum of C(I) is computationally very expensive to obtain by such methods, and heuristics are developed that sacrifice optimality in order to arrive at "good" solutions quickly.

Here, we take an alternative approach. We shall first exhibit several mathematical properties of the cost function. We shall use these to construct a search procedure which is guaranteed to produce the minimum of C(I). Finally, we suggest heuristic modifications to the general algorithm for use in cases where it is too expensive to be applied. Thus, both attempts to solve the problem, by means of integer programming and by exploiting the properties of the cost function, turn to ad hoc adjustments in order to obtain relief from the expense of guaranteeing an optimum solution. It would be of interest to compare the performance of the two approaches on the same problems.

The initial examination of the properties of C(I) yields bounds on the number of elements in the optimal file node set I, when storage costs do not vary. This will be shown by means of the following theorem. Let the cost of update communication be equal to the cost of query communication ($d_{jk} = d'_{jk}$).

Theorem 1: If for some integer $r \ge 2$, $\psi_j > \lambda_j/(r-1)$ for each $j = 1, 2, \ldots, n$, then any r-node file assignment is more costly than the optimal one-node assignment.

We prove the theorem by means of the following lemma:

Lemma: If $\psi_j = \rho \lambda_j$ for $j = 1, 2, \ldots, n$, then an r-node assignment cannot be less costly than the optimal one-node assignment if $\rho \ge 1/(r-1)$.
Proof of Lemma: Consider an arbitrary assignment $I = \{1, 2, \ldots, r\}$. Let the elements of I be ordered such that node 1 is the lowest cost single-node assignment. We have

$$C(I) = \rho \sum_{k=1}^{r} \sum_{j=1}^{n} \lambda_j d_{jk} + \sum_{j=1}^{n} \min_{k \in I} \lambda_j d_{jk} + \sum_{k=1}^{r} \sigma_k$$

and

$$C(\{k\}) = (1+\rho) \sum_{j=1}^{n} \lambda_j d_{jk} + \sigma_k, \qquad k = 1, 2, \ldots, r.$$

By the optimality of node 1 there are nonnegative numbers $a_2, \ldots, a_r$ such that $C(\{k\}) = C(\{1\}) + (1+\rho)\,a_k$, that is,

$$\sum_{j=1}^{n} \lambda_j d_{jk} = \sum_{j=1}^{n} \lambda_j d_{j1} + a_k + \frac{\sigma_1 - \sigma_k}{1+\rho}, \qquad k = 2, 3, \ldots, r.$$

Substituting the above, we can write

$$C(I) - C(\{1\}) = [\rho r - \rho - 1] \sum_{j=1}^{n} \lambda_j d_{j1} + \sum_{k=2}^{r} \rho a_k + \sum_{j=1}^{n} \min_{k \in I} \lambda_j d_{jk} + \frac{\rho (r-1) \sigma_1}{1+\rho} + \frac{1}{1+\rho} \sum_{k=2}^{r} \sigma_k$$

which is certainly nonnegative if $(\rho r - \rho - 1)$ is nonnegative. That is, $\rho \ge 1/(r-1)$ implies that $C(I) \ge C(\{1\})$. If the optimal single node assignment is in I, then it is node 1 and the lemma is proved. If the optimal node, say k', is not in I, then we have that $\rho \ge 1/(r-1)$ implies $C(I) \ge C(\{1\}) > C(\{k'\})$, and so the lemma is true in this case as well.

Proof of Theorem 1

The lemma concerns the case where each user generates the same proportion of update traffic to query traffic. Suppose now that the proportions vary from node to node, but always exceed a given amount $\rho$. Then, there are nonnegative quantities $\epsilon_j$ such that

$$\psi_j = (\rho + \epsilon_j) \lambda_j, \qquad j = 1, 2, \ldots, n.$$

With a common storage cost $\sigma$ at every node, the cost functions can now be written:

$$C'(\{1\}) = \sum_{j=1}^{n} (1 + \rho + \epsilon_j) \lambda_j d_{j1} + \sigma = C(\{1\}) + \sum_{j=1}^{n} \epsilon_j \lambda_j d_{j1}$$

$$C'(I) = \sum_{j=1}^{n} (\rho + \epsilon_j) \lambda_j \sum_{k=1}^{r} d_{jk} + \sum_{j=1}^{n} \lambda_j \min_{k \in I} d_{jk} + r\sigma = C(I) + \sum_{j=1}^{n} \epsilon_j \lambda_j \sum_{k=1}^{r} d_{jk}$$

where the unprimed costs are those given in the lemma. But clearly $C'(I) \ge C'(\{1\})$ whenever $C(I) \ge C(\{1\})$, since $\sum_{k=1}^{r} d_{jk} \ge d_{j1}$. Then applying the lemma, the condition $\rho \ge 1/(r-1)$ is sufficient to ensure that the r-node assignment is not optimal. In addition, if at least one of the $\epsilon_j$'s is greater than zero, we must have the strict inequality $C'(I) > C'(\{1\})$.

Corollary 1

If the update/query traffic ratio satisfies $\rho \ge 1/(r-1)$ (r an integer) then the optimal allocation consists of no more than r nodes.

Proof

If $\psi_j \ge \lambda_j/(r-1)$ for each j then certainly $\psi_j > \lambda_j/(r+l-1)$ for any integer $l \ge 1$, and so Theorem 1 rules out the optimality of an (r+l)-node assignment.

Corollary 2

If each user generates at least 50 percent of his traffic in the form of updates, then the optimal assignment policy is to locate only a single copy of the file in the network.
The corollary follows directly from the theorem by setting r = 2, but is worth stating separately since it sets a bound beyond which multiple copies of the file should not be considered. Furthermore, it is easy to show that for equal proportions of update and query traffic the two-node assignment is no more costly than a one-node assignment of the file only if storage costs are neglected, and the rows and columns of the cost matrix can be permuted to yield the form:

[Matrix layout not recoverable from the source: its first two rows contain the entries $a_1, a_2, \ldots, a_m$ and $b_1, b_2, \ldots, b_m$ respectively, together with zeros, above the rest of the matrix.]

where

$$\sum_{i=1}^{m} a_i \lambda_a = \sum_{i=1}^{m} b_i \lambda_b,$$

$\lambda_a$ and $\lambda_b$ being the respective traffic volumes for the first two rows (query or update).
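Corollary 2 can be checked numerically on small networks. The sketch below is only an illustration under stated assumptions: the helper names (`total_cost`, `best_assignment`, `random_instance`), the randomly generated integer cost matrices, and the uniform storage cost of 5 are all hypothetical. Every user is given update traffic at least as large as its query traffic (the "50 percent" condition), query and update cost rates are equal, and all nonempty assignments are enumerated exhaustively.

```python
import random
from itertools import combinations


def total_cost(nodes, lam, psi, d, sigma):
    """C(I) with equal query and update cost rates."""
    users = range(len(lam))
    update = sum(psi[j] * d[j][k] for j in users for k in nodes)
    query = sum(lam[j] * min(d[j][k] for k in nodes) for j in users)
    return update + query + sum(sigma[k] for k in nodes)


def best_assignment(lam, psi, d, sigma):
    """Exhaustive search over all nonempty assignments, smallest sets first."""
    n = len(lam)
    candidates = (
        frozenset(c) for r in range(1, n + 1) for c in combinations(range(n), r)
    )
    # min() keeps the first minimum it meets, so ties favour fewer copies.
    return min(candidates, key=lambda s: total_cost(s, lam, psi, d, sigma))


def random_instance(n, rng):
    """Hypothetical test data with psi[j] >= lam[j] for every user j."""
    d = [[0 if j == k else rng.randint(1, 50) for k in range(n)] for j in range(n)]
    lam = [rng.randint(1, 30) for _ in range(n)]
    psi = [q + rng.randint(0, 10) for q in lam]
    sigma = [5] * n  # storage costs do not vary from node to node
    return lam, psi, d, sigma


if __name__ == "__main__":
    rng = random.Random(1972)
    lam, psi, d, sigma = random_instance(6, rng)
    best = best_assignment(lam, psi, d, sigma)
    print(sorted(best), total_cost(best, lam, psi, d, sigma))
```

Integer data keeps the cost comparisons exact, so a tie between a single copy and a larger assignment is resolved by enumeration order rather than by rounding.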

A PROPERTY OF THE OPTIMAL FILE ALLOCATION

In this section, we examine the behavior of the cost function as additional file nodes are added. In particular, we consider a graph such as Figure 1, where each vertex is identified with a file assignment (denoted by a binary vector having 1's in those positions corresponding to file nodes, 0's elsewhere). Associated with each vertex is the corresponding value of the cost function (not shown in Figure 1). For mathematical convenience the null vertex is assigned infinite cost. The edges of the graph are directed paths corresponding to the addition of a single file node to the previous assignment. The graph is conveniently arranged in levels where the vertices at the kth level represent all k-node file assignments.

We demonstrate the significant property of this graph (which we shall call the "cost" graph) that if a given vertex has a cost less than the cost of any vertex along the paths leading to it, then the sequence of costs encountered along any one of these paths decreases monotonically. Thus, in order to find the optimal allocation policy, it is sufficient to follow every path of the cost graph until the cost increases, and no further. Because of this property, only a small subset of the possible assignments need be tested, compared with the exhaustive search procedure in which $2^n$ different file allocations must be evaluated.

The monotonicity will be exhibited in two stages: first, for the case of two steps through the graph, and then for the general case. Let $I \sim X$, where $X \subset I$, denote the index set I with the elements of set X removed. Consider an arbitrary file assignment corresponding to index set I, and assume the nodes are so numbered that $I = \{1, 2, \ldots, r\}$. We proceed to show:

Lemma: If $C(I) \le C(I \sim \{k\})$ for k = 1, 2, then $C(I \sim \{k\}) \le C(I \sim \{1, 2\})$ for k = 1, 2.
Proof of Lemma: The cost function can be written, in general,

$$C(I \sim X) = \sum_{k \in I \sim X} U_k + \sum_{j=1}^{n} \lambda_j \min_{k \in I \sim X} d_{jk}$$

Thus

$$C(I) - C(I \sim \{1\}) = U_1 - \sum_{j=1}^{n} \lambda_j d_j$$

where

$$d_j = \min_{k \in I \sim \{1\}} d_{jk} - \min_{k \in I} d_{jk}$$

Also

$$C(I \sim \{2\}) - C(I \sim \{1, 2\}) = U_1 - \sum_{j=1}^{n} \lambda_j d_j'$$

where

$$d_j' = \min_{k \in I \sim \{1,2\}} d_{jk} - \min_{k \in I \sim \{2\}} d_{jk}$$

Consider the difference

$$d_j' - d_j = \min_{k \in I \sim \{1,2\}} d_{jk} + \min_{k \in I} d_{jk} - \min_{k \in I \sim \{2\}} d_{jk} - \min_{k \in I \sim \{1\}} d_{jk}$$

We have

$$\min_{k \in I} d_{jk} = \min\Big( \min_{k \in I \sim \{1\}} d_{jk},\ \min_{k \in I \sim \{2\}} d_{jk} \Big)$$

and

$$\min_{k \in I \sim \{1,2\}} d_{jk} \ge \max\Big( \min_{k \in I \sim \{1\}} d_{jk},\ \min_{k \in I \sim \{2\}} d_{jk} \Big)$$

Therefore $d_j' - d_j \ge 0$, that is, $d_j' \ge d_j$, and so

$$C(I \sim \{2\}) - C(I \sim \{1, 2\}) \le C(I) - C(I \sim \{1\})$$

Thus, $C(I) \le C(I \sim \{1\})$ implies that $C(I \sim \{2\}) \le C(I \sim \{1, 2\})$. By a mere permutation of indexes we may likewise prove that $C(I) \le C(I \sim \{2\})$ implies that $C(I \sim \{1\}) \le C(I \sim \{1, 2\})$, and so the lemma is true.

Figure 1-Graph of the allocation process

By means of this lemma, we can now prove the general theorem:

Theorem 2: Given an index set $X \subset I$ containing r elements, and having the property $C(I) \le C(I \sim \{x\})$ for each $x \in X$, then for every sequence $R^{(1)}, R^{(2)}, \ldots, R^{(r)}$ of subsets of X such that $R^{(k)}$ has k elements, and $R^{(k)} \subset R^{(k+1)}$, it is true that

$$C(I) \le C(I \sim R^{(1)}) \le C(I \sim R^{(2)}) \le C(I \sim R^{(3)}) \le \cdots \le C(I \sim R^{(r)})$$

Proof (by induction): The first inequality above is given in the hypothesis. Since $R^{(2)}$ has two elements the second inequality follows from the lemma. It may thus be taken as hypothesis to prove the third, and so on. Each inequality proved can be used together with the lemma to prove the inequality to its right.

Observations: If we take X = I in the theorem, then it states that along any path in the cost graph from the null vertex (which we have assigned infinite cost) to the vertex corresponding to I, cost decreases monotonically from one vertex to the next, providing of course that index set I satisfies the hypothesis of the theorem. The monotonic decreasing property can also be shown to hold in the reverse direction; i.e., for paths from the vertex $\{1, 2, \ldots, n\}$ to the optimum.*

* A reviewer pointed out this generalization.

APPLICATION TO FILE ALLOCATION

The theorem just proved provides a test that is useful in determining the minimum cost allocation of file copies to nodes of the network. Referring to the cost graph (Figure 1), let us define an "antecedent" of an arbitrary vertex v as a vertex which has a connection to v from the next lower level, and a "successor" of v as a vertex connected to v at the next higher level. An antecedent contains the file nodes of v less one node, while each successor contains the nodes of v plus one additional node. A vertex at the rth level of the graph has r antecedents, and (n-r) successors, where n is the number of nodes in the network.
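The antecedent and successor relations can be made concrete with a few lines of Python. This is a minimal sketch in which a vertex of the cost graph is represented as a `frozenset` of 0-based node indexes; that representation and the function names are assumptions made for illustration only.

```python
def antecedents(vertex):
    """Vertices one level down: the same file nodes less one node."""
    return [vertex - {k} for k in sorted(vertex)]


def successors(vertex, n):
    """Vertices one level up: the same file nodes plus one additional node."""
    return [vertex | {k} for k in range(n) if k not in vertex]


if __name__ == "__main__":
    v = frozenset({0, 2})  # a second-level vertex in a five-node network
    print([sorted(a) for a in antecedents(v)])
    print([sorted(s) for s in successors(v, 5)])
```

For a vertex at level r of an n-node network the two lists have r and n-r entries, matching the counts given above.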
(Note that, for clarity, we use the term "node" in referring to the computer network, and the term "vertex" with respect to the cost graph.) We shall also define a "local optimum" of the cost graph as a vertex which is less costly than all of its antecedents and successors. Clearly, the global optimum we seek belongs to the set of local optima of the graph.

The theorem permits us to discover all the local optima without computing the cost of every vertex, for each path leading from the 0 level to a local optimum must give rise to a monotonically decreasing sequence of costs. Whenever an increase is encountered in a step forward through the graph, that path can be abandoned since it cannot lead to a local (or global) optimum. A "path-tracing" routine which evaluates each possible sequence of node additions in this way is certain to produce the minimum value of C(I). The amount of computation needed to extract the minimum will increase with the number of local optima, and with the number of nodes in the optimal configuration.

A computer algorithm can be implemented in several different ways to select file nodes one at a time up to the optimum configuration. One approach is to follow all paths in parallel through the cost graph, stepping one level per iteration. This method is computationally efficient, but may require a great deal of storage in a large problem. Alternatively, a program can be written to trace systematically one path at a time. Such a technique uses less storage, but may require redundant calculations since many different paths intersect each vertex.

EXAMPLES

We consider a five-node network with query and update parameters, and cost matrix as in Figure 2. Figure 3 shows the file node cost graph, and indicates the optimal allocations. This small-scale example illustrates the monotonicity of the cost function along paths leading to a local optimum. Note that the matrix $\{d_{jk}\}$ is symmetric and has zero elements along the main diagonal. This is a plausible condition in a practical case, but is not required in any way by the theorems proven here.

INPUT PARAMETERS
        QUERY    UPDATE    COST PER MEGABYTE SHIPPED, BY FILE NODE
USER    TRAFFIC  TRAFFIC      1     2     3     4     5
   1         24        2      0     6    12     9     6
   2         24        3      6     0     6    12     9
   3         24        4     12     6     0     6    12
   4         24        6      9    12     6     0     6
   5         24        8      6     9    12     6     0

QUERYING COSTS
USER                          1     2     3     4     5
   1                          0   144   288   216   144
   2                        144     0   144   288   216
   3                        288   144     0   144   288
   4                        216   288   144     0   144
   5                        144   216   288   144     0

UPDATE COSTS
USER                          1     2     3     4     5
   1                          0    12    24    18    12
   2                         18     0    18    36    27
   3                         48    24     0    24    48
   4                         54    72    36     0    36
   5                         48    72    96    48     0

Figure 2-A five-node example

Figure 3-The cost graph for the example. Local optima are circled

As a second example, we postulate the ARPA computer network and its traffic matrix as given by Kleinrock [7]. In order to treat this case, which is not initially posed as a centralized data base configuration, we make the following rather arbitrary assumptions:

(1) All the resources of the network represented in the traffic matrix are gathered into one large file, copies of which are to be allocated to a subset of the nodes.

(2) The query traffic (Figure 4) from each user to the file is as given under the "node output" column in Kleinrock's table. If the user is accessing a program, we may think of this "query" traffic as comprising the command messages he must send in order to run the program, plus the program output sent back to him. (Note: Kleinrock does not give figures for each node's use of its own facilities.)

(3) "Update" traffic, which causes modification of programs and data in all copies of the file, is a fixed percentage of the query traffic defined in (2). The update/query ratio is the same for each user, and is varied as a parameter in the computer runs described below.
(4) The "cost" of communicating between two network nodes is as shown in Figure 4. The quantities given here are roughly the straight-line distances between nodes. Thus, the cost function to be minimized is the total amount of "flow" in the network, i.e., the product of message length in bits, multiplied by the distance through which the message must travel on a direct path to its destination, summed over all messages. In this view an update message is assumed to be sent separately to each file location, rather than relayed.

Assumptions (1)-(4) ignore the network communication links as configured by Kleinrock. Ordinarily these would determine the cost matrix. Substituting "distance" for "cost," as done here, might be expedient in a preliminary analysis to allocate resources prior to a topological design of the network. We employ the ARPA data primarily because it tests our model on a problem involving many nodes. In addition, we should like to encourage speculation on the concept of multiple copies of a resource (data or programs), and whether such a facility is attractive in future ARPA-like networks.

Figure 5 summarizes the file allocation experiments conducted using this data, in which the proportion of update traffic to query traffic was varied from 100 percent to 10 percent in steps. As expected, only single node allocations are indicated when the two types of traffic are equal. As a smaller volume of update traffic is generated, multiple node solutions begin to appear. The copies of the file are widely distributed geographically, and the single node solution is centrally located. However, such features result from the particular traffic distribution assumed, and are not generalizations valid for every case.

It is interesting to note the large number of local optima generated (see Figure 5).
Because certain ARPA sites, e.g., MIT, BBN, Harvard, and Lincoln Laboratories, are very near one another, if one occurs in a local optimum one of its neighbors may be substituted for it to produce a second configuration that is also locally optimum. Most of the local optima are due to this phenomenon, suggesting that greater computational efficiency would have resulted from lumping neighboring sites into a single network node for an initial application of the algorithm, followed by a stage of finer calculations. Such artifices are hardly necessary in problems of this size, however. The sequence of six runs depicted in Figure 5 was carried out on an IBM 360/91 in less than ten seconds, including Fortran compilation.

COST PER MEGABYTE SHIPPED, BY FILE NODE
USER   QUERY
NODE TRAFFIC    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
   1       8    0   75   75   75   75  350  450  490  640  900 1050 2080 2700 2700 2700 2670 2610 2610 2610
   2      21   75    0    5    5    5  300  400  480  660  920 1060 2110 2720 2720 2720 2700 2640 2640 2640
   3       7   75    5    0    5    5  300  400  480  660  920 1060 2110 2720 2720 2720 2700 2640 2640 2640
   4       6   75    5    5    0    5  300  400  480  660  920 1060 2110 2720 2720 2720 2700 2640 2640 2640
   5      10   75    5    5    5    0  300  400  480  660  920 1060 2110 2720 2720 2720 2700 2640 2640 2640
   6       4  350  300  300  300  300    0  160  280  500  710  850 1940 2570 2570 2570 2520 2450 2450 2450
   7       3  450  400  400  400  400  160    0  190  430  610  730 1850 2470 2470 2470 2400 2330 2330 2330
   8      10  490  480  480  480  480  280  190    0  240  430  570 1670 2300 2300 2300 2240 2170 2170 2170
   9       9  640  660  660  660  660  500  430  240    0  280  430 1460 2090 2090 2090 2050 1980 1980 1980
  10      50  900  920  920  920  920  710  610  430  280    0  160 1250 1860 1860 1860 1800 1730 1730 1730
  11       3 1050 1060 1060 1060 1060  850  730  570  430  160    0 1160 1760 1760 1760 1680 1610 1610 1610
  12      26 2080 2110 2110 2110 2110 1940 1850 1670 1460 1250 1160    0  610  620  620  640  600  600  600
  13      20 2700 2720 2720 2720 2720 2570 2470 2300 2090 1860 1760  610    0   40   40  290  340  340  340
  14      21 2700 2720 2720 2720 2720 2570 2470 2300 2090 1860 1760  620   40    0    5  250  320  320  320
  15       3 2700 2720 2720 2720 2720 2570 2470 2300 2090 1860 1760  620   40    5    0  250  320  320  320
  16       5 2670 2700 2700 2700 2700 2520 2400 2240 2050 1800 1680  640  290  250  250    0   80   80   80
  17       4 2610 2640 2640 2640 2640 2450 2330 2170 1980 1730 1610  600  340  320  320   80    0   10   10
  18       8 2610 2640 2640 2640 2640 2450 2330 2170 1980 1730 1610  600  340  320  320   80   10    0    5
  19       7 2610 2640 2640 2640 2640 2450 2330 2170 1980 1730 1610  600  340  320  320   80   10    5    0

Figure 4-The ARPA example. The locations of the nodes (from Kleinrock [7]) are: (1) Hanover, N.H., (2)-(5) Boston Area, (6) Murray Hill, N.J., (7) Washington, D.C., (8) Pittsburgh, Pa., (9) Ann Arbor, Michigan, (10) Urbana, Illinois, (11) St. Louis, Mo., (12) Salt Lake City, Utah, (13)-(15) San Francisco Bay Area, (16) Santa Barbara, California, (17)-(19) Los Angeles Area

Update/Query    Optimal
Percent         Allocation    Cost       # Local Minima
 10             2, 10, 14     117,544    140
 20             9, 14         188,738     88
 30             10, 14        242,546     88
 40             10, 12        291,754     77
100             10            427,460     19

Figure 5-Results for the ARPA example

EXTENSIONS

Path tracing on a related problem

One feature of the path-tracing technique is its versatility. It can be applied to problems that do not fit the linear programming framework. As one example consider the following modification of the file allocation model. The cost function as defined in the text assumes that the expense of transmitting an update message to all copies of the file is the sum of the costs of communicating with each copy separately, as if a distinct message had to be sent from the updater to every file node. Actually, the preferred mode of operation would be to relay the message through the network in the most economical manner. The resulting cost is less than that incurred by sending duplicate messages. It is the cost associated with the most economical subtree of the network that contains the originating node and the file nodes (see Figure 6). This new cost function defined when update costs are calculated in this way is nonlinear in a very complex way.
Yet the path-tracing algorithm can still be applied to the allocation problem. We no longer have the assurance provided by Theorem 2, which guarantees that the method will generate the optimal allocation. However, the cost function behaves in much the same manner as before. It still consists of two parts: an update (and fixed cost) term that increases as additional file copies are allocated, and a query term that decreases with additional copies. The algorithm may be heuristic in the new framework, but with only slight modification it can be programmed to determine at least allocations of the type we have called "local optima," of which the true optimum is one.
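Because the search only ever asks for the cost of a vertex, the path-tracing routine can be written against an arbitrary cost function. The Python sketch below is one possible level-by-level version, offered as an assumption-laden illustration rather than the program used for the experiments: the names `path_trace` and `linear_cost`, the `frozenset` representation of vertices, and the zero storage costs are hypothetical. It is exercised here with the linear cost model and the five-node data of Figure 2; a relay-based update cost could be supplied in place of `linear_cost`, in which case the result is a heuristic answer, as noted above.

```python
import math

LAM = [24, 24, 24, 24, 24]  # query traffic, Figure 2
PSI = [2, 3, 4, 6, 8]       # update traffic, Figure 2
D = [
    [0, 6, 12, 9, 6],
    [6, 0, 6, 12, 9],
    [12, 6, 0, 6, 12],
    [9, 12, 6, 0, 6],
    [6, 9, 12, 6, 0],
]


def linear_cost(nodes):
    """Linear C(I) with equal cost rates and zero storage cost (assumption)."""
    users = range(len(LAM))
    update = sum(PSI[j] * D[j][k] for j in users for k in nodes)
    query = sum(LAM[j] * min(D[j][k] for k in nodes) for j in users)
    return update + query


def path_trace(n, cost):
    """Follow every path of the cost graph until the cost increases.

    Returns (best vertex, its cost, number of vertices evaluated).
    """
    null = frozenset()
    costs = {null: math.inf}  # the null vertex is assigned infinite cost
    best = null
    frontier = [null]
    while frontier:
        admitted = set()
        for v in frontier:
            for k in range(n):
                if k in v:
                    continue
                s = v | {k}  # successor of v: one more file node
                if s not in costs:
                    costs[s] = cost(s)  # each vertex is evaluated only once
                if costs[s] <= costs[v]:  # abandon the path once cost rises
                    admitted.add(s)
        for s in admitted:
            if costs[s] < costs[best]:
                best = s
        frontier = list(admitted)
    return best, costs[best], len(costs) - 1


if __name__ == "__main__":
    best, best_cost, evaluated = path_trace(5, linear_cost)
    print(sorted(k + 1 for k in best), best_cost, evaluated)
```

Steps that leave the cost unchanged are followed as well, which is slightly more permissive than a strict decrease and keeps ties from hiding an optimum. The dictionary of evaluated vertices is the storage price of tracing all paths in parallel.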

Spring Joint Computer Conference, 1972

[Figure 6-Most economical subtree for relaying an update message. The figure shows a given network in which the file nodes and the node originating an update are marked. Linear cost of update as per C(I) = 2 + 3 + 8 + (8 + 5) + 7 = 33. Cost of relaying the update over the most economical subtree = 23.]

Heuristic modification of the path-tracing routine

If the optimal solution to the allocation problem consists of many network nodes then the general rule of following every monotonically decreasing path in the cost graph may be computationally too expensive. For example, if the optimum contains 40 nodes then at least 2^40 ≈ 10^12 values of the cost function must be evaluated. In order to decrease the amount of computation, it is necessary to sacrifice optimality for speed. We suggest several methods by which this may be done in a reasonable manner.

Assume that the cost graph is being searched level-by-level (i.e., in parallel). At each level, the vertices which are lower in cost than their antecedents are recorded. We call these vertices "admissible" and denote them by the set A_k at the kth level. We wish to carry only a limited number of the members of A_k into the (k+1)th stage. One plausible selection rule is to keep the vertices having the lowest cost. Alternatively, we may reason that if two assignments are similar (i.e., differ in only a few file nodes) then probably their union will also lie on a monotonic cost path, and it would be redundant to trace paths through both. In this view, we want to select the most "dissimilar" vertices from A_k.

Several techniques are available to do this selection. One method, previously used in pattern recognition, is the following. Order the members of A_k arbitrarily, specify a threshold T (a parameter), and examine the members of A_k in order. Denote by A_k' the subset being formed. The first member of A_k' is the first member of A_k, say a_1. The next member of A_k' (denoted a_2) is the first member of A_k which differs from a_1 by at least T units. Next, test for an a_3 which differs from both a_1 and a_2 by at least T, and so on.
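The threshold selection rule just described can be sketched in a few lines. This is a minimal illustration, not the author's program: the function names are hypothetical, and the measure of how much two allocations "differ" is assumed here to be the number of nodes that appear in one allocation but not the other, since the text does not define the unit of difference.

```python
def differ(a, b):
    """Assumed dissimilarity measure: the number of file nodes present in
    exactly one of the two allocations (size of the symmetric difference)."""
    return len(set(a) ^ set(b))

def select_dissimilar(admissible, threshold):
    """Thin an ordered list of admissible allocations down to a subset whose
    members all differ pairwise by at least `threshold`.

    The first allocation is always kept. Each later allocation is kept only
    if it differs by at least `threshold` from every allocation kept so far."""
    selected = []
    for candidate in admissible:
        if all(differ(candidate, kept) >= threshold for kept in selected):
            selected.append(candidate)
    return selected

# Hypothetical second-level admissible set (two file nodes per vertex).
A_K = [(2, 10), (2, 14), (9, 14), (10, 12), (12, 16)]

print(select_dissimilar(A_K, 3))  # [(2, 10), (9, 14), (12, 16)]
print(select_dissimilar(A_K, 1))  # all five kept
print(select_dissimilar(A_K, 5))  # [(2, 10)]
```

A single pass over the ordered list gives the same result as the step-by-step description, because any member passed over before a_2 was already too close to a_1 and so can never qualify later. Raising the threshold shrinks the selected subset and lowering it enlarges the subset, as the three calls above show.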
If A_k' has too many members, or too few, the routine may be repeated with T larger, or smaller, respectively. The set A_k' is used to generate A_{k+1}, which is in turn thinned down to A_{k+1}'.

A third method has actually been programmed and compared with the general search technique. The procedure is to do complete path-tracing (applying Theorem 2) up to a level of the cost graph at which set A_k begins to get too large. From this level on, only the most "promising" paths are followed. For example, as programmed the cost graph is completely evaluated through the second level (2 file nodes per vertex). From each admissible second-level vertex only a single path is followed; namely, that path which gives minimal cost at each succeeding level. Since this program has only one-step look-ahead, the optimum may be missed. In sample runs on the ARPA problem, the optimum was always found (for each value of the update/query parameter), although local optima were sometimes overlooked.

This heuristic program was also tested on the ARPA configuration using the revised cost function described above, i.e., a function which assumes that updates are relayed between file nodes in the most economical manner. Use of a simplified program is called for in this case; the heuristic program required 12 minutes to repeat the five runs of Figure 5, which were performed in under ten seconds using the linear cost relationship. Figure 7 shows the allocations produced by this experiment. These assignments may not be global optima. They do satisfy our definition of a local optimum. Note that, as expected, for the same traffic more copies of the file are assigned if updates are relayed. In addition, total cost is reduced markedly under conditions of low update traffic, indicating the importance of including the more complex cost relations in the model.

Update/Query Percent   Optimal Allocation (Nodes)             Cost
10                     2, 6, 8, 9, 10, 12, 13, 14, 16, 18     38,285
20                     2, 8, 10, 12, 14                       105,800
30                     2, 8, 10, 12, 14                       171,225
40                     8, 10, 12                              227,340
100                    10                                     427,460

Figure 7-Allocation results using the nonlinear cost model (ARPA network)

CONCLUSION

The analytical properties of a linear-cost model of an information network have been investigated. The proportions of update traffic to query traffic generated by the users of a given file in the network were shown to determine an upper bound on the number of copies of the file present in the least-cost network. In addition, a basic property of the cost function was demonstrated and shown to justify a path-tracing procedure for determining the optimal assignment of copies of the file to nodes of the network. The model, while simple, expresses relevant features of the tradeoff between the costs of querying and updating in the network. Presumably, the general properties derived apply at least approximately when more complex models are considered.

ACKNOWLEDGMENT

Dr. M. E. Senko first pointed out to the author the need for an allocation model which would distinguish between the query and update activity in a network. The work reported here owes additional debts to the encouragement and commentary of Drs. C. P. Wang, H. Ling and V. Lum.
REFERENCES

1 M A EFROYMSON, T L RAY. A branch-bound algorithm for plant location. Operations Research Vol 14 No 3 pp 361-368 May-June 1966
2 E FELDMAN et al. Warehouse location under continuous economies of scale. Management Science Vol 12 No 9 pp 670-684 May 1966
3 K SPIELBERG. Algorithm for the simple plant-location problem with some side conditions. Operations Research Vol 17 1969 pp 85-111
4 W D FRAZER. An approximate algorithm for plant location under piecewise linear concave costs. IBM Research Report RC 1875, IBM Watson Research Center, Yorktown Heights N Y, July 25 1967
5 W W CHU. Optimal file allocation in a multiple computer system. IEEE Trans on Computers Vol C-18 No 10 October 1969
6 V K M WHITNEY. A study of optimal file assignment and communication network configuration in remote-access computer message processing and communication systems. SEL Technical Report No 48, Systems Engrg Lab, Dept of Elect Engrg, University of Michigan, September 1970 (PhD Dissertation)
7 L KLEINROCK. Models for computer networks. Proc International Conference on Comm, Boulder Colorado, pp 2.9-2.16 June 1969
