INFT 803. Fall Semester, Ph.D. Candidate: Rosana Holliday


1 INFT 803 Fall Semester, 1999 Ph.D. Candidate: Rosana Holliday

2 Papers Addressed Removal Policies in Network Caches for World-Wide Web Documents Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

3 Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

4 Abstract Sharing of caches among Web proxies: Reduces Web traffic Alleviates network bottlenecks Not widely used due to existing protocol overhead Paper proposes a new protocol called Summary Cache

5 Paper Approach Each proxy keeps a summary of the URLs of cached documents of each participating proxy Checks these summaries for potential hits before sending any queries

6 Why Low Overhead? Summaries are updated only periodically Summary representations are economical (8 bits/entry) Trace-driven simulation & a prototype implementation used Compared to ICP (Internet Cache Protocol), Summary Cache reduces the # of inter-cache messages by a factor of 25 to 60

7 Why Low Overhead? - Continuation Reduces the bandwidth consumption by over 50% Eliminates 30 to 95% of CPU overhead Maintains almost the same hit ratio as ICP

8 Introduction Cache sharing first proposed by the Harvest project (which designed the Internet Cache Protocol - ICP). It supports discovery of documents from neighboring caches.

9 Introduction - Continuation ICP produces overhead Communication and processing overhead increase quadratically with the number of proxies (multicast queries generated)

10 Other Proposed Alternative Cache Array Routing Protocol (partitions the URL space among proxies)

11 Cons - Not appropriate for WAN cache sharing

12 Paper Addresses Scalable protocols for World-Wide Web cache sharing

13 Paper Methodology Quantify ICP overhead by running a set of proxy benchmarks With 4 proxies, ICP increases inter-proxy traffic by a factor of 70 Increases # of network packets received by each proxy by over 13% Increases CPU overhead by over 15% Increases average user latency by 11%

14 Paper Methodology - Continuation Summary Cache Proposed Each proxy keeps a compact summary of the cache directory of every other proxy If a miss occurs - proxy probes all the summaries and sends queries only to promising proxies

15 Paper Methodology - Continuation Key Questions Examined Frequency of summary updates Representation of summaries Reducing memory requirements Store each summary as a Bloom filter

16 Partial Results Design of the summary cache enhanced ICP protocol Implementation of a prototype within the Squid proxy

17 Partial Conclusion Results show that the summary cache enhanced ICP protocol can scale to a large # of proxies Increase Web cache sharing Reduce Web traffic Implementation made publicly available

18 Future Cache Digest (a variant approach) is under beta testing in the National Cache Hierarchy

19 Traces and Simulations Five sets of traces used Digital Equipment Corporation Web proxy server (DEC) University of California at Berkeley Dial-IP Service (UCB) Users in the CS Department, University of Pisa, Italy (UPisa) GET requests seen by parent proxies in Australia (Questnet) One-day log of major parent proxies (bo, pb, sd and uc)

20 Traces and Simulations - Continuation National Web Cache hierarchy (National Lab of Applied Network Research)

21 Simulation Approach DEC, UCB & UPisa traces partitioned into groups: 16, 8 & 8 groups respectively Questnet traces contain HTTP GET requests from 12 child proxies in the regional network NLANR traces contain actual HTTP request counts to the major proxies

22 Simulation Approach - Continuation Assumptions Cache size is 10% of the infinite cache size Documents larger than 250 KB are not cached No expiring of documents based on age or time-to-live Last-modified times that come with the traces are used (if the time changes, the request is counted as a miss)

23 Benefits of Cache Sharing Reduces traffic to the Internet ICP-style simple cache sharing suffices; more tightly coordinated schemes (e.g. global replacement) are not needed

24 Overhead of ICP ICP is not scalable It relies on query messages to find remote cache hits - queries cascade when a miss happens

25 Benchmarks Wisconsin Proxy Benchmark used Collection of client processes that issue requests following patterns observed in real traces

26 Hardware Connectivity Characteristics 10 Sun SPARC-20 workstations 100 Mb/s Ethernet 4 workstations act as proxy systems running Squid, each with a fixed amount of cache space allocated Four workstations run 120 client processes (30 processes on each workstation)

27 Hardware Connectivity Characteristics - Continuation Two workstations act as servers, each with 15 servers listening on different ports Each server process waits for 1 second before sending the reply to simulate network latency

28 Experiments Two different cache hit rate ratios (25% and 45%) 200 requests in each experiment, total of 24K requests

29 Configurations Compared No ICP (no proxy collaboration) ICP (proxies collaborate)

30 Measurements Hit ratio in the caches Average latency seen by clients/users System CPU time consumed by the Squid proxy Network traffic

31 Partial Conclusion Clear benefits of shared caching, but the overhead of ICP is high Summary cache protocol proposed to address this problem

32 Summary Cache Approach Each proxy stores a summary of the URLs of documents at every other proxy If a miss happens, the proxy checks the stored summaries and: Finds the document at another proxy and fetches it, or Directs the request directly to the Web server

33 Summary Cache Approach - Continuation Summaries do not have to be up to date or accurate Can be updated at regular intervals

34 Tolerated Errors False misses - Requested document cached at another proxy but not included in its summary False hits - Requested document not cached but listed in the summary (wasted query)
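The miss path described above can be sketched as follows. This is a hypothetical illustration (the names `handle_miss`, `summaries`, `remote_caches` are invented for this sketch): a stale summary can claim a document another proxy no longer holds (a false hit, costing one wasted query), in which case the request falls back to the origin server.

```python
def handle_miss(url, summaries, remote_caches, fetch_from_server):
    """On a local miss, probe the (possibly stale) summaries and query
    only the proxies whose summary claims the URL (hypothetical sketch)."""
    for proxy, summary in summaries.items():
        if url in summary:                       # summary says this proxy has it
            doc = remote_caches[proxy].get(url)  # may be a false hit
            if doc is not None:
                return doc                       # remote cache hit
    return fetch_from_server(url)                # no promising proxy: go to origin

summaries = {"p1": {"/a"}, "p2": {"/b"}}
remote_caches = {"p1": {"/a": "doc-a"}, "p2": {}}   # p2's summary is stale
assert handle_miss("/a", summaries, remote_caches, lambda u: "origin") == "doc-a"
# False hit on p2 wastes one query, then falls back to the server:
assert handle_miss("/b", summaries, remote_caches, lambda u: "origin") == "origin"
```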

35 Other Errors Remote stale hits Occur in both Summary Cache & ICP when a document is cached at another proxy, but the cached copy is stale

36 Scalability - Limitation Factors Network Overhead (inter-proxy traffic) Memory required to store the summaries (stored in DRAM for performance reasons)

37 Network Overhead Frequency of summaries updates # of false hits and remote hits

38 Memory Requirements Size of individual summaries # of cooperating proxies Important to keep individual summaries small

39 Impact of Update Delays Summaries are updated when the % of cached documents that are new reaches a threshold An alternative would be updating summaries at regular intervals

40 Results Except for the NLANR trace data, the degradation in total cache hit ratio increases almost linearly with the update threshold A threshold of 1% to 10% for updating the local summary results in a tolerable degradation

41 Results - Continuation For the 5 traces, the threshold values translate into roughly 300 to 3,000 user requests between updates Roughly an update every 5 minutes to an hour

42 Summary Representation Size of summary (scalability issue) DRAM storage (reference) Candidates: Exact directory - list of URLs of cached documents Server names - collection of Web server names in the URLs of cached documents Ratio of different URLs to different Web server names is 10 to 1 (server name approach can cut the memory requirement by a factor of 10)

43 Findings Neither approach is satisfactory Exact directory approach consumes too much memory Server name approach generates too many false hits, increasing network traffic

44 Proposed Solution - Bloom Filters Method for representing a set of n elements (keys) to support membership queries Provides a straightforward mechanism to build a summary A proxy builds a Bloom filter from the list of cached URLs and sends the bit array to the other proxies

45 Advantages Provides a trade-off between memory requirement and false-positive ratio Proxies that want to devote less memory to the summaries can accept a slight increase in inter-proxy traffic
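A minimal Bloom filter sketch, under assumed parameters (m bits, k hash functions; the paper's 8 bits/entry corresponds to choosing m relative to the number of cached URLs). A key that was added is always found; a key that was never added may still test positive, which is exactly the "false hit" the protocol tolerates.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions.
    Membership queries may yield false positives (false hits)
    but never false negatives."""
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        # Derive k bit positions by salting one cryptographic hash.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

bf = BloomFilter()
bf.add("http://example.com/a")
assert "http://example.com/a" in bf   # added keys are always found
```

Shrinking `m_bits` saves memory but raises the false-positive (false-hit) rate, which is the memory/traffic trade-off the slide describes.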

46 Conclusions Bloom filter summaries provide the best performance in terms of low network overhead & low memory requirement

47 Recommended Configuration Update threshold between 1% and 10% to avoid significant reduction of the total cache hit ratio If a time-based interval is used, it should be chosen such that the percentage of new documents stays between 1% and 10%

48 Scalability Extrapolate results (4 to 16 proxies considered in the study)

49 Implementation and Experiments Summary cache enhanced ICP Assumptions Small delay thresholds Updates summaries by sending the differences Implementation leverages Squid's built-in support to detect failure & recovery of neighbor proxies; re-initializes a failed neighbor's bit array when it recovers

50 Performance Experiment Results Enhanced ICP protocol reduces the network traffic & CPU overhead Slightly decreases the total hit ratio Enhanced ICP lowers client latency Increases CPU time by 12%

51 Conclusions and Future Work Summary cache enhanced ICP scalable Web cache sharing protocol Summary cache reduces traffic overhead

52 Effects of Delayed Updates Bloom filter based summaries with update delay thresholds have low demands on memory and bandwidth Achieve a hit ratio similar to the overall ICP protocol

53 New Protocol Advantages Reduces # of inter-proxy protocol messages by a factor of 25 to 60 Reduces bandwidth consumption by over 50% No degradation in the cache hit rates It is scalable

54 Future Impact of the protocol on parent-child proxy cooperation Optimal hierarchy configuration for a given workload Application of summary cache to various Web cache consistency protocols

55 Future - Continuation Summary cache can be used in individual proxy implementation to speed up cache lookup Modification of proxy implementation

56 Removal Policies in Network Caches for World-Wide Web Documents

57 Abstract Caching documents can reduce: # of requests that reach popular servers (*) The volume of network traffic resulting from document requests (*) End-user latency in retrieving documents (*) (*) Focus of this paper

58 Important Definitions Cache Hit Rate (HR) - fraction of client-requested URLs returned by the proxy Weighted Cache Hit Rate (WHR) - fraction of client-requested bytes returned by the proxy

59 Removal Concept A client request for an uncached document may cause the removal of one or more cached documents Document sizes and types define distinct policies to select a document for removal

60 Simulation Approach Trace-driven Determines: Maximum possible HR and WHR achieved by a cache Removal policy that maximizes HR and WHR Five traces used, 30 to 190 days of client URL requests

61 Introduction WWW is inherently unscalable Identical copies of many documents pass through the same network links (high volume of access to Web pages)

62 Costs Requires bandwidth upgrades (network administrator's nightmare) High server utilization - upgrades and/or replacement of servers (Web site administrator's nightmare) Greater latency for document requests by end users (end user's nightmare)

63 Potential Solution? Migration of copies of popular documents from server to points closer to users

64 Migration Models Distribution - Servers control where the document copies are stored (commercial users - copyrighted documents) Cache - copies automatically migrate in response to user requests (most popular) At Server At a client - built into the Web browser In the Network itself (proxy servers) (*) (*) Focus of this paper

65 Document Any item retrieved by a URL (dynamic pages, audio files, etc.) On a Web server = original Cached = copy

66 Scenario Client requests document D (client is configured to use proxy P) Client sends the request (GET) to the proxy server P Proxy server P obtains the document from Web server S

67 Three Possible Scenarios P has a consistent copy of D - P serves the local copy = HIT (*) P has an inconsistent copy of D - P sends an HTTP GET to S with the last-modified time of its copy (*) P does not have a copy - GET is forwarded to another proxy server or to S (MISS) (*) This paper does not cover methods for maintaining consistency

68 Potential Advantages Proxy caches can dramatically reduce network load!

69 Potential Disadvantages Only works with statically or infrequently changing documents HTTP 1.0 is not reliable in identifying whether a Web document is cacheable or not There are no standards on the Web for keeping cached copies consistent Copyright laws can rule out caching!

70 Criteria Reduced by Caching Proxies # of requests that reach servers (measure HR) (*) Volume of network traffic (document requests) (*) Latency experienced by the end user when retrieving a document (measure=transfer time) (*) Cases considered in this paper

71 Measures # of bytes not sent by the server (*) # of packets not sent by the server Reduction in the distance that the packets traveled (hop count) (*) Considered in the paper, complemented by the fraction of bytes returned by the proxy (WHR)

72 First Experiment Max values of HR and WHR Concentration - Many clients request the same URL Temporal locality - single client requesting same URL many times

73 Second Experiment Effect of removal policies on HR, WHR and cache size Removal policy controlled by the free space threshold Document type - audio or video images

74 Comparing Removal Policies FIFO - First in, first out Least recently used (LRU) LRU-Min - considers document size LFU - Least frequently used Hyper-G - server's policy Pitkow/Recker

75 Methodology Develop a taxonomy of removal policies for each case presented in the previous slide Study yields policies that no one has yet proposed Effectiveness of second level caches

76 Third Experiment Finds the theoretical maximum HR and WHR of a second level cache

77 Fourth Experiment Should a cache be partitioned by media type? High volume of audio bytes transferred used

78 Definitions Cache hit - match in both URL and size Two workloads used - BR and BL (to avoid mis-measurements caused by documents that changed but were not detected)

79 Conditions Server return code = 200 Log records with size 0 for a never-before-requested URL are discarded These conditions keep HR and WHR consistent

80 Removal Algorithms Two phases: Sort the documents in the cache based on keys Remove zero or more documents from the head of the sorted list until the amount of free cache space >= the incoming document size
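The two-phase algorithm above can be sketched as follows; the function name and data layout are invented for this illustration. The removal key determines the policy: for example, a SIZE policy puts the largest documents at the head of the sorted list so they are evicted first.

```python
def make_room(cache, free_space, incoming_size, key):
    """Two-phase removal sketch: (1) sort cached documents by the removal
    key, (2) evict from the head of the list until the incoming document
    fits. `cache` maps URL -> size; `key` orders eviction candidates."""
    victims = sorted(cache, key=key)        # phase 1: sort by removal key
    evicted = []
    while free_space < incoming_size and victims:
        url = victims.pop(0)                # phase 2: remove from head
        free_space += cache.pop(url)
        evicted.append(url)
    return free_space, evicted

cache = {"/big": 500, "/mid": 200, "/small": 50}
# SIZE policy: largest documents sort to the head (key = negative size)
free, evicted = make_room(cache, 100, 550, key=lambda u: -cache[u])
assert evicted == ["/big"] and free == 600   # one big eviction suffices
```

Swapping the `key` (entry time for FIFO, last access time for LRU, reference count for LFU) yields the other policies the slides compare.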

81 Table 2 Top - Sample trace Middle - Key time 15 + Bottom - Resultant sorted lists (*) Selected for removal

82 Removal Policies from Literature FIFO - Sorts documents by increasing values of the time each document entered the cache (ETIME) Removes document with the smallest cache entry time (first to enter the cache)

83 Removal Policies from Literature - Continuation LRU - Sorts by last access time (ATIME) LFU - Sorts by # of references (NREF) Hyper-G - Starts with LFU (NREF as primary key), then uses LRU (last access time as the secondary key), with size as the tertiary key Study explores both primary keys used by the Pitkow/Recker policy

84 Removal Policies from Literature - Continuation Large files tend to be removed from the cache first Followed by LRU to select among similar-sized files

85 Removal Policies from Literature - Continuation Table 1 - the 6 sorting keys

86 Factors Used in the Experiment Primary Key Secondary Key Workload

87 Response Variables Used in the Experiment HR WHR

88 Other Considerations Used in the Experiment Always use random replacement as the secondary key This gives 36 combinations of primary and secondary keys (36 policies)

89 Experiment Goal Identify the most effective key combination for the workload

90 Unexplored Removal Policies When to replace? How many documents to remove?

91 When to Run the Removal Policy? On-demand - when the size of the requested document exceeds the free room in the cache Periodically - run the policy every T time units Both - on demand and at the end of each day (Pitkow/Recker)

92 How Many Documents to Remove? On-demand - stops when the free cache area >= the requested document size Periodic - replaces documents until a certain threshold is reached (comfort level) Experiment does not study these choices - the simulation computes HR, not timings of access and removal

93 Periodical Removal Argument If the cache is nearly 100% full, running the removal policy only on demand will invoke the removal policy on nearly all document requests - OVERHEAD!

94 On-Demand Removal Argument Sorted list kept consistent by removing from the head of the list Proxy keeps read-only documents No overhead from writing back a document

95 Workloads Used in the Study Server-based trace - logs from an individual Web server Client-based trace - logs collected by instrumenting a set of Web clients Proxy-based trace - URL requests that reach a point in a network (proxy log or TCP dump log) (*) (*) Used in the experiment

96 Proxy-Based Traffic Similar to server-based when it monitors traffic from clients spread across the Internet to a few servers (BR workload) Similar to client-based if it monitors traffic from clients within a network to servers anywhere in the Internet (U, C, G and BL workloads)

97 Workloads Used Five workloads from Virginia Tech All (except BR) represent: 33 faculty/staff members 369 undergrads 90 grad students

98 Connectivity Type/Equipment Used 185 Ethernet computers and X-terminals Home/dormitory computers (SLIP over modems or Ethernet) 12 HTTP daemons typically running within the CS Department Web used to make syllabi, course notes, and assignments available for 20 undergrad and 10 grad courses

99 Under grad (U) 30 workstations; 190 days 173 valid accesses requiring transmission of 2.19 GB of static Web documents Group working in close confines

100 Classroom (C) 26 workstations Each student runs a Web browser during 4 class sessions 30 valid accesses requiring transmission of MB of static Web documents Clients in an instructional setting - Requests when asked to do so

101 Graduate (G) Time shared client 25 users containing 46 valid accesses requiring transmission of 610 MB of static Web pages Clients in one Department dispersed throughout a building (separate or common work areas)

102 Remote Client Backbone Accesses (BR) URL requests traced on the Ethernet backbone of the .cs.vt.edu domain Clients outside the domain naming a Web server inside that domain 38-day period, representing 180 requests requiring transmission of 9.61 GB of static Web pages Few servers on the large Department LAN serving documents to WWW clients

103 Local Client Backbone Accesses (BL) URL requests traced on the CS Department backbone Clients from within the Department naming any server in the world 37-day period, representing 53 accesses requiring transmission of 644 MB of static Web pages Requests within and outside the .cs.vt.edu domain

104 Workloads Characteristics U represents requests that a single proxy would cache Trace from an operational caching proxy acting as a firewall No Web servers on the network to which the clients are connected

105 Workloads Characteristics - Continuation C represents a proxy that is positioned within a classroom to serve student machines in the classroom Representative of a real proxy performance

106 Workload Collection Procedure U, C & G collected using a CERN proxy server Mosaic clients in workload G Netscape clients in workload C Both configured to point to a proxy server running within the .cs.vt.edu domain Workload U clients running on Unix workstations within a lab, pointing to a CERN proxy server running on a firewall

107 Workload Collection Procedure - Continuation BR & BL collected at the same time TCP dump on the Department's backbone, filtered to identify NCSA & CERN log formats

108 Workload Definitions U, G & C Average request rates under 2,000 per day (Spring & Fall) U soared to 5,000 requests per day at the beginning of Fall Combined BL & BR workloads represent 3,000 to 6,000 requests per day

109 Distribution of File Types Table 4

110 Distribution of File Types - Continuation The most frequently requested document type was graphics, followed by text (HTML) U & BL - graphics and text (70%) C - graphics and video (74%) G - graphics, text and video (88%, evenly distributed) BR - audio & graphics (96%)

111 Phenomena Media type may account for a small fraction of references, but a large fraction of bytes transferred

112 Distribution of Requests for a Particular Server for Workload BL # of servers & unique URLs requested in the backbone local client traffic (BL) Follows a Zipf distribution 10 or fewer requests went to 1,666 servers; most requests went to 84 servers (13 outside the domain - caching would help) # of requests inversely proportional to # of servers
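A Zipf-like distribution of this kind can be sketched numerically; this is an illustration, not the paper's data, and the exponent `alpha` is an assumed parameter (classic Zipf uses alpha = 1).

```python
def zipf_requests(total_requests, n_servers, alpha=1.0):
    """Sketch of a Zipf-like request distribution: the i-th most popular
    server receives requests proportional to 1 / i**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_servers + 1)]
    scale = total_requests / sum(weights)
    return [w * scale for w in weights]

reqs = zipf_requests(10_000, 100)
# A small set of popular servers accounts for most requests, while the
# long tail of servers each receives very few:
assert reqs[0] > 50 * reqs[99]
assert abs(sum(reqs) - 10_000) < 1e-6
```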

113 Distribution of # of Bytes Transferred ≈290 of the ≈36K unique URLs referenced returned 50% of the total requested bytes

114 Partial Conclusion Widespread proxy caching networks with this workload could dramatically reduce the load on popular Web servers

115 Experiment Objectives What is the maximum possible WHR & HR that a cache can achieve? How large must a cache be for each workload so that no document is ever removed? What removal policy maximizes WHR for each workload over the collection period?

116 Experiment Objectives - Continuation How effective is the second level cache? Given a fixed amount of disk space for a cache, should all document types be cached together, or should the cache be partitioned by document type?

117 Experiment Design Four Experiments were conducted: 1 - Addresses objectives 1 & 2 Max WHR - Simulates each workload with an infinite cache size 2 - Addresses objective 3 Simulation of each workload with all combinations of keys listed in Table 1

118 Experiment Design - Continuation 3 - Addresses objective 4 Simulates a two level cache 4 - Addresses objective 5 Cache size=10% of MaxNeeded, using one workload (BR) - 88% bytes transferred are audio files

119 Response Variables Measured HR WHR Maximum cache size

120 Methodology All experiments start with an empty cache All experiments run for the full duration of the workload WHR & HR reported for each day separately A 7-day moving average applied to the daily hit rates plotted 1 week is long enough to cover the typical cycle

121 Experiment Results Figures 3-7 Max possible HR HR ranges from 20% to 98% WHR ranges from 12% to 98% HR >= WHR for U, G & C HR = WHR for BR & BL

122 Seasonal Variables Fig 3 WHR drop - day 65 (break between Spring & Summer semesters) HR declines - start of Fall semester New users & a decline in the rate of access to course material

123 Seasonal Variables - Continuation Fig 4 G & C contained in the Spring semester HR for G remains steady, then jumps to a max of 90%

124 Seasonal Variables - Continuation Fig 5 C shows the same high rates (students follow instructions) HR starts high, drops and stabilizes, then rises again near the end of the semester (students reviewing material in preparation for the final exam)

125 Seasonal Variables - Continuation BR achieves the highest HR by far (>= 98%) All URLs named one of a small set of Web servers in the domain Traffic to popular Web pages with audio files on a single server dominates the traffic

126 Experiment 1 - Cache Size Sufficient size to never replace a document 221 MB for C 413 MB for G 408 MB for BL 198 MB for BR 1400 MB for U

127 Experiment 2 - Removal Policy Comparison Primary key performance on HR Simulates a finite cache size to investigate which removal policy performs best

128 Experiment 2 - Removal Policy Comparison - Continuation Ratio of HR plotted Better interpretation by comparing the corresponding experiment 2 and experiment 1 graphs

129 Experiment 2 - Sorting Keys Plotted SIZE WHR ≈90% of optimal most of the time Replacement based on SIZE (or log2(SIZE)) always outperforms any other replacement criterion

130 Why Does Cache SIZE Work Well? Fig 13 Most requests are for small documents SIZE keeps small files in the cache and makes HR high for two reasons: Most references go to small files Removal of a large file makes room for many small files Professional Web pages are small in size Users avoid large documents

131 Why Does Cache SIZE Work Well? Continuation Fig 14 Large # of references to files in 1-2 MB range Removal of 2MB files makes room for KB files (referenced 10 times each)

132 Why Does Cache SIZE Work Well? Continuation ATIME Center of mass lies in the small size area (over 1 KB) Large interreference time (15K sec or 4.1 hours) Implies ATIME (LRU) discards many files that will be referenced in the future

133 Why Does Cache SIZE Work Well? Continuation NREFS & ETIME ETIME performs a few points worse than ATIME for all loads U - NREFS performs the same as ATIME and ETIME BR - NREFS better than ETIME and ATIME C, G and BL - mixed results

134 Primary Key Performance on WHR For WHR, SIZE is the worst performer (unlike for HR) No conclusions about which policy maximizes WHR

135 Secondary Key Performance SIZE outperforms all other keys Fig 15

136 Secondary Key Performance - Continuation How much better (or worse) does each possible secondary key do compared to a random secondary key? Fig 15

137 Secondary Key Performance - Continuation Ratio of WHR for each secondary key All secondary keys played a very small role in caching WHR DAY(ATIME), SIZE and ETIME fall below some point

138 Secondary Key Performance - Continuation NREF performed best (105% peak performance)

139 Experiment 3 - Effectiveness of Two Level Caching Uses the best HR policy from experiment 2 (SIZE) as the primary key and random as the secondary key Primary cache set to 10% of MaxNeeded Secondary cache has infinite size

140 Experiment 3 - Effectiveness of Two Level Caching - Continuation Miss in primary cache Request sent to the second level cache which: Returns a copy of the document to primary cache Misses it - Document is placed in both primary and secondary cache

141 Experiment 3 - Effectiveness of Two Level Caching - Continuation Fig Three trends for five workloads: Memory-starved primary cache (10% of MaxNeeded) Second level cache reaches a maximum (1.2-8% HR) & 15-70% WHR Secondary cache plays a major role SIZE is the primary key of the primary cache - this makes the secondary cache very important!

142 Experiment 3 - Effectiveness of Two Level Caching - Continuation WHR larger than HR Differences: WHR curves level out To what extent is the second level cache utilized?

143 Experiment 3 - Effectiveness of Two Level Caching - Continuation Fig 17 Workload working set fits 10% of MaxNeeded (for two months) After this, secondary cache experiences growth in the HR & WHR

144 Experiment 3 - Effectiveness of Two Level Caching - Continuation Fig 18 WHR fluctuates

145 Experiment 4 - Effectiveness of Partitioned Caches BR workload - audio files Does this population degrade the performance of clients using text & graphics? Could a partitioned cache (dedicated audio cache) increase the WHR for an audio audience?

146 Experiment 4 - Effectiveness of Partitioned Caches - Continuation SIZE is the primary key Random secondary key 1/4, 1/2 or 3/4 of the cache dedicated to audio

147 Experiment 4 - Effectiveness of Partitioned Caches - Continuation Fig 19 Fig 20 Comparing Fig 19 and Fig 20: Heavy audio use overwhelms even a 3/4 audio partition Splitting the cache into two partitions of equal size would maximize the overall WHR

148 Conclusions and Future Work SIZE-based removal policies in network caches outperform any other removal criteria For all 5 workloads, primary key SIZE achieved the highest HR The LRU-Min policy (which uses a log2(SIZE)-like primary key) is one of the best policies NREF primary key (LFU) ranks as second best in HR

149 Conclusions and Future Work - Continuation Suggested ranking SIZE first NREF ATIME

150 Conclusions and Future Work - Continuation The Pitkow/Recker policy would work better if SIZE alone were used as a key DAY(ATIME) was one of the worst performers as a primary key SIZE as primary key blends well with a two-level cache hierarchy

151 Conclusions and Future Work - Continuation Open problems: Some sorting keys for removal algorithms have never been explored in caching proxy implementations Document type Refetch latency (geographic criteria, for instance) How can caching help dynamic documents? How effective are second-level proxy caches? Interaction between removal algorithms and inconsistent-cached-copy algorithms


153 Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments

154 Abstract All metrics prior to this paper focus on high-level parameters (HR, for instance) Low-level parameters will be taken into consideration: Cookies Aborted connections Persistent connections Between client/proxy Between proxy/server

155 Implications of Low-Level Details Performance implications, particularly in heterogeneous bandwidth environments Network speeds between clients & proxies differ from speeds between proxies & servers

156 Experiment Overview Evaluation of latency & bandwidth effects of Web proxy caching Simulation driven by packet traces from: Clients connected via slow dial-up modems (ISP) Clients on a fast LAN

157 Results Presented Caching persistent connections at the proxy can improve latency more than simply caching Web data Aborted connections can waste more bandwidth than that saved by caching data Cookies can dramatically reduce HR by making many documents effectively uncacheable

158 Introduction Clients configured to request documents from a proxy server Proxy keeps a copy from the content provider

159 Introduction - Continuation Proxy improves performance by: Reducing user-perceived latency Lowering network traffic from Web servers Reducing service demands on content providers (lower transit costs)

160 Paper Approach Performance-implication factors that can't be captured at a high level Aborted transfers, persistent TCP connections, and HTTP headers that may affect cacheability of resources

161 Paper Approach - Continuation Workload used by the Web Proxy simulator Trace collected in a heterogeneous bandwidth production environment (AT&T WorldNet) AT&T Labs Research

162 Paper Approach - Continuation Simulation includes data in transit Simulation includes all HTTP headers (to compute cookies) Simulation measures time - slow modem versus faster LAN

163 Findings Study shows that bandwidth consumption may increase due to aborted requests Benefits of persistent connections (client/proxy) can outweigh the benefits of caching documents High fraction of responses with cookie headers

164 Cookies? Customize resources on a per-user basis Inappropriate to cache HTTP 1.0 resources that have cookies Use of cookies is increasing

165 Tracing Environment Modem trace (ISP subscribers) Research trace (high speed link) Fig 1

166 Modem Trace Characteristics Collected from FDDI ring Proxy caches reside in the ISP Fig 1

167 Modem/Connectivity Configuration 450 modems on two terminal servers Shared by ≈18K dial-up users PPP protocol connections IP routable protocol used

168 Modem Data Collection Raw packet traces (dedicated 500 MHz Alpha workstation attached to the FDDI ring) Receiving packets only 12-day period (mid-August, 1997)

169 Data Characteristics ≈18K users ≈154K dial-up sessions Max of 421 concurrent sessions Second trace obtained for 6 days (mid-July, 1998)

170 Data Characteristics - Continuation HTTP port 80 TCP events: time stamps, sequence # and packets acknowledgements HTTP events: time stamps for HTTP requests from clients & HTTP responses from servers; time stamps for packets containing first & last portion of the data in each direction HTTP headers: Complete HTTP headers for both requests and responses

171 Data Characteristics - Continuation Byte counts: count of bytes sent in either direction for HTTP header and body if present

172 Research Trace Characteristics Recorded Accesses from the AT&T Labs Research Community

173 Research Tracing Environment Fig 2

174 Research Tracing Environment - Continuation Ethernet segment adjacent to the serial line that connects AT&T Labs Research to the Internet Same trace collection as in the modem case T1 link (1.5 Mbits/s) Raw packet trace collected on a dedicated 133 MHz Intel Pentium PC running on Linux

175 Research Tracing Environment - Continuation 11 days in Mid February, K clients accessed 23K external Web servers Trace subset used to study overheads in flow switched networks

176 The Simulator PROXIM simulates 3 scenarios: Using a proxy Not using a proxy (bandwidth to the client is the bottleneck) Not using a proxy (bandwidth on the network connecting the clients to the Internet is the bottleneck)

177 The Simulator - Continuation Fig 3

178 Simulated Cache PROXIM simulates a document cache managed by LRU replacement
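
The LRU-managed document cache that PROXIM simulates can be sketched roughly as below. This is a minimal illustration only, not the actual PROXIM code; the class name and the byte-based capacity bound are assumptions for the example.

```python
from collections import OrderedDict

class LRUDocumentCache:
    """Toy LRU document cache keyed by URL, bounded by total bytes."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.docs = OrderedDict()  # url -> size in bytes, oldest first
        self.used = 0

    def get(self, url):
        if url not in self.docs:
            return False               # miss
        self.docs.move_to_end(url)     # mark as most recently used
        return True                    # hit

    def put(self, url, size):
        if url in self.docs:
            self.used -= self.docs.pop(url)
        while self.docs and self.used + size > self.capacity:
            _, evicted = self.docs.popitem(last=False)  # evict the LRU document
            self.used -= evicted
        if size <= self.capacity:
            self.docs[url] = size
            self.used += size

cache = LRUDocumentCache(capacity_bytes=100)
cache.put("/a", 60)
cache.put("/b", 30)
cache.get("/a")      # touch /a so /b becomes least recently used
cache.put("/c", 40)  # over capacity: evicts /b, keeps the recently used /a
```

The key property of LRU here is that touching `/a` protects it from the eviction triggered by inserting `/c`.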

179 Network Connections Interactions between HTTP & TCP PROXIM - Each client maintains zero or more open connections to the proxy Idle connections - No HTTP requests currently active

180 Network Connection - Continuation Persistent HTTP request generated for an idle connection - an idle connection is chosen Non-persistent requests result in a new connection If a client generates a request while all of its proxy connections are serving other requests - a new TCP connection is created by the client

181 Network Connection - Continuation Client-to-proxy connections open for more than 3 minutes are closed by the proxy Proxy-to-content-provider connections idle for more than 30 seconds are timed out
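
The two teardown rules above can be expressed as a small predicate. This is a sketch with illustrative names, reading both rules as time limits (3 minutes on the client-proxy side, 30 seconds on the proxy-server side):

```python
# Teardown limits from the simulator's connection rules (illustrative names).
CLIENT_PROXY_LIMIT_S = 3 * 60   # client <-> proxy: 3 minutes
PROXY_SERVER_LIMIT_S = 30       # proxy <-> content provider: 30 seconds

def should_close(kind, elapsed_seconds):
    """Return True when a connection of the given kind has exceeded its limit."""
    if kind == "client-proxy":
        return elapsed_seconds > CLIENT_PROXY_LIMIT_S
    return elapsed_seconds > PROXY_SERVER_LIMIT_S
```

For example, a proxy-to-server connection idle for 45 seconds is closed, while a client-to-proxy connection of the same age stays open.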

182 Document Transfer PROXIM schedules individual packets to the client, proxy or the server Defaults: 155K byte packets sent by proxy and servers Packet loss and slow-start effects are not considered

183 Packet Scheduling Packets destined for clients are queued on the client when using a proxy, or with no proxy when the client is the bottleneck Queued on the network with no proxy when bandwidth to the Internet is the bottleneck Packets destined for the proxy are enqueued in a network queue

184 Default Rates Modem trace Server-to-proxy rate of 45 Mbps (T3) Client queues - 21 Kbps (maximum modem throughput) Research trace Server-to-proxy rate of 1.5 Mbps (T1) Network queue of 1.5 Mbps
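
The bandwidth asymmetry behind these defaults is worth a quick sanity check. The 30 KB document size below is an assumed example, not a figure from the paper; the point is that the same document takes roughly 70 times longer over the modem link than over the T1:

```python
def transfer_seconds(size_bytes, link_bits_per_sec):
    """Idealized transfer time, ignoring packetization, RTTs, and loss."""
    return size_bytes * 8 / link_bits_per_sec

modem_time = transfer_seconds(30_000, 21_000)     # 21 Kbps modem link: ~11.4 s
t1_time = transfer_seconds(30_000, 1_500_000)     # 1.5 Mbps T1 link: ~0.16 s
```

This gap is what the paper means by a heterogeneous bandwidth environment: the client link, not the server path, dominates transfer time for modem users.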

185 Packet Round-Trip Times Constant: Client to proxy (modem) is 250 ms (modem delay) Proxy to server Difference between SYN & SYN-ACK time stamps or request & response time stamps Client to server Adds the proxy-to-server value plus 250 ms for the modem trace For the research trace, a constant delay (in ms) between clients and proxy

186 Latency Calculations PROXIM simulates the overall latency to retrieve a document by breaking it into: Connection setup time HTTP response time Document transfer time
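
The three components can be combined into a back-of-the-envelope model. This is a sketch with assumed inputs, not PROXIM itself: one RTT of TCP setup when a new connection is needed, one RTT for the HTTP request/response exchange, then the transfer time of the document body.

```python
def request_latency(rtt_s, size_bytes, link_bps, new_connection=True):
    """Simplified per-request latency: setup + response + transfer."""
    setup = rtt_s if new_connection else 0.0   # connection setup time
    response = rtt_s                           # HTTP response time
    transfer = size_bytes * 8 / link_bps       # document transfer time
    return setup + response + transfer

# Assumed example: 10 KB document over the modem path
# (250 ms RTT, 21 Kbps client link).
total = request_latency(0.250, 10_000, 21_000)
```

Setting `new_connection=False` models a reused (persistent) connection, which removes exactly the setup RTT from the total.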

187 Simulation Validation Results are compared for no-proxy simulations against the original latencies from the trace

188 Simulation Validation - Continuation Fig 4

189 Performance Effects of the Proxy Hit Ratio Secondary measures for this study: Bandwidth savings Latency reduction Unlike other studies, the traces capture all HTTP headers of requests & responses

190 Reasons for a Document to be Uncachable Dynamically generated content Explicit cache control (expirations) A cookie is present An authorization field is present in the request ISP trace showed >= 30% of all requests had a cookie (cookie use is increasing)
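
The four uncachability tests can be sketched as a single predicate. This is a simplified illustration, not the paper's classifier: header names are assumed lowercase, the dynamic-content test is deliberately crude, and real HTTP cache control involves many more rules.

```python
def is_cachable(request_headers, response_headers, url):
    """Apply the slide's four uncachability tests (simplified sketch)."""
    if "?" in url or "/cgi-bin/" in url:           # crude dynamic-content test
        return False
    if "authorization" in request_headers:         # authorized request
        return False
    if "cookie" in request_headers or "set-cookie" in response_headers:
        return False                               # cookie present
    cc = response_headers.get("cache-control", "")
    if "no-store" in cc or "private" in cc:        # explicit cache control
        return False
    return True
```

With >= 30% of requests carrying a cookie, the third test alone already caps the achievable hit ratio substantially.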

191 Bandwidth Consumption Warning! A proxy can actually increase the traffic from content providers to the ISP! The effect on overall bandwidth demands depends on how the proxy handles aborted requests

192 Measurements Total # of bytes received by clients from content providers without a proxy: GB If the proxy continues downloading after the client abandons the request, total # of bytes = 58.7 GB (118% of the original # of bytes) The proxy could instead abort the download after a client abort

193 Partial Conclusion Proxy vs. controlled aborts The bandwidth mismatch between modems & the Internet contributes to the wasted bandwidth consumption from aborts If the proxy aborts right away, a 14% bandwidth savings is observed

194 Latency Reduction Caching connections versus caching data Multiple components within a typical HTTP request contribute to the latency of that request

195 Latency Components (1) Client establishes a TCP connection to the server (2) Client sends the HTTP request for the document (3) Server sends the HTTP response for the document (4) Server sends the data of the document (5) Server/client closes the TCP connection

196 Persistent Connections Skips steps (1) and (5) of previous slide Modem trace 18% persistent connections Research trace 18% persistent connections

197 What Causes Latency? Connection creation - the end-to-end setup is a primary cause of latency Connection setup time is at least 50% of the total download time

198 Partial Conclusion Persistent connections between client/server could minimize high cost of TCP connection setup
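
The rough arithmetic behind this conclusion (with an assumed 10 KB document; the 250 ms RTT and 21 Kbps rate are the modem-trace defaults): a persistent connection saves exactly one setup RTT per request, since steps (1) and (5) are skipped.

```python
# Illustrative numbers: modem-path RTT and client link rate from the trace
# setup, document size assumed for the example.
RTT = 0.250                        # seconds, client-proxy modem path
TRANSFER = 10_000 * 8 / 21_000     # 10 KB document over 21 Kbps

non_persistent = RTT + RTT + TRANSFER   # setup RTT + request/response RTT + data
persistent = RTT + TRANSFER             # setup amortized over earlier requests
saving = non_persistent - persistent    # one RTT saved per request
```

On a 250 ms modem path the saving is a quarter second per request, and for small documents with multiple embedded objects these setup RTTs add up quickly.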

199 Issues Content providers can maintain only a limited # of persistent connections A proxy (connection cache) could be beneficial

200 Modem Trace Latency Analysis Caching reduces the latencies experienced by the user only by a minimal margin Very pessimistic results! Due to the heterogeneous bandwidth environment, which implies a high connection setup latency

201 Workaround If the proxy is used as a connection cache, latencies are shortened Combining a data & connection proxy shortens transfer time the most (best option)

202 Modem Trace - Caching Connections versus Caching Data Connection caching dominates the total latency improvements in the heterogeneous bandwidth environment Considered scenarios: All persistent Only Web proxy to Web server persistent Only Web client to Web proxy persistent None persistent

203 Summary Combining persistent connections with a data cache is ideal No persistent connections & no data caching implies increased latency Persistent connections between clients & the Web proxy are not as effective as between the Web proxy & the Web server

204 Summary - Continuation Performance improvements from persistent TCP connections only between the Web server & the Web proxy are larger than those from introducing a data cache for documents without cookies and without persistent connections

205 Summary - Continuation Adding persistent client connections to caching improves the latency results by a factor of 2 Adding all persistent connections improves latency results over data caching alone by more than a factor of 7.5

206 Research Trace - Latency Analysis Shows that connection caching is highly beneficial in a heterogeneous bandwidth environment Latency improvement of 7% for the pure data cache proxy Factor of 2 improvement over the modem environment

207 Research Trace Caching connections versus caching data With no cache and no persistent connections, a substantial slowdown occurs A proxy may slow down Web access

208 Partial Summary The research & modem traces show that in a heterogeneous bandwidth environment connection caching & data caching are complementary and produce performance improvements over 40%

209 Connection Cache Size Requires better policies for managing the connection cache When to tear down connections Which connections to tear down How many concurrent connections should be maintained
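
The three policy questions can be made concrete in a toy connection cache. This is an illustrative sketch, not a policy from the paper: which connection to tear down (LRU), when to tear one down (idle timeout), and how many to keep open (a fixed cap). All names and knob values are assumptions.

```python
from collections import OrderedDict

class ConnectionCache:
    """Toy persistent-connection cache: capped size, LRU teardown,
    plus idle-timeout teardown."""
    def __init__(self, max_open, idle_limit_s):
        self.max_open = max_open
        self.idle_limit_s = idle_limit_s
        self.last_used = OrderedDict()  # server -> last-use time, oldest first

    def use(self, server, now):
        """Record a request on the connection to `server`, opening it if needed."""
        self.last_used.pop(server, None)
        while len(self.last_used) >= self.max_open:
            self.last_used.popitem(last=False)  # tear down the LRU connection
        self.last_used[server] = now

    def expire_idle(self, now):
        """Tear down every connection idle longer than the limit."""
        for server in [s for s, t in self.last_used.items()
                       if now - t > self.idle_limit_s]:
            del self.last_used[server]

cc = ConnectionCache(max_open=2, idle_limit_s=30)
cc.use("a.example", now=0)
cc.use("b.example", now=1)
cc.use("c.example", now=2)            # at capacity: a.example is torn down
open_before_expiry = list(cc.last_used)
cc.expire_idle(now=40)                # both remaining idle > 30 s: closed
```

Tuning `max_open` against the content providers' own connection limits is exactly the open question the slide raises.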

210 Final Summary Latency reductions from caching persistent connections can be greater than those from caching data The two types of caching are complementary and should be used together For a configured proxy cache, gains can be offset by aborted connections Hit ratios can be reduced by the growing percentage of documents containing cookies


More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

COMMVAULT. Enabling high-speed WAN backups with PORTrockIT

COMMVAULT. Enabling high-speed WAN backups with PORTrockIT COMMVAULT Enabling high-speed WAN backups with PORTrockIT EXECUTIVE SUMMARY Commvault offers one of the most advanced and full-featured data protection solutions on the market, with built-in functionalities

More information

LECTURE 9. Ad hoc Networks and Routing

LECTURE 9. Ad hoc Networks and Routing 1 LECTURE 9 Ad hoc Networks and Routing Ad hoc Networks 2 Ad Hoc Networks consist of peer to peer communicating nodes (possibly mobile) no infrastructure. Topology of the network changes dynamically links

More information

Silberschatz and Galvin Chapter 15

Silberschatz and Galvin Chapter 15 Silberschatz and Galvin Chapter 15 Network Structures CPSC 410--Richard Furuta 3/30/99 1 Chapter Topics Background and motivation Network topologies Network types Communication issues Network design strategies

More information

Chapter 9: Virtual Memory

Chapter 9: Virtual Memory Chapter 9: Virtual Memory Silberschatz, Galvin and Gagne 2013 Chapter 9: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

Chapter 11: Implementing File Systems

Chapter 11: Implementing File Systems Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation

More information

Modification and Evaluation of Linux I/O Schedulers

Modification and Evaluation of Linux I/O Schedulers Modification and Evaluation of Linux I/O Schedulers 1 Asad Naweed, Joe Di Natale, and Sarah J Andrabi University of North Carolina at Chapel Hill Abstract In this paper we present three different Linux

More information

To see the details of TCP (Transmission Control Protocol). TCP is the main transport layer protocol used in the Internet.

To see the details of TCP (Transmission Control Protocol). TCP is the main transport layer protocol used in the Internet. Lab Exercise TCP Objective To see the details of TCP (Transmission Control Protocol). TCP is the main transport layer protocol used in the Internet. The trace file is here: https://kevincurran.org/com320/labs/wireshark/trace-tcp.pcap

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

Introduction to Wireless Networking ECE 401WN Spring 2008

Introduction to Wireless Networking ECE 401WN Spring 2008 Introduction to Wireless Networking ECE 401WN Spring 2008 Lecture 2: Communication Networks The first major topic we will study will be WLANs. But before that, we need to consider a few basics of networking.

More information

Memory Hierarchy: Motivation

Memory Hierarchy: Motivation Memory Hierarchy: Motivation The gap between CPU performance and main memory speed has been widening with higher performance CPUs creating performance bottlenecks for memory access instructions. The memory

More information

Replicate It! Scalable Content Delivery: Why? Scalable Content Delivery: How? Scalable Content Delivery: How? Scalable Content Delivery: What?

Replicate It! Scalable Content Delivery: Why? Scalable Content Delivery: How? Scalable Content Delivery: How? Scalable Content Delivery: What? Accelerating Internet Streaming Media Delivery using Azer Bestavros and Shudong Jin Boston University http://www.cs.bu.edu/groups/wing Scalable Content Delivery: Why? Need to manage resource usage as demand

More information

Page 1. Multilevel Memories (Improving performance using a little cash )

Page 1. Multilevel Memories (Improving performance using a little cash ) Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency

More information

A CONTENT-TYPE BASED EVALUATION OF WEB CACHE REPLACEMENT POLICIES

A CONTENT-TYPE BASED EVALUATION OF WEB CACHE REPLACEMENT POLICIES A CONTENT-TYPE BASED EVALUATION OF WEB CACHE REPLACEMENT POLICIES F.J. González-Cañete, E. Casilari, A. Triviño-Cabrera Department of Electronic Technology, University of Málaga, Spain University of Málaga,

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START MIDTERM EXAMINATION #2 NETWORKING CONCEPTS 03-60-367-01 U N I V E R S I T Y O F W I N D S O R - S c h o o l o f C o m p u t e r S c i e n c e Fall 2011 Question Paper NOTE: Students may take this question

More information

Last Class: Consistency Models. Today: Implementation Issues

Last Class: Consistency Models. Today: Implementation Issues Last Class: Consistency Models Need for replication Data-centric consistency Strict, linearizable, sequential, causal, FIFO Lecture 15, page 1 Today: Implementation Issues Replica placement Use web caching

More information

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II

CS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste

More information

Network Layer: Routing

Network Layer: Routing Network Layer: Routing The Problem A B R 1 R 2 R 4 R 3 Goal: for each destination, compute next hop 1 Lecture 9 2 Basic Assumptions Trivial solution: Flooding Dynamic environment: links and routers unreliable:

More information

Principles behind data link layer services:

Principles behind data link layer services: Data link layer Goals: Principles behind data link layer services: Error detection, correction Sharing a broadcast channel: Multiple access Link layer addressing Reliable data transfer, flow control Example

More information

Principles behind data link layer services:

Principles behind data link layer services: Data link layer Goals: Principles behind data link layer services: Error detection, correction Sharing a broadcast channel: Multiple access Link layer addressing Reliable data transfer, flow control Example

More information

CSE 123A Computer Networks

CSE 123A Computer Networks CSE 123A Computer Networks Winter 2005 Lecture 14 Congestion Control Some images courtesy David Wetherall Animations by Nick McKeown and Guido Appenzeller The bad news and the good news The bad news: new

More information