TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE NO.
ABSTRACT 5
LIST OF TABLES xv
LIST OF FIGURES xviii
LIST OF SYMBOLS AND ABBREVIATIONS xxi
1 INTRODUCTION 1
1.1 INTRODUCTION 1
1.2 WEB CACHING 2
1.2.1 Classification of Web Cache 4
1.2.2 Cache Replacement Algorithms 5
1.2.3 Properties of WWW Caching System 6
1.3 PREFETCHING 8
1.3.1 Classification of Prefetching Algorithms 9
1.4 CO-OPERATIVE CACHING 10
1.5 OBJECTIVES 11
1.6 PROPOSED SYSTEM ARCHITECTURE 12
1.6.1 Access Log Manager (ALM) with SVM Classifier 13
1.6.2 Cluster Based Proxy Cache Manager (PCM) 13
1.6.3 Dynamic Hash Table (DHT) Based Co-operative Client Cache Manager System (CCM) 14
1.7 PERFORMANCE METRICS 15
1.7.1 Web Caching Performance Metrics 15
1.7.2 Prefetch Metrics 16
1.8 THESIS ORGANIZATION 18
2 LITERATURE REVIEW 20
2.1 INTRODUCTION 20
2.2 WEB CACHING 20
2.2.1 Web Caching Algorithms 21
2.2.2 Studies Based on Web Caching 23
2.3 PREFETCHING 26
2.3.1 Types of Web Prefetching 26
2.3.2 Approaches to Web Prefetching 27
2.3.3 Studies on Web Prefetching Techniques 28
2.3.4 Clustering Based Prefetching 29
2.4 INTEGRATING WEB CACHING AND PREFETCHING TECHNIQUES 32
2.5 CO-OPERATIVE CACHING 35
2.5.1 Co-operative Caching Mechanisms 35
2.5.2 Co-operative Caching Algorithms 36
2.5.3 Studies Based on Co-operative Caching 38
2.6 SUMMARY 44
3 ACCESS LOG MANAGER (ALM) 45
3.1 INTRODUCTION 45
3.2 INTRODUCTION TO ACCESS LOG MANAGER 45
3.3 ACCESS LOG MANAGER PROCESS 47
3.4 SVM CLASSIFIER 48
3.5 FRAMEWORK FOR GENERATING INPUT DATASET USING ALM 50
3.5.1 Raw Data Collection 51
3.6 OFFLINE COMPONENT OF ACCESS LOG MANAGER 52
3.6.1 Sample Log File 53
3.6.2 Log File Contents 53
3.7 DATA PRE-PROCESSING 54
3.8 DATA CLEANING 56
3.9 TRAINING PHASE 57
3.10 ALM PERFORMANCE 60
3.10.1 Classifier Metrics 61
3.10.2 Web Cache Performance Measures 62
3.11 SUMMARY 64
4 CLUSTER BASED PROXY CACHE MANAGER 65
4.1 INTRODUCTION 65
4.2 FRAMEWORK OF PROXY CACHE MANAGER 65
4.2.1 Authentication Manager for Proxy Cache System 67
4.2.2 Constructing Web Navigational Graph (WNG) 70
4.2.3 Association Rule Mining 71
4.2.4 Inter Cluster Creation Algorithm for Proxy Cache System 72
4.3 IMPACT OF SVM FOR VARIOUS CONFIDENCES AND SUPPORT THRESHOLDS 74
4.4 CLUSTER BASED PREDICTION AND PREFETCHING 76
4.4.1 Hybrid CRF Algorithm for Cache Replacement 77
4.4.2 Performance Evaluation of CRF Algorithm 79
4.5 IMPACT OF CACHE SIZE ON PERFORMANCE MEASURES 79
4.6 SUMMARY 82
5 CO-OPERATIVE CLIENT CACHE MANAGER SYSTEM 83
5.1 INTRODUCTION 83
5.2 CO-OPERATIVE HYBRID ARCHITECTURE 84
5.3 DHT BASED CO-OPERATIVE CLIENT CACHE MANAGER SYSTEM (CCM) 86
5.3.1 Identifier Creation by Hashing Algorithm 87
5.3.1.1 Finger table 91
5.3.1.2 Key value pair table 94
5.3.2 Node Create and Join Procedures 97
5.3.3 Routing Algorithm 100
5.3.4 Node Stabilize Algorithm 100
5.3.5 Resource Searching Algorithm 103
5.3.6 Node Exit Algorithm 105
5.4 QUERY INTEGRATOR 107
5.5 CLIENT CACHE MANAGEMENT 111
5.6 SUMMARY 112
6 PERFORMANCE EVALUATION AND ANALYSIS 116
6.1 INTRODUCTION 116
6.2 EXPERIMENTAL SETUP AND IMPLEMENTATION 117
6.3 ACCESS LOG MANAGER PERFORMANCE ANALYSIS 118
6.4 CLASSIFIER EVALUATION 119
6.5 EFFECTIVENESS OF THE WEB CACHING 120
6.5.1 SOS Management through Hybrid Algorithm 121
6.5.2 Cluster Based Proxy Server Cache Management through Combined LRU and LFU Algorithm 123
6.5.2.1 Efficiency improvement by CRF in sample data sets 124
6.6 THE EFFECTIVENESS OF PREFETCHING 126
6.7 AVERAGE NETWORK TRAFFIC 127
6.8 IMPACT OF CONFIDENCE AND SUPPORT THRESHOLD 129
6.8.1 Analysis of Precision and Recall for Different Support Values 136
6.9 IMPACT OF CACHE SIZE ON PERFORMANCE MEASURES 136
6.10 PERFORMANCE ANALYSIS OF SIMULATION 138
6.11 COMPARISON OF DEVELOPED APPROACH WITH EXISTING APPROACHES 139
6.12 SUMMARY 143
7 CONCLUSION AND FUTURE WORKS 144
7.1 SUMMARY 144
7.2 MAJOR FINDINGS 145
7.3 CONCLUSION 146
7.4 SCOPE FOR FUTURE WORK 147
APPENDIX 1 148
APPENDIX 2 151
REFERENCES 156
LIST OF PUBLICATIONS 162
LIST OF TABLES
TABLE NO. TITLE PAGE NO.
3.1 Details of proxy server 52
3.2 Resultant preprocessed data 56
3.3 Algorithm for removing irrelevant records 57
3.4 SVM efficiency for various values of C and 60
3.5 SVM classification algorithm 60
3.6 Classifier measures of testing datasets 62
3.7 Statistics of proxy datasets after SVM classification 62
3.8 No. of web objects Vs classification efficiency 63
4.1 Algorithm for cluster creation 73
4.2 Impact of cache size on hit ratio for support 2 and confidence 0.3 74
4.3 Impact of cache size on hit ratio for support 4 and confidence 0.3 74
4.4 Impact of cache size on hit ratio for support 6 and confidence 0.3 75
4.5 Impact of cache size on hit ratio for support 8 and confidence 0.3 75
4.6 Pseudo-code for prediction and prefetching 77
4.7 CRF algorithm 79
4.8 Impact of cache size on hit ratio for proxy data set UC 80
4.9 Impact of cache size on hit ratio for proxy data set BO2 80
4.10 Impact of cache size on hit ratio for proxy data set SV 81
4.11 Impact of cache size on hit ratio for proxy data set SD 81
4.12 Impact of cache size on hit ratio for proxy data set NY 82
5.1 Object ID Table with sample clients 89
5.2 Finger Table for node C1 91
5.3 Finger Table for node C3 91
5.4 Finger Table for node C5 92
5.5 Finger Table for node C8 92
5.6 Finger Table for node C20 92
5.7 Finger Table for node C24 93
5.8 Finger Table for node C28 93
5.9 Finger Table for node C30 93
5.10 Key Value Pair Table for Node C1 95
5.11 Key Value Pair Table for Node C3 95
5.12 Key Value Pair Table for Node C5 95
5.13 Key Value Pair Table for Node C8 95
5.14 Key Value Pair Table for Node C20 96
5.15 Key Value Pair Table for Node C24 96
5.16 Key Value Pair Table for Node C28 96
5.17 Key Value Pair Table for Node C30 97
5.18 Object ID table after joining node C16 98
5.19 Object ID table after deleting node C8 105
5.20 Comparison of recency and frequency wise retrieval with proposed system 114
6.1 Average network traffic 128
6.2 CPU utilization and server load based on proxy presence 129
6.3 Improvement ratio comparison of SVM hybrid with other methods 137
6.4 Performance analysis of proposed system 141
6.5 Cache performance results 142
LIST OF FIGURES
FIGURE NO. TITLE PAGE NO.
1.1 Storage hierarchy 3
1.2 Classification of web cache 5
1.3 Classification of prediction algorithms 10
1.4 Co-operative web caching system 11
1.5 System architecture for information retrieval 12
3.1 Access Log Manager Process 48
3.2 Classification of data by SVM 49
3.3 Training data classification by machine learning technique 51
3.4 Snapshot of sample proxy server log 53
4.1 Framework of cluster based proxy cache manager 67
4.2 Client registration 68
4.3 Client registration failure 68
4.4 Snapshot of sample proxy cache content 69
4.5 Snapshot of frequency updated sample cache content optimization 70
4.6 Example Web Navigational Graph 71
5.1 Client Cache Manager Modules integration process 84
5.2 Hybrid architecture used in the system 85
5.3 Co-operative chord network for shared objects with finger table 90
5.4 Co-operative chord network after joining C16 99
5.5 Co-operative chord network after removing C8 106
5.6 Snapshot of query integrator 108
5.7 SOS object creation 109
5.8 Work breakdown structure 113
6.1 Classification efficiency comparison 119
6.2 Redundancy rate comparison with web objects 120
6.3 Analysis of hit ratio 122
6.4 Analysis of byte hit ratio 123
6.5 Comparison of cache hit ratio in UC data set 124
6.6 Comparison of cache hit ratio in BO2 data set 124
6.7 Comparison of cache hit ratio in SV data set 125
6.8 Comparison of cache hit ratio in SD data set 125
6.9 Comparison of cache hit ratio in NY data set 126
6.10 Precision and Recall percentages for all five data sets 127
6.11 Hit ratio Vs cache size for Support: 2, Confidence: 0.3 131
6.12 Hit ratio Vs cache size for Support: 4, Confidence: 0.3 131
6.13 Hit ratio Vs cache size for Support: 6, Confidence: 0.3 132
6.14 Hit ratio Vs cache size for Support: 8, Confidence: 0.3 132
6.15 Byte hit ratio Vs cache size for Support: 2, Confidence: 0.3 133
6.16 Byte hit ratio Vs cache size for Support: 4, Confidence: 0.3 133
6.17 Byte hit ratio Vs cache size for Support: 6, Confidence: 0.3 134
6.18 Byte hit ratio Vs cache size for Support: 8, Confidence: 0.3 134
6.19 Analysis of access latency 135
6.20 Precision and Recall analysis 136
6.21 Breakdown of request handling 139
6.22 Comparison of access latency Vs number of peers 140
6.23 Performance comparison of client server and developed system 140
6.24 Performance analysis of proposed system 142
LIST OF SYMBOLS AND ABBREVIATIONS
APACS - A Proxy Agent for Client System
ALM - Access Log Manager
AI - Artificial Intelligence
ANN - Artificial Neural Network
ANFIS - Artificial Neuro Fuzzy Information System
AS - Autonomous System
BPNN - Back Propagation Neural Network
BHR - Byte Hit Ratio
CCM - Client Cache Manager
CRF - Combined Recency Frequency
CCN - Content Centric Network
CON - Content Oriented Network
CP - Content Provider
CCR - Correct Classification Rate
DG - Dependency Graph
DNS - Domain Name Server
DDG - Double Dependency Graph
DHT - Dynamic Hash Table
EC - End Consumer
XML - Extensible Markup Language
FFS - Fast Frequency Server
FI - Finite Inductive
FIFO - First In First Out
GA - Genetic Algorithm
GM - Geometric Mean
GDS - Greedy Dual Size
GDFS - Greedy Dual Size Frequency
HR - Hit Ratio
HTML - Hyper Text Markup Language
ID - Identifier
IP - Internet Protocol
ISP - Internet Service Provider
LFU - Least Frequently Used
LRU - Least Recently Used
LAC - Local Access Counter
MRU - Most Recently Used
NLANR - National Laboratory for Applied Network Research
NN - Neural Network
PSO - Particle Swarm Optimization
P2P - Peer to Peer
PPE - Prediction Prefetching Engine
PCM - Proxy Cache Manager
RR - Random Replacement
RS - Rough Set
RTT - Round Trip Time
SHA1 - Secure Hash Algorithm 1
SAC - Sharable Access Counter
SOS - Sharable Object Space
SWNET - Social Wireless Network
SV - Support Vector
SVM - Support Vector Machine
TTL - Time To Live
TNR - True Negative Rate
TPR - True Positive Rate
URL - Uniform Resource Locator
WNG - Web Navigational Graph
WSE - Web Search Engine
WAN - Wide Area Network
WWW - World Wide Web