Dynamic Grouping Strategy in Cloud Computing

IEEE CGC 2012, November 1-3, Xiangtan, Hunan, China
Presenter: Qin Liu (a). Joint work with Yuhong Guo (b), Jie Wu (b), and Guojun Wang (a)
(a) Central South University, China; (b) Temple University, USA

Outline
1. Introduction
2. Preliminaries
3. K-Mean-based Dynamic Grouping
4. Extensions
5. Evaluation
6. Conclusion & Future work

Introduction

Cloud computing model
o Cloud computing has emerged as a new type of commercial paradigm due to its overwhelming advantages, such as flexibility, scalability, and cost efficiency
[Figure: the cloud stores files with keyword sets F1: {A, B}, F2: {B, D}, F3: {C, D}; Bob sends a query with keywords A, B and receives F1 and F2]

Application scenario
[Figure: University A holds access rights to an online library hosted in the cloud; its staff and students use the online library]

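Problem: Cost grows linearly
Users at University A query the cloud individually (query -> returned files):
Alice: {A,B} -> {F1,F2,F4}
Bob: {A} -> {F1,F2}
Clark: {C,D} -> {F2,F3,F4}
Donland: {C} -> {F3,F4}
Redundant files: F1 and F3 are each returned 2 times; F2 and F4 are each returned 3 times (10 files in total)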

New challenge
Deploy proxies inside University A to aggregate user queries and distribute search results
Proxy 1 (Alice {A,B}, Bob {A}): combined query {A,B} -> {F1,F2,F4}
Proxy 2 (Clark {C,D}, Donland {C}): combined query {C,D} -> {F2,F3,F4}
Return 6 files
How to group queries to minimize the returned files?

Naïve solution: Random grouping
Users (Alice {A,B}, Bob {A}, Clark {C,D}, Donland {C}) are assigned to Proxy 1 and Proxy 2 at random
Proxy 1 -> {F1,F2,F3,F4}; Proxy 2 -> {F1,F2,F3,F4}
Return 8 files
Problem: Random grouping causes a waste of bandwidth (20% in this example)

Naïve solution: One proxy
A single proxy combines all four queries: {A,B,C,D} -> {F1,F2,F3,F4}
Return 4 files (best performance)
Problem: One proxy causes a performance bottleneck and a single point of failure

K-Mean-based Dynamic Grouping (KMDG)
Problem: classify n queries into k groups, one per proxy server, so that each group has size n/k and the number of returned files is minimized
The problem is NP-hard, so a heuristic grouping strategy is used
Basic strategy: KMDG (based on K-Means)
Extensions:
KMDG1: robust version
KMDG2: relaxes the constraint of equal group size

Design goals
Effectiveness
Cost efficiency
  Obtain optimal results within polynomial time
Load balancing
  Balance the bandwidth among proxies
  Minimize the bandwidth at the cloud
Robustness
  Obtain search results even if some machines fail

Preliminaries

System model
The cloud, many users, and many proxy servers
Query router (QR)
Aggregation and distribution machines (ADMs)

System model
Given a public dictionary consisting of all keywords, a user query or a combined query is a 0-1 bit string
Dictionary = <A, B, C, D>
Alice: {A,B} = <1100>
Bob: {A} = <1000>
Clark: {C,D} = <0011>
Donland: {C} = <0010>
Proxy (combined query): {A,B,C,D} = <1111>
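
The bit-string encoding on this slide can be sketched in Python as follows. This is a minimal illustration that assumes the four-keyword dictionary shown above; the helper names `encode` and `combine` are hypothetical and do not come from the original work.

```python
# Public dictionary from the slide; position i of a bit string stands for DICTIONARY[i].
DICTIONARY = ["A", "B", "C", "D"]

def encode(keywords):
    """Return the 0-1 bit string for a set of keywords (hypothetical helper)."""
    return "".join("1" if word in keywords else "0" for word in DICTIONARY)

def combine(bit_strings):
    """Bitwise OR of equal-length bit strings: the combined query of a proxy."""
    return "".join("1" if "1" in column else "0" for column in zip(*bit_strings))

queries = {
    "Alice": encode({"A", "B"}),     # 1100
    "Bob": encode({"A"}),            # 1000
    "Clark": encode({"C", "D"}),     # 0011
    "Donland": encode({"C"}),        # 0010
}
combined = combine(queries.values())  # 1111, the single-proxy combined query
```

Representing a combined query as the OR of its members makes the number of 1s a direct count of the distinct keywords the proxy sends to the cloud.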

Problem formulation
Assumptions:
Keywords are uniformly distributed in the file set
The probability of each keyword appearing in a file is the same
Under these assumptions, minimizing the total number of returned files = minimizing the number of keywords in the combined queries
Group queries with the most common keywords together
Minimize the total number of 1s in the combined queries
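
To make the objective concrete, the following sketch counts the total number of 1s in the combined queries for two groupings of the four example queries. The function names are hypothetical, and the "mixed" grouping is just one possible outcome of random grouping, chosen for illustration.

```python
def ones_in_combined(group):
    """Number of 1s in the bitwise OR of a group of equal-length bit strings."""
    return sum(1 for column in zip(*group) if "1" in column)

def grouping_cost(groups):
    """Total number of 1s over all combined queries (the quantity to minimize)."""
    return sum(ones_in_combined(group) for group in groups)

alice, bob, clark, donland = "1100", "1000", "0011", "0010"

similar = [[alice, bob], [clark, donland]]   # queries sharing keywords are grouped
mixed = [[alice, clark], [bob, donland]]     # one possible random grouping

cost_similar = grouping_cost(similar)  # combined 1100 and 0011 -> 2 + 2 = 4
cost_mixed = grouping_cost(mixed)      # combined 1111 and 1010 -> 4 + 2 = 6
```

Grouping queries that share keywords yields the lower total, which is the behavior the formulation above asks for.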

Parameter analysis
The expected number of returned files can be calculated with Eq. 1
[Eq. 1 and the symbol for the grouping cost are not preserved in this text]
d: the number of keywords in the dictionary
γ: the average number of keywords in a file
k: the number of groups/proxies
t: the number of files in the cloud
Grouping cost: the average number of keywords in the j-th combined query
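
The sketch below does not reproduce Eq. 1. It is an illustrative model built from the two assumptions in the problem formulation plus one added simplification: every file holds exactly γ distinct keywords drawn uniformly from the d dictionary keywords. A file is counted as returned to a group when it holds at least one keyword of that group's combined query. The function name and all numeric values are hypothetical.

```python
from math import comb

def expected_returned_files(t, d, gamma, ones_per_group):
    """Expected number of files returned over all groups under the simplified model."""
    total = 0.0
    for m in ones_per_group:                         # m = number of 1s in one combined query
        miss = comb(d - m, gamma) / comb(d, gamma)   # file holds none of the m keywords
        total += t * (1 - miss)
    return total

# Hypothetical parameters, chosen only for illustration.
t, d, gamma = 1000, 100, 5
fewer_ones = expected_returned_files(t, d, gamma, [10, 10])
more_ones = expected_returned_files(t, d, gamma, [20, 20])
```

Under this model the expectation grows with the number of 1s in each combined query, which is consistent with using the count of 1s as the grouping cost.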

K-Mean-based Dynamic Grouping (KMDG)

Definitions

High level ideas
Suppose each query denotes a node

High level ideas
An improvement: first choose a random node as the first seed. For the i-th seed, choose the node with the maximal total distance to the i-1 seeds already chosen
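
The seed-selection idea can be sketched as follows. The distance used in the talk is given on the Definitions slide, which is not available here, so this sketch assumes Hamming distance between bit strings; `hamming` and `choose_seeds` are hypothetical names.

```python
import random

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))

def choose_seeds(queries, k, rng=None):
    """Pick k seeds: a random first seed, then repeatedly the node with the
    largest total distance to the seeds already chosen."""
    rng = rng or random.Random()
    remaining = list(queries)
    seeds = [remaining.pop(rng.randrange(len(remaining)))]
    while len(seeds) < k and remaining:
        best = max(remaining, key=lambda q: sum(hamming(q, s) for s in seeds))
        remaining.remove(best)
        seeds.append(best)
    return seeds

seeds = choose_seeds(["1100", "1000", "0011", "0010"], k=2, rng=random.Random(0))
```

Spreading the seeds apart keeps the initial groups from starting on nearly identical keyword sets.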

High level ideas
Closest: the seed whose number of 1s increases the least after being combined with the query
Example: s1=1100, s2=0011, s3=1001, s4=0101; Q=1100
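
The closeness rule can be checked on the slide's example with a short sketch. The function names are hypothetical, and breaking ties in favor of the first seed is an assumption.

```python
def increased_ones(query, seed):
    """Number of 1s the seed gains when the query is OR-ed into it."""
    return sum(1 for q, s in zip(query, seed) if q == "1" and s == "0")

def closest_seed(query, seeds):
    """Index of the seed with the smallest increase in 1s (ties go to the first)."""
    return min(range(len(seeds)), key=lambda i: increased_ones(query, seeds[i]))

seeds = ["1100", "0011", "1001", "0101"]   # s1..s4 from the slide
query = "1100"                              # Q from the slide
increases = [increased_ones(query, s) for s in seeds]   # [0, 2, 1, 1]
best = closest_seed(query, seeds)                       # 0, i.e. s1
```

Q adds no new 1s to s1, so s1 is the closest seed in this example.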

High level ideas
After the k seeds are chosen, the grouping process is performed in the same way in the next round
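
A single grouping round with the equal-size constraint might look like the sketch below. It is a greedy illustration under assumptions, not the algorithm from the talk: each query joins the non-full group whose combined query gains the fewest 1s, and how seeds are refreshed between rounds is not specified here, so the sketch stops after one round. All names are hypothetical.

```python
def increased_ones(query, combined):
    """Number of 1s the combined query gains when the query is OR-ed into it."""
    return sum(1 for q, c in zip(query, combined) if q == "1" and c == "0")

def bitwise_or(a, b):
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if "1" in pair else "0" for pair in zip(a, b))

def group_once(queries, seeds):
    """One greedy round with at most ceil(n / k) queries per group."""
    k = len(seeds)
    capacity = -(-len(queries) // k)          # ceil(n / k), the equal-size bound
    groups = [[] for _ in range(k)]
    combined = list(seeds)                    # running combined query per group
    for query in queries:
        open_groups = [i for i in range(k) if len(groups[i]) < capacity]
        target = min(open_groups, key=lambda i: increased_ones(query, combined[i]))
        groups[target].append(query)
        combined[target] = bitwise_or(combined[target], query)
    return groups, combined

queries = ["1100", "1000", "0011", "0010"]
groups, combined = group_once(queries, seeds=["1100", "0011"])
```

On the four example queries this places Alice and Bob in one group and Clark and Donland in the other, giving combined queries 1100 and 0011.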

Algorithm

Example
Sample queries

Extensions

KMDG1-Algorithm
Each user generates α (2 ≤ α ≤ k) query copies, with the constraint that the α copies are placed in different groups
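
One way to place the copies of a query into different groups is sketched below. It assumes 2 ≤ α ≤ k and assumes that copies go to the α groups where the query adds the fewest 1s; the selection rule and the name `place_copies` are illustrative assumptions, not details confirmed by the slide.

```python
def increased_ones(query, combined):
    """Number of 1s the combined query gains when the query is OR-ed into it."""
    return sum(1 for q, c in zip(query, combined) if q == "1" and c == "0")

def place_copies(query, combined, alpha):
    """Indices of the alpha distinct groups where the query adds the fewest 1s."""
    if not 2 <= alpha <= len(combined):
        raise ValueError("alpha must satisfy 2 <= alpha <= k")
    ranked = sorted(range(len(combined)),
                    key=lambda i: increased_ones(query, combined[i]))
    return ranked[:alpha]

combined = ["1100", "0011", "1001", "0101"]   # hypothetical combined queries of k = 4 groups
targets = place_copies("1100", combined, alpha=2)   # [0, 2]
```

Because the copies land in distinct groups, a user can still obtain results from another group if the machine serving one group fails.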

KMDG1-Example

KMDG2-Algorithm
Relax the constraint of equal group size to further reduce bandwidth
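
The effect of relaxing the equal-size constraint can be illustrated with a greedy assignment that takes an optional capacity. The workload below is hypothetical and deliberately skewed toward keywords A and B; the greedy rule is an assumption for illustration and is not taken from the talk.

```python
def increased_ones(query, combined):
    """Number of 1s the combined query gains when the query is OR-ed into it."""
    return sum(1 for q, c in zip(query, combined) if q == "1" and c == "0")

def bitwise_or(a, b):
    """Bitwise OR of two equal-length bit strings."""
    return "".join("1" if "1" in pair else "0" for pair in zip(a, b))

def assign(queries, seeds, capacity=None):
    """Greedy assignment; capacity=None drops the group-size bound.
    With a bound, capacity * len(seeds) must be at least len(queries)."""
    sizes = [0] * len(seeds)
    combined = list(seeds)
    for query in queries:
        open_groups = [i for i in range(len(seeds))
                       if capacity is None or sizes[i] < capacity]
        target = min(open_groups, key=lambda i: increased_ones(query, combined[i]))
        sizes[target] += 1
        combined[target] = bitwise_or(combined[target], query)
    return combined, sizes

def total_ones(combined):
    """Total number of 1s over all combined queries."""
    return sum(c.count("1") for c in combined)

queries = ["1100", "1000", "0100", "0011"]   # hypothetical skewed workload
seeds = ["1100", "0011"]
equal_size = assign(queries, seeds, capacity=2)   # (['1100', '0111'], [2, 2])
relaxed = assign(queries, seeds)                  # (['1100', '0011'], [3, 1])
```

In this example the relaxed assignment lowers the total number of 1s from 5 to 4, at the price of uneven group sizes (3 and 1), which is the trade-off against load balancing.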

KMDG2-Example

Evaluation

Parameters
Simulations are conducted with MATLAB R2010a, running on a local machine with an Intel Core 2 Duo E8400 3.0 GHz CPU and 8 GB RAM
[Table: Summary of parameters]

Performance
[Figure: Comparison of total number of 1s. The X-axis denotes the number of users and the Y-axis denotes the total number of 1s.]

Performance
[Figure: Comparison of bandwidth at the cloud. The X-axis denotes the number of users and the Y-axis denotes the bandwidth at the cloud (MB).]

Load balancing
[Figure: Comparison of imbalanced transfer-in bandwidth. The X-axis denotes the number of users and the Y-axis denotes the imbalanced transfer-in bandwidth (MB).]

Load balancing
[Figure: Comparison of imbalanced transfer-out bandwidth. The X-axis denotes the number of users and the Y-axis denotes the imbalanced transfer-out bandwidth (MB).]

Conclusion & Future work
1. A dynamic grouping strategy is proposed to achieve cost efficiency, load balancing, and robustness in cloud computing
2. Experiment results show that KMDG can largely reduce the bandwidth incurred at the cloud
3. Future work: conduct experiments on other keyword distributions to verify the effectiveness of KMDG

Thank you!