A SPECTRAL METHOD FOR NETWORK CACHE PLACEMENT BASED ON COMMUTE TIME


A SPECTRAL METHOD FOR NETWORK CACHE PLACEMENT BASED ON COMMUTE TIME
By
PRIYANKA SINHA
A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2013

© 2013 Priyanka Sinha

To my parents (Mr. Pradip Sinha and Mrs. Rekha Sinha), and my uncle (Mr. Shakti Chatterjee)

ACKNOWLEDGMENTS
This thesis would not have been possible without the guidance and the help of Dr. John M. Shea. I would like to thank him for his encouragement and his valuable guidance in the preparation and completion of this work. I would also like to thank my family and friends for all their invaluable support.

TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  Problem Overview
  Literature Review
  Schemes that Aim to Improve Data Access Efficiency
    Improve Data Access Efficiency with Single Data Item
    Improve Data Access Efficiency with Multiple Data Items
  Schemes that Aim to Improve Energy Consumption
  Contribution and Organization of this Thesis
CHAPTER 2 SYSTEM MODEL AND PROBLEM FORMULATION
  System Model
    Topology
    Link Model
  Problem Formulation
    Expected Commute Time
    Using Spectral Embedding to Express Commute Time as Euclidean Distance
    Optimality Criterion
CHAPTER 3 CLUSTERING ALGORITHM
  Selection of Algorithm
  Partitioning Around Medoids (PAM) Algorithm
  PAM for Cache Selection
  PAM Parameters and Performance
  Motivating Example
CHAPTER 4 NETWORK SIMULATION
  Simulation Results
CHAPTER 5 CONCLUSION
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES
Table 4-1 Simulation parameters

LIST OF FIGURES
Figure 2-1 The Gilbert-Elliot model
Minimum average commute time over multiple runs
Average commute time between vertices and medoids as a function of number of iterations in the PAM clustering algorithm
Clusters formed by distance-based PAM clustering
Clusters formed by commute-time based PAM clustering
Average access latency vs $p_{gb}$ for $p_{bg} = 0.8$ and routing frequency =
Average access latency vs $p_{gb}$ for $p_{bg} = 0.6$ and routing frequency =
Average access latency vs $p_{gb}$ for $p_{bg} = 0.2$ and routing frequency =
Average access latency vs $p_{gb}$ for $p_{bg} = 0.8$ and routing frequency =
Average access latency vs $p_{gb}$ for $p_{bg} = 0.8$ and routing frequency =

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

A SPECTRAL METHOD FOR NETWORK CACHE PLACEMENT BASED ON COMMUTE TIME

By Priyanka Sinha
May 2013
Chair: John M. Shea
Major: Electrical and Computer Engineering

Information caching in networks can be used to reduce latency and increase reliability in accessing information and also to reduce network traffic. In wireless networks, changing channel conditions may impact the availability of caches, and this should be taken into account when determining where in the network caches will be placed. In this thesis, we investigate this problem and propose that the expected commute time between a node and its corresponding cache is a good measure to optimize because it takes into account both the distance between the node and the cache and the number of paths between the node and the cache. We then develop an efficient way to place the caches using spectral clustering. The performance of cache placement based on expected commute time is compared to the performance of cache placement based on Euclidean distance. The results show that for most network topologies, commute-time based clustering provides better access latency than distance-based clustering.

CHAPTER 1
INTRODUCTION

1.1 Problem Overview

In communications networks, information may be cached at locations throughout the network to reduce the cost of accessing that information. For instance, by caching the information close to the nodes accessing it, the information can generally be accessed at lower latency, higher speed, and with higher reliability. In the Internet, this approach has spawned the creation of content delivery companies, such as Akamai. Caching is also important in military networks, and especially in distributed wireless networks in which communications over long routes are prone to failure. In this thesis, we focus on the case of cache placement in wireless ad hoc networks, and the discussion will be based on this scenario. However, most of the ideas and techniques are applicable to other wireless and wired communications networks with minimal modification.

An important consideration is where to place caches in the network. The caches should be distributed throughout the network so that the information is easily accessible by the other nodes in the network. Here, "easily accessible" may include various criteria, such as short paths from a node to the nearest cache. Given such a path, additional criteria could be that it have high reliability, high capacity, or low congestion. Reliability can be further enhanced if the cache is accessible via multiple possible routes.

In this thesis, we propose to use techniques from spectral graph theory to optimize the placement of caches in a distributed wireless network with dynamic link states. Information is cached at a set of nodes that minimizes the expected commute time to nearby nodes that may access the caches. As we discuss further below, the expected commute time between two nodes is a measure that decreases not only when the length of any path connecting the two nodes decreases but also when the number of paths connecting the two nodes increases.
In addition, by using a spectral embedding of the node adjacency information into a high-dimensional Euclidean space, the expected commute time among nodes in a graph can be calculated using Euclidean distance. This latter approach allows caches to be placed using simple representative-based clustering algorithms and also allows for more efficient, approximate optimization of the expected commute time by embedding the nodes of the graph in a lower-dimensional Euclidean space. Results are presented to demonstrate the effectiveness of the algorithms.

1.2 Literature Review

Since wireless networks have limited communication bandwidth, data caching may be a useful approach to improve the efficiency of data access. A number of recent and past works have tackled the problem of cache placement in wireless networks, and they can be broadly categorized based on the following criteria:

1. Optimization objective:
   (a) Improve the data access efficiency
   (b) Improve the energy consumption
   (c) Improve the rate of utilization and cache-hit ratio
2. Number of data items in network:
   (a) Cache placement in a network with a single data item
   (b) Cache placement in a network with multiple data items. The class of problems with multiple data items can be further classified on the basis of the size of the data:
       i. Uniform-size data items
       ii. Nonuniform-size data items
3. Optimization approach: Optimal cache placement for a network with a general graph topology and a single type of data item is generally formulated as one of two different graph-theory problems:
   (a) In the facility location problem, the goal is to minimize the sum of the total costs (cache setup cost + access cost) incurred due to caching at each node in a certain cache placement, without any constraint, and
   (b) In the k-median problem, the goal is to minimize the total access cost with a maximum of k cache nodes.

   Cache placement problems can be further classified in terms of the complexity of the algorithm that solves them. For instance, both the facility location and k-median problems are NP-hard, meaning that an algorithm for solving either of them could be translated into one for solving any problem in NP (nondeterministic polynomial time). Some works, such as [1], formulate cache placement as an APX-hard problem (an approximable problem that does not have a polynomial-time approximation scheme unless P = NP). To find a solution to these cache placement problems, the hardness or the nonapproximability must be overcome. Several NP-hard or APX-hard problems have been solved using constant-factor approximation algorithms (polynomial-time approximation algorithms with approximation ratio bounded by a constant) after circumventing the hardness or the nonapproximability. For example, [1] overcomes a nonapproximability problem by choosing to maximize the reduction in total access cost instead of minimizing the total access cost. Several works overcome the hardness of an NP-hard problem by considering tree networks instead of general graph topologies [2], [3], [4], [5].
4. Centralized vs. distributed: Compared with a centralized algorithm, a distributed algorithm has the advantage of being implementable in a network with dynamic traffic.
5. Memory constraint: Some of the existing works take into consideration that the nodes in the network may have limited memory and hence pose the problem with a memory constraint [6]. In other work, memory is not considered a constraint.

Since these classification criteria overlap, we use the first criterion (optimization objective) as our primary criterion in describing the existing literature on cache placement in wireless networks.
1.3 Schemes that Aim to Improve Data Access Efficiency

1.3.1 Improve Data Access Efficiency with Single Data Item

In [7], the cache placement problem is posed as a trade-off between the overhead cost due to cache placement and the access latency. A polynomial-time algorithm is designed to approximately solve the NP-hard problem of minimizing the weighted sum of overhead cost and access latency. The algorithm can be implemented in a distributed and asynchronous fashion.

In [8], a hybrid cache-placement scheme is developed that carries out an optimal trade-off between the dissemination and access overhead cost and the access latency. The proposed scheme uses a routing navigational graph, which captures the potential relationships among the nodes in the routing paths based on the current data access patterns, together with a clustering strategy that partitions the multihop wireless network to pick suitable nodes for cache placement from a set of nodes related to the user's application. This approach helps the cache placement scheme adapt to changes in data access patterns while minimizing the number of cache nodes. The scheme results in a smaller overhead cost than flooding and achieves a significant improvement when the number of nodes is large.

In [1], an evolutionary approach is proposed for finding an optimal web proxy cache placement that minimizes the average response time for accessing web content. Compared with traditional approaches such as dynamic programming and packet-level simulation, the evolutionary approach is said to give results similar to packet-level simulation for simple networks while being computationally faster. The evolutionary algorithm handles large-scale networks as well as the dynamic programming approach does.

Optimizing the cache placement to trade off between the total traffic cost and the average access delay in wireless multi-hop ad hoc networks is considered in [2]. Since dynamic network topologies are considered, the approach is called Dynamic Cache Placement (DCP). Unlike other data access efficiency optimization problems, DCP takes the impact of contention in the wireless network into account: hop counts, which are often used to measure the total cost of caching, result in different performance depending on the contention and traffic loads on the paths.
Three kinds of traffic flows are considered in the caching system: Access Flow (traversal of nodes for data access), Reply Flow (traversal of nodes for replying to data requests from cache nodes), and Update Flow (traversal of nodes for updating cache content). DCP aims to select candidate nodes for cache placement so as to reduce the access traffic flows and increase the update traffic flows as far as possible, and then to select cache nodes with less contention from among the candidates.

In [9], an effective and low-cost cache placement scheme for mobile P2P networks is proposed, along with a scheme to update the cache placement as the network evolves. Both schemes are implementable in a decentralized manner. In [10], a heuristic cache-distribution algorithm is developed that aims at improving document download latency by improving the overall network latency. This scheme estimates the traffic at each cache of a mesh network, and based on the traffic, each cache is assigned a suitable percentage of the total storage capacity of the network. Refs. [11] and [12] design optimal polynomial-time dynamic programming algorithms for solving k-median problems in undirected and directed trees, respectively. In other works, [13] considers the placement of k transparent caches, [14] considers a cost model involving reads, writes, and storage, and [15] presents a distributed algorithm for sensor networks to reduce the total power expended.

1.3.2 Improve Data Access Efficiency with Multiple Data Items

Optimizing cache placement in ad hoc networks with multiple types of data items is the focus of [16], in which three different algorithms are proposed. In the first, each node caches the items most frequently accessed by it. The second approach eliminates replications among neighboring nodes introduced by the first approach. The third approach requires creation of stable groups to gather neighborhood information and determine caching placements. The approach in [16] is extended in [17] and [18] by generalizing the above approaches for push-based systems and updates, respectively.
Here, [17] uses a push-based approach to shorten the average response time for data access, and [18] tries to improve data accessibility for systems in which the data items are updated periodically.

Several other references also consider cache placement with multiple data types. Ref. [19] suggests transparent replica placement in tree networks to minimize the total data transfer cost. To support data access in a multiple data item environment, [3] devises three simple distributed caching techniques: CacheData (which caches data items that are passing by), CachePath (which caches the path to the nearest cache of the passing-by data item), and HybridCache (which caches the data item if its size is small enough or the path to the data otherwise). They use the LRU (least recently used) policy for cache replacement. Ref. [20] proposes a 20.5-approximation, non-distributed algorithm for the non-APX optimal cache placement problem with uniform-size multiple data items, as no polynomial-time solution exists for nonuniform-size data items. However, their approach (as the authors themselves note) is not amenable to an efficient distributed implementation.

Ref. [6] is a similar work that minimizes the total data access cost in ad hoc networks with multiple uniform-size data items (generalizable to nonuniform-size data items) and nodes with limited memory capacity. A centralized, tractable algorithm with a provable performance bound is developed, and the algorithm is also suited to a natural distributed implementation. Specifically, the authors devise a centralized 4-approximation algorithm (2-approximation for uniform-size data items) and a localized distributed algorithm that is based on the approximation algorithm and is capable of handling mobility of nodes and dynamic traffic conditions.

In [21], a data caching algorithm is proposed for ad hoc networks with multiple data items whose nodes exchange information items in a peer-to-peer manner. Upon receiving requested information, each node determines the cache drop time of the information or which content to replace with the newly arrived information.
A near-optimal cache placement is proposed to maximize the reduction in overall access cost while meeting the limited memory constraint, which in turn leads to better bandwidth usage and energy savings. The algorithms proposed in this paper are analytically tractable with a provable performance bound in a centralized setting and are also amenable to a natural distributed implementation.

In [22], an effective and low-cost cache placement strategy, combined with an update scheme, is proposed that is suitable for decentralized implementation in a mobile peer-to-peer network. This paper also compares its placement and update scheme with various placement-only schemes, namely Global Benefit Based Cache Placement (GBCP), Local Benefit Based Cache Placement (LBCP), Cluster Based Cache Placement (CBCP), and Random Placement (RAND), and establishes that a combination of placement and update does better than the placement-only schemes in terms of the average hop count required to transmit a segment of data.

1.4 Schemes that Aim to Improve Energy Consumption

In [4], cache-placement algorithms are developed to minimize the overall access cost subject to an update cost constraint, thus reducing energy consumption and improving resource efficiency. Dynamic programming is used to solve the optimal cache-placement problem for tree topologies, and a polynomial-time algorithm is developed to approximately solve the NP-hard cache placement problem for general graph topologies. Distributed implementations of these algorithms are also developed.

In [5], a caching scheme that optimally trades off between energy consumption and access latency in wireless ad hoc networks is developed. The problem is a special case of the connected facility location problem, which is known to be NP-hard. A polynomial-time algorithm is developed that provides a sub-optimal solution in arbitrary network topologies. This algorithm can be implemented in a distributed and asynchronous manner. In the case of a tree topology, the algorithm gives the optimal solution.

An energy-conserving caching scheme for wireless sensor networks is developed in [23].
Finding the locations of the nodes for caching data to minimize communication cost corresponds to finding the nodes of a weighted minimum Steiner tree whose edge weights depend on each edge's Euclidean length and its data traffic rate. This tree is called a Steiner Data Caching Tree (SDCT). Expressions determining the exact location of a Steiner point for a set of three nodes based on their locations are derived, along with their data refresh rate requirements. Based on these optimality results, a dynamic, distributed, energy-conserving application-layer service for data caching and asynchronous multicast is presented. A review of the various data caching techniques in wireless sensor networks (WSNs) is presented in [24].

In [15], a distributed application-layer service for cache placement and asynchronous multicast in wireless sensor networks is proposed. It places replicas of requested data items and updates them so as to minimize the frequency of communication, which reduces communication overhead and hence power consumption.

1.5 Contribution and Organization of this Thesis

The existing work on cache placement focuses on networks in which the links are reliable. In wireless mesh and ad hoc networks, depending on the communication frequencies and mobility rates, the links may often experience outages because of multipath fading. Thus, in this thesis we focus on the design of a cache placement strategy to improve performance in the presence of link failures.

The rest of this document is organized as follows. In Chapter 2, the system model and the proposed metric for optimizing cache placement are presented. In Chapter 3, we describe how spectral clustering algorithms can be used to approximate the optimal cache placements. In Chapter 4, we describe a network simulation that was used to compare the performance of the proposed cache placement algorithm with that of a reference algorithm, and performance results are presented to show the advantages of the cache placement algorithm we propose. Finally, in Chapter 5, conclusions are drawn and possible extensions to this work are discussed.

CHAPTER 2
SYSTEM MODEL AND PROBLEM FORMULATION

2.1 System Model

We consider a wireless network with static topology but time-varying communication links. This scenario can model a slowly moving ad hoc network over short time frames and is sufficient to demonstrate whether the proposed cache-placement techniques can improve performance in the presence of link quality fluctuations. For the purposes of this study, at any given time, communication over a link between two radios is either possible (the link is up) or not possible (the link is down). Links are assumed to transition between up and down according to a random process. Thus, we can characterize the network in terms of its topology and the link model.

2.1.1 Topology

Consider first the full network topology, which consists of the set of communicators (nodes) along with the set of links when all links are up. The full network topology can be represented by a simple weighted graph $G = (V, E)$, where $V$ is the set of vertices (representing the nodes in the network) and $E$ is the set of edges connecting the vertices in $V$. For convenience, let $N = |V|$ be the number of vertices, or nodes, in the network. We assume that $G$ is a connected graph, which means that there is a path from any vertex to any other vertex. If an edge exists between two vertices $v_i$ and $v_j$, then those vertices have a nonzero similarity or affinity measure, $a_{ij} > 0$, which is the weight assigned to that edge. Larger weights indicate that communication is easier between the nodes, in terms of an appropriate measure, such as throughput or reliability. The weights can be collected into a weighted adjacency matrix $A = [a_{ij}]$, $i, j = 1, 2, \ldots, N$. Here $a_{ij} = 0$ if $v_i$ and $v_j$ do not share an edge or if $i = j$. The degree of vertex $v_i \in V$ is $d_i = \sum_{j=1}^{N} a_{ij}$. Let $D$ be the diagonal matrix with $D_{ii} = d_i$. An important matrix that we will utilize later is the (unnormalized) Laplacian matrix for $G$, which is $L = D - A$.
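The quantities defined in this section can be sketched numerically as follows. This is a minimal illustration, assuming NumPy; the 4-node weight matrix `A` and the helper name `graph_laplacian` are hypothetical and are not taken from the thesis.

```python
import numpy as np

def graph_laplacian(A):
    """Return the degree vector d and the unnormalized Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)          # d_i = sum_j a_ij
    L = np.diag(d) - A         # L = D - A
    return d, L

# Hypothetical symmetric weight matrix with zero diagonal (illustrative only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)

d, L = graph_laplacian(A)
print(d)   # degrees of the four vertices
print(L)   # every row of L sums to zero
```

Because each row of $L$ sums to zero, the all-ones vector is always an eigenvector of $L$ with eigenvalue 0, a fact used later in the spectral embedding.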

2.1.2 Link Model

As previously mentioned, at any given time, a given communication link may either be up or down. For convenience, we divide time into slots and characterize the state of each edge of $G$ in each slot. We assume that the states of different links are independent. This may not hold in situations such as shadowing, but it will hold if the link quality variations are caused by fading in a rich multipath environment. For most situations that cause link quality to fluctuate, such as fading or shadowing, the link quality will not be independent from slot to slot. To model the dependence between slots, in this thesis we use the Gilbert-Elliot channel model, which is based on a two-state discrete-time Markov chain. The two states are the good state and the bad state, where the link is up when the Markov chain is in the good state and the link is down when the Markov chain is in the bad state. A state diagram for the Gilbert-Elliot channel is shown in Figure 2-1.

Figure 2-1. The Gilbert-Elliot model

Let $p_{gb}$ denote the conditional probability that the next state is the bad state given that the current state is the good state. Similarly, let $p_{bg}$ denote the conditional probability that the next state is the good state given that the current state is the bad state. The Gilbert-Elliot model can be completely characterized by specifying the probabilities of transitioning to the opposite state. (The two remaining state transition probabilities are given by $p_{bb} = 1 - p_{bg}$ and $p_{gg} = 1 - p_{gb}$.) The expected number of slots for which a particular link stays in a given state is known as the state sojourn time, which depends on the probability of leaving that state. The state sojourn times for the good and bad states are $T_g = 1/p_{gb}$ and $T_b = 1/p_{bg}$, respectively.

2.2 Problem Formulation

We consider the problem of how to place $K$ caches among the $N$ nodes in the network to minimize the latency for the $N$ nodes to access the cached data. Let $\mathcal{C} \subseteq V$ be the subset of nodes at which data will be cached. We consider cache placement under the assumption that each node will access the single cache that it can access at the smallest cost. In a wireless network, even if the links are reliable, the time to fulfill cache requests may be extremely difficult to characterize because of contention issues and queuing delays. Thus, we consider instead minimizing a cost function that encodes features that impact latency. For example, if the links are reliable and stable and multi-path routing is not used, the cost may be the number of edges that must be traversed or the sum of a cost function computed from the weights on the edges (such as $a_{ij}^{-1}$). However, in networks with time-varying link quality, such measures may result in poor performance because they depend on a single route from the nodes to the caches, and these routes may break because of changes in link quality. Thus, it is desirable to use a distance measure that incorporates path length, link weights, and information about multiple routes between the nodes. One such measure is expected commute time.

2.2.1 Expected Commute Time

Expected commute time is defined in terms of a random walk on the graph $G$. Every vertex in the graph is associated with a state in a discrete-time homogeneous Markov chain. Let $s(t)$ be the state of the Markov chain at time $t$. We let the transition probabilities between states be proportional to the weights of the edges emerging from the states. Thus, the single-step transition probability from state $i$ to state $j$ is given by $P[s(t+1) = j \mid s(t) = i] = a_{ij}/d_i = p_{ij}$.
Since the graph is connected and the edges are not directed, the Markov chain is irreducible.
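The random walk described above can be sketched as follows. This is a minimal illustration, assuming NumPy; the weight matrix and the helper names `transition_matrix` and `is_irreducible` are hypothetical. The stationary distribution $\pi_i = d_i / \sum_k d_k$ used in the check is a standard property of random walks on undirected weighted graphs and is not stated in the thesis.

```python
import numpy as np

def transition_matrix(A):
    """Row-stochastic matrix with p_ij = a_ij / d_i."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    return A / d[:, None]

def is_irreducible(P):
    """True if every state can reach every other state in at most n-1 steps."""
    n = P.shape[0]
    step = (P > 0).astype(float) + np.eye(n)      # one-step reachability, plus staying put
    reach = np.linalg.matrix_power(step, n - 1)   # positive entry <=> reachable
    return bool(np.all(reach > 0))

# Hypothetical connected, undirected weighted graph (illustrative only).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 2],
              [0, 0, 2, 0]], dtype=float)

P = transition_matrix(A)
pi = A.sum(axis=1) / A.sum()   # candidate stationary distribution d_i / V_G

print(P)
print(is_irreducible(P))
print(np.allclose(pi @ P, pi))
```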

Consider the time to first reach some state $j$ from state $i$, $T_{ij}$. Formally, $T_{ij} = \min\{t \ge 0 : s(t) = j \text{ and } s(0) = i\}$. The expected (or average) first-passage time from state $i$ to state $j$ is $m(j \mid i) = E[T_{ij}]$. Details of the calculation of $m(j \mid i)$ are given in [25]. Note that $m(j \mid i)$ is not necessarily equal to $m(i \mid j)$, since they depend, respectively, on the probabilities of leaving state $i$ and leaving state $j$, which are in general different. Thus, $m(j \mid i)$ is not a distance measure. However, consider the expected commute time,

$$n(i, j) = m(j \mid i) + m(i \mid j), \qquad (2\text{-}1)$$

which is the expected time for a random walker to first reach state $j$ and then to first return to state $i$. Then $n(i, j)$ is a valid distance measure [25]. The expected commute time $n(i, j)$ has the useful property that it decreases when any of the paths between $i$ and $j$ are shortened or when additional paths are added between $i$ and $j$. This can be shown via an isomorphism with electrical resistive networks and application of Rayleigh's Monotonicity Law [25, 26]. These properties make the expected commute time a good candidate for a distance measure to use in selecting cache locations in a communications network: it encodes not only the distance between the nodes and the caches but also the robustness of the cache to link failures, because a lower expected commute time between nodes is also associated with multiple paths connecting the nodes.

2.2.2 Using Spectral Embedding to Express Commute Time as Euclidean Distance

Expected commute time has another property that makes it a good candidate as a distance measure. It can be computed using Euclidean distance by an appropriate embedding of the vertices of the graph into a high-dimensional Euclidean space. The details of this approach are given in [25] and summarized here for clarity. Let $L^{+}$ denote the Moore-Penrose inverse of $L$. Note further that $L^{+}$ is the discrete Green's function for $L$ (with no boundary conditions) [27].
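The first-passage and commute times defined above can be computed directly on a small graph. The sketch below is one standard way to do so and is not necessarily the calculation used in [25]: for each target state $j$ it solves the linear system $m(j \mid i) = 1 + \sum_{k \ne j} p_{ik}\, m(j \mid k)$. It assumes NumPy; the 3-node path graph and the function name `first_passage_times` are hypothetical.

```python
import numpy as np

def first_passage_times(P):
    """M[i, j] = m(j|i), the expected first-passage time from i to j (M[i, i] = 0)."""
    n = P.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        keep = [k for k in range(n) if k != j]
        Q = P[np.ix_(keep, keep)]   # transitions among the states other than j
        # Solve (I - Q) m = 1 for the expected hitting times of state j.
        M[keep, j] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return M

# Hypothetical path graph 0 - 1 - 2 with unit weights (illustrative only).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)

M = first_passage_times(P)
commute = M + M.T                  # n(i, j) = m(j|i) + m(i|j), as in (2-1)

print(M[0, 1], M[1, 0])            # first-passage times are not symmetric
print(commute)                     # commute times are symmetric
```

On this path graph the walker at the end vertex 0 must step to vertex 1, whereas a walker at vertex 1 moves away from vertex 0 half of the time, which is why the two first-passage times differ while their sum is symmetric.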
$L^{+}$ can be written in terms of the Laplacian matrix $L$ as

$$L^{+} = \left(L - \frac{e e^{T}}{n}\right)^{-1} + \frac{e e^{T}}{n}, \qquad (2\text{-}2)$$

where $n$ is the number of vertices of $G$ and $e = [1, 1, \ldots, 1]^{T}$. Let $V_G$ be the volume of the graph,

$$V_G = \sum_{i=1}^{n} d_i. \qquad (2\text{-}3)$$

Then the expected commute time between nodes $i$ and $j$ is

$$n(i, j) = V_G\, (e_i - e_j)^{T} L^{+} (e_i - e_j), \qquad (2\text{-}4)$$

where $e_i$ is the unit vector of length $n$ with zeros in all positions except for the $i$th position, which is one. Instead of computing the commute time using $L^{+}$ and (2-4), we propose to embed the vertices of $G$ as points in a Euclidean space where the commute time can be computed using Euclidean distance. Since $L^{+}$ is a real symmetric matrix, it has a spectral factorization of the form $L^{+} = U \Lambda_p U^{T}$. Here $\Lambda_p$ is a diagonal matrix with the eigenvalues of $L^{+}$ on the diagonal, and $U$ is a matrix whose columns are the eigenvectors of $L^{+}$. Then (2-4) can be rewritten as

$$n(i, j) = V_G\, (x_i - x_j)^{T} (x_i - x_j) = V_G\, \lVert x_i - x_j \rVert^{2}, \qquad (2\text{-}5)$$

where $x_i = \Lambda_p^{1/2} U^{T} e_i$. Thus, the coordinates of all of the embedded vertices are given by the columns of the matrix $X$ given by

$$X = \Lambda_p^{1/2} U^{T}. \qquad (2\text{-}6)$$

As noted in [25], it is not necessary to compute $L^{+}$ to compute the spectral embedding given by (2-6). Let $\{\lambda_i\}$ be the eigenvalues of $L$, and $\{\lambda_i^{+}\}$ be the eigenvalues of $L^{+}$. Then $L$ and $L^{+}$ have the same eigenvectors, and $\lambda_i^{+} = 1/\lambda_i$ (except for the eigenvalue 0, which is shared by both matrices). Thus, the projection in (2-6) can be carried out directly from the eigenvalues and eigenvectors of $L$.

2.2.3 Optimality Criterion

We wish to choose a subset $\mathcal{C} \subseteq V$ such that the expected commute time from the nodes $V$ to the caches $\mathcal{C}$ is minimized according to some cost criterion. As previously mentioned, we assume that each node is assigned to access one cache. Thus, the network is partitioned based on which caches the nodes are assigned to. Let $C(V_i)$ be the cache to which vertex $V_i$ is assigned, and let $V(C_j)$ denote the set of vertices assigned to cache $C_j$. Below, we assume that specifying $\{C(V_i)\}$ for all $V_i \in V$ implicitly specifies $\mathcal{C}$. We call the optimization criterion for selection of which nodes will act as caches and for assignment of nodes to caches the minimum average commute time (MACT):

$$\mathrm{MACT} = \arg\min_{\{C(V_i)\}} \frac{1}{|V|} \sum_{C \in \mathcal{C}} \sum_{v \in V(C)} n(v, C). \qquad (2\text{-}7)$$

Note that the term $1/|V|$ is a constant that can be omitted in the computations. The allocation of caches can be solved efficiently via clustering, as detailed in the next chapter.
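The embedding in (2-6) and the criterion in (2-7) can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and a connected graph; the function names, the optional `dim` truncation argument, and the 3-node path graph are hypothetical. The exhaustive search in `brute_force_mact` is included only to make the criterion concrete on a toy graph; it is exponential in the number of caches and is not the clustering procedure developed in the next chapter.

```python
import numpy as np
from itertools import combinations

def commute_time_embedding(A, dim=None):
    """Return X (columns x_i as in (2-6), zero-eigenvalue coordinate dropped) and V_G."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    L = np.diag(d) - A
    lam, U = np.linalg.eigh(L)      # ascending eigenvalues; lam[0] is ~0 if G is connected
    lam, U = lam[1:], U[:, 1:]      # the zero eigenvalue contributes a zero coordinate
    if dim is not None:             # keep the largest eigenvalues of L^+ (smallest of L)
        lam, U = lam[:dim], U[:, :dim]
    X = (U / np.sqrt(lam)).T        # Lambda_p^{1/2} U^T with lambda^+ = 1 / lambda
    return X, d.sum()

def commute_times(X, vol):
    """n(i, j) = V_G * ||x_i - x_j||^2 for all pairs, as in (2-5)."""
    diff = X[:, :, None] - X[:, None, :]
    return vol * np.sum(diff ** 2, axis=0)

def brute_force_mact(N, K):
    """Exhaustively pick K caches minimizing the average commute time to the nearest cache."""
    n = N.shape[0]
    cost = lambda c: N[:, list(c)].min(axis=1).sum()
    best = min(combinations(range(n), K), key=cost)
    return best, cost(best) / n

# Hypothetical path graph 0 - 1 - 2 with unit weights (illustrative only).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

X, vol = commute_time_embedding(A)
N = commute_times(X, vol)
caches, avg = brute_force_mact(N, K=1)
print(N)
print(caches, avg)
```

Passing a small `dim` keeps only the coordinates associated with the largest eigenvalues of $L^{+}$, which gives an approximate, lower-dimensional embedding of the kind mentioned in the introduction.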

CHAPTER 3
CLUSTERING ALGORITHM

3.1 Selection of Algorithm

As mentioned in [28], clustering algorithms can be broadly divided into two classes: those based on hierarchical methods and those based on partitioning methods. Hierarchical algorithms are in turn of two main types: agglomerative and divisive. In agglomerative algorithms, every object initially forms a separate cluster, and in consecutive steps clusters are merged until the desired number of clusters is reached. In contrast, divisive clustering starts by assigning all objects to a single cluster and splits one cluster in each subsequent step. The splitting stops once the desired number of clusters has been reached.

In this work we choose to work with partitioning algorithms because of an inherent disadvantage of hierarchical methods: their inability to undo a merging or splitting of clusters, even if regrouping would result in a smaller average dissimilarity. This property typically results in inferior clustering performance. A partitioning algorithm, on the other hand, tries to find the best clustering by putting the most similar objects together in a cluster.

There are various types of partitioning algorithms, such as K-means, K-medians, K-medoids, and fuzzy analysis. We chose to work with K-medoids algorithms because, unlike K-means, K-medoids clustering chooses a set of K objects from the given set of objects to be the representatives of the clusters and associates each of the remaining objects with one of the chosen K representatives. This matters here because a cache must be placed at an actual node of the network. In addition, K-medoids algorithms are known to handle large data sets more efficiently and need no modification for translation or orthogonal transformation of data points.

Partitioning Around Medoids (PAM) is one of the best known K-medoids algorithms. Although PAM has a very high computational complexity, we selected it because it provides very high quality clustering results and needs little modification for handling Euclidean criteria, and in this thesis our aim is to achieve the best possible clustering quality.

Alternatively, the CLARANS algorithm can be used, with lower complexity. CLARANS stands for Clustering Large Applications based on RANdomized Search. The general problem of clustering can be viewed as the problem of searching a graph in which every node represents a solution, i.e., a set of k medoids. Two nodes are called neighbors if their sets differ by only one object. Therefore each node has k(n − k) neighbors, where n is the number of objects and k is the number of clusters. Each node is assigned a cost defined as the total dissimilarity between every object and the medoid of its cluster. PAM is then a search for a minimum on this graph: at each step all the neighbors of the current node are checked, and the current node is replaced with the neighbor that gives the largest decrease in cost. Whereas PAM checks all the neighbors, CLARANS draws a sample of neighbors dynamically. This is the key difference between PAM and CLARANS, and it makes CLARANS more efficient and scalable than PAM.

3.2 Partitioning Around Medoids (PAM) Algorithm

PAM was developed by Kaufman and Rousseeuw and is documented in [29, Ch. 2]. The objective of PAM is to minimize the average dissimilarity between an object and its medoid. The algorithm starts with a BUILD phase in which medoids are selected, and then executes a SWAP phase in which alternate nodes are evaluated as medoids.

1. BUILD phase: In this phase PAM selects K objects randomly from the given set of N objects and calls them the medoid points. Next, each of the remaining (N − K) objects is assigned to one of the clusters represented by those K medoids on the basis of its similarity to the medoids. If a point $P_i$ has minimum dissimilarity with a medoid point $P_m$, compared to all other medoids, then $P_i$ is assigned to the cluster belonging to $P_m$. Thus the initial clusters are formed in the BUILD phase.

2. SWAP phase: Here we first compute the overall reduction in average dissimilarity obtained by replacing each medoid $O_m$ with each of the non-medoid objects $O_{m'}$ in the cluster. The replacement that provides the maximum reduction in overall average dissimilarity is then carried out. In this process we also consider the transfer of a non-medoid object $O_i$ from its existing cluster to the cluster belonging to its second nearest medoid $O_{m2}$, depending on the changes caused by replacing a medoid with a non-medoid point. There are four such situations:

(a) $O_i$ is initially assigned to the cluster belonging to the medoid $O_m$. If $O_m$ is replaced by $O_{m'}$, which is more dissimilar to $O_i$ than the nearest remaining medoid $O_{m2}$, then $O_i$ moves to the cluster represented by $O_{m2}$. This replacement increases the average dissimilarity, i.e., its cost is non-negative and is given by $\mathrm{cost}_i = \mathrm{dissimilarity}(O_i, O_{m2}) - \mathrm{dissimilarity}(O_i, O_m)$.

(b) $O_i$ is part of the cluster represented by $O_m$, and $O_{m2}$ is more dissimilar to $O_i$ than the non-medoid $O_{m'}$, so $O_i$ stays in the same cluster, which is now represented by $O_{m'}$. The associated cost may be negative or positive and is given by $\mathrm{cost}_i = \mathrm{dissimilarity}(O_i, O_{m'}) - \mathrm{dissimilarity}(O_i, O_m)$.

(c) $O_i$ is part of a cluster represented by $O_{m2}$ and not $O_m$. $O_m$ is replaced by $O_{m'}$, while $O_i$ is more similar to its current medoid $O_{m2}$ than to $O_{m'}$. So $O_i$ stays in the same cluster, and the associated cost is $\mathrm{cost}_i = 0$.

(d) $O_i$ is part of a cluster represented by $O_{m2}$ and not $O_m$. This time $O_i$ is less similar to its current medoid $O_{m2}$ than to $O_{m'}$, so when $O_m$ is replaced by $O_{m'}$, $O_i$ moves from the cluster represented by $O_{m2}$ to the cluster represented by $O_{m'}$. The associated cost is negative and is given by $\mathrm{cost}_i = \mathrm{dissimilarity}(O_i, O_{m'}) - \mathrm{dissimilarity}(O_i, O_{m2})$.

The total cost (CT) of replacing an existing medoid $O_m$ by a non-medoid $O_{m'}$ is computed by summing the costs calculated above over all the non-medoids, i.e., $CT(O_m, O_{m'}) = \sum_i \mathrm{cost}_i$. The pair $(O_m, O_{m'})$ that provides the most negative total cost is selected, and the swap step is repeated until no pair yields a negative total cost.

3.3 PAM for Cache Selection

In this work we use PAM to select a subset of the communicators to serve as caches. We use PAM to partition the nodes into K clusters, and the caches are placed at the medoids of those clusters.
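The four SWAP cases of Section 3.2 can be sketched as follows. This is an illustrative sketch with hypothetical function names, not the implementation used in this work. One deliberate deviation is labeled here and in the code: the sum runs over all objects rather than only the non-medoids, so that the returned value equals the exact change in total dissimilarity (the medoids other than $O_m$ then contribute zero).

```python
import math

def dissimilarity_matrix(points):
    """Pairwise Euclidean distances between coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def total_dissimilarity(D, medoids):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def swap_cost(D, medoids, m, h):
    """Total cost CT of replacing medoid m by non-medoid h.

    Assumption of this sketch: the sum covers every object, so the result
    equals total_dissimilarity(after) - total_dissimilarity(before).
    """
    others = [x for x in medoids if x != m]
    total = 0.0
    for i in range(len(D)):
        d_m = D[i][m]                                        # distance to O_m
        d_2 = min((D[i][x] for x in others), default=math.inf)
        d_h = D[i][h]                                        # distance to O_m'
        if d_m <= d_2:                # O_i currently belongs to O_m
            if d_h >= d_2:            # case (a): O_i moves to O_m2
                total += d_2 - d_m
            else:                     # case (b): O_i stays, now with O_m'
                total += d_h - d_m
        else:                         # O_i currently belongs to O_m2
            if d_h >= d_2:            # case (c): nothing changes
                total += 0.0
            else:                     # case (d): O_i moves to O_m'
                total += d_h - d_2
    return total
```

A swap is worth making only when `swap_cost` is negative; PAM picks the pair with the most negative value.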
PAM is applied to find the cache assignments under two different approaches. In the first, the dissimilarity between two vertices is measured by the Euclidean distance between the vertices of the graph in an $\mathbb{R}^2$ subspace. In the second, the dissimilarity between two vertices is given by the expected commute time between those vertices. The first approach is the most traditional form of PAM. The second approach can also be implemented directly with the PAM algorithm using (2-5), which shows that expected commute time can be computed as a Euclidean distance by using a spectral embedding of the vertices of the graph into a high-dimensional space. We note that a direct spectral embedding does require that the graph topology be connected, and we only consider this scenario in this work.

3.4 PAM Parameters and Performance

Since PAM depends on a randomized search, results may vary each time the algorithm is run. Therefore, in order to find the best result, the clustering algorithm is run several times for each topology, and the clustering corresponding to the minimum average cost is chosen. Figure 3-1 shows how the average cost obtained by commute-time based PAM in a 100-node topology with 5 clusters varies over multiple runs.

Figure 3-1. Minimum average commute time over multiple runs. (Plot of minimum average commute time versus run.)
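The repeated-run procedure of Section 3.4 can be sketched as below. This is a simplified sketch under stated assumptions, not the code behind Figure 3-1: the initial medoids are drawn at random, each swap candidate is scored by recomputing the total dissimilarity directly, and the names `pam_once` and `best_of_runs` are hypothetical. The input `D` is any square dissimilarity matrix, for example one holding expected commute times.

```python
import random

def total_cost(D, medoids):
    """Sum over all objects of the dissimilarity to the nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def pam_once(D, k, rng):
    """One PAM run from random initial medoids; returns (medoids, cost, history)."""
    n = len(D)
    medoids = rng.sample(range(n), k)
    cost = total_cost(D, medoids)
    history = [cost]                      # cost after each SWAP iteration
    while True:
        best_cost, best_set = cost, None
        for pos in range(k):              # medoid to remove
            for h in range(n):            # non-medoid to insert
                if h in medoids:
                    continue
                trial = medoids[:pos] + [h] + medoids[pos + 1:]
                c = total_cost(D, trial)
                if c < best_cost:
                    best_cost, best_set = c, trial
        if best_set is None:              # no swap lowers the cost: stop
            return medoids, cost, history
        medoids, cost = best_set, best_cost
        history.append(cost)

def best_of_runs(D, k, runs, seed=0):
    """Repeat PAM from several random starts and keep the cheapest result."""
    rng = random.Random(seed)
    results = [pam_once(D, k, rng) for _ in range(runs)]
    return min(results, key=lambda r: r[1])
```

Because every accepted swap strictly lowers the cost, the `history` list of a single run is decreasing, and the loop always terminates.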

Figure 3-2. Average commute time between vertices and medoids as a function of the number of iterations in the PAM clustering algorithm.

The results in Figure 3-2 show the average commute time between the non-medoid nodes and the medoids (where the caches will be placed) as a function of the number of iterations in the PAM algorithm. As expected, the plot is monotonically decreasing; however, the performance saturates after 6 iterations.

3.5 Motivating Example

We use an example of a small, simple network to demonstrate the difference between the results obtained by the distance-based and commute-time based clustering algorithms. A total of 18 nodes are partitioned into 2 clusters. Figure 3-3 shows the cluster assignment and cache assignment for the distance-based clustering algorithm, and Figure 3-4 shows the cluster assignment and cache assignment for the commute-time based clustering algorithm. Solid lines between vertices indicate that the vertices share a communication link. Red and blue node colors differentiate the two clusters, and the circled nodes are the medoids of the clusters, where the caches will be placed.

Figure 3-3. Clusters formed by distance-based PAM clustering.

Figure 3-4. Clusters formed by commute-time based PAM clustering.

Consider the results when PAM is applied to this topology with the distance-based metric, shown in Figure 3-3. The results match intuition. The network is partitioned down the middle into two equal-sized clusters, with the node near the middle of each cluster (nodes 0 and 9) assigned as the medoids. When the same topology is clustered using the commute-time based PAM clustering algorithm, we get different results. Although the two clusters are the same, the medoids of the clusters have changed to nodes 5 and 10, as shown in Figure 3-4. The medoids chosen by the commute-time based PAM can be reached by every node except nodes 0 and 9 by two paths, resulting in a lower commute time for those nodes. If one of the links on the ring fails, then with the commute-time based medoid assignment the nodes will be able to reroute the caching traffic around the failed link, whereas the distance-based medoid assignment has a critical dependence for all nodes on the links between vertices 0-5 and ... We note that expected commute time also has the advantage of providing a better medoid location based on network links even in the absence of link failures.

To make a rough estimate of the network performance under the two types of clustering, we compute the average hop count between each communicator (vertex) and its corresponding cache (medoid). Since in both cases the two clusters formed are symmetric, the average hop count is equal to the hop count of either one of the clusters. Let $hc_{dist}$ and $hc_{com}$ denote the hop counts under distance-based clustering and commute-time based clustering, respectively. Then, it is easy to see that

$$hc_{dist} = \tfrac{1}{9}\big(hc(9,10) + hc(9,11) + hc(9,17) + hc(9,12) + hc(9,16) + hc(9,13) + hc(9,15) + hc(9,14)\big) = \tfrac{1}{9}(\ldots) = 2.67;$$

$$hc_{com} = \tfrac{1}{9}\big(hc(0,1) + hc(0,2) + hc(0,8) + hc(0,3) + hc(0,7) + hc(0,4) + hc(0,6) + hc(0,5)\big) = \tfrac{1}{9}(\ldots) = 1.88.$$

So we see a performance improvement of approximately 30 percent from the use of commute-time based clustering. In the following chapter, we use a network simulation to see whether this improvement and the potential robustness to link failures translate into improvements in cache access latencies.
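The average hop count used in this estimate can be computed with a breadth-first search, as in the sketch below. The five-node topology in the usage note is hypothetical and is not the 18-node network of Figures 3-3 and 3-4; it only mimics the situation of a medoid attached to a ring by a single link versus a medoid on the ring. Following the calculation above, the sum of hop counts is divided by the cluster size, with the medoid itself contributing zero hops.

```python
from collections import deque

def hop_counts(adj, source):
    """Minimum hop count from source to every reachable vertex (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_hop_count(adj, medoid, cluster):
    """Sum of hop counts from the cluster members to the medoid, over cluster size."""
    dist = hop_counts(adj, medoid)
    return sum(dist[v] for v in cluster) / len(cluster)

# Hypothetical example: node 0 hangs off a four-node ring 1-2-3-4 via node 1.
example_adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
```

Calling `average_hop_count(example_adj, 0, [0, 1, 2, 3, 4])` and then the same with medoid 1 shows the effect discussed above: moving the medoid onto the ring shortens the average path.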

CHAPTER 4
NETWORK SIMULATION

4.1 Simulation

In this chapter, we report on results of using network simulation to evaluate the performance of the proposed clustering algorithms in random connected networks with time-varying links. We evaluate the performance of the distance-based and commute-time based cache placement algorithms by computing the total time required to complete a series of cache requests, from which we compute the average cache access latency. The network simulation uses a slotted protocol. Each topology is simulated over many slots, and the access latencies are averaged over many randomly generated connected topologies. The following activities take place in each slot:

1. Each non-medoid node in the network generates a cache request with a certain cache request probability ($p_{cache}$). A node that has already generated a cache request but has not yet completed the data access is not allowed to generate another cache request.

2. Nodes that have generated a cache request push the request packet to their respective send queues. Each node in the network is assigned an unbounded queue, in which the packets to be transmitted are stored in FIFO order.

3. To emulate the fact that routing tables are typically updated much less often than packets are transmitted, the routing tables are updated by calculating the minimum-hop path between each pair of nodes during every $r$th transmission interval. We call $1/r$ the routing update frequency.

4. Each node with a non-empty send queue tries to send the first packet in its send queue to the next hop for that packet with transmission probability $p_T$ (listed as $p_{trans}$ in Table 4-1).

5. If a node transmits in an interval, then it uses its routing table to find a path between itself and the destination node.

6. If the link between the current node and the next node in the path is up, the packet is sent to the next node; otherwise the packet stays in the send queue of the current node.

7. After a data packet reaches the intended node, the data access is assumed to be completed, and the current time stamp is stored as the receive time for the particular node.

8. The difference between the transmit time and the receive time gives the data access time for the node.

9. At the end of each slot, the link state is updated according to the state transition probabilities of the channel.

Table 4-1. Simulation parameters

Parameter            Value
p_cache              0.05
p_trans              0.6
p_bg                 0.8, 0.6, 0.2
p_gb                 0.05 to 0.5
Number of nodes      100
Number of clusters   5
Routing frequency    0.01, 0.02, 0.05

Simulations were run for different values of the state transition probabilities. For each set of values, 50 randomly generated topologies were simulated. For each topology, the simulation was run for 10,000 slots. The data access times for a particular topology were averaged to produce the average latency for that topology, and the overall average latency was determined by averaging these over the 50 different topologies. The parameters of the simulation are collected in Table 4-1.

Different topologies were generated on the fly by randomly varying the connectivity distance, the fieldsize, and the coordinates of the vertices (nodes) in the network. Since we need a connected graph, after generating each random topology a check is performed to make sure the produced topology is a connected graph. If it is not, we try to convert it into a connected graph by varying the connectivity distance. The connectivity distance is the maximum distance by which two nodes that share an edge can be separated. The parameter fieldsize determines the maximum range of the x- and y-coordinates in the topology.
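The per-slot link update in step 9 can be sketched as a two-state Markov chain, in the spirit of the Gilbert-Elliot model of Figure 2-1. This is a minimal sketch with hypothetical function names, assuming that each link changes state independently once per slot, with $p_{gb}$ the good-to-bad and $p_{bg}$ the bad-to-good transition probability. The helper functions give two standard properties of such a chain: the long-run fraction of slots in the good state, $p_{bg}/(p_{gb} + p_{bg})$, and the mean number of consecutive slots spent in the bad state, $1/p_{bg}$.

```python
import random

def step_link(is_good, p_gb, p_bg, rng):
    """Advance one link by one slot; returns True if the link is good afterwards."""
    if is_good:
        return rng.random() >= p_gb   # good -> bad with probability p_gb
    return rng.random() < p_bg        # bad -> good with probability p_bg

def fraction_of_slots_up(p_gb, p_bg, slots, seed=0):
    """Simulate one link and return the fraction of slots it spends in the good state."""
    rng = random.Random(seed)
    is_good, up = True, 0
    for _ in range(slots):
        up += is_good
        is_good = step_link(is_good, p_gb, p_bg, rng)
    return up / slots

def stationary_good(p_gb, p_bg):
    """Long-run probability that a link is in the good state."""
    return p_bg / (p_gb + p_bg)

def mean_bad_sojourn(p_bg):
    """Expected number of consecutive slots a link stays in the bad state."""
    return 1.0 / p_bg
```

With the values of Table 4-1, lowering $p_{bg}$ from 0.8 to 0.2 raises the mean time a failed link stays down from 1.25 to 5 slots, which is the effect examined in the results that follow.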

4.2 Results

As $p_{bg}$ decreases, channels remain down for a longer period of time. We simulate our system, i.e., compute the average access latency, for a fixed value of $p_{bg}$ and plot it as a function of $p_{gb}$. As the following figures show, for a fixed $p_{bg}$, the access latency increases as $p_{gb}$ goes up, and commute-time based clustering gives a lower access latency than distance-based clustering. We also see that the access latency increases for lower values of $p_{bg}$, although commute-time based clustering still provides better performance. We also vary the routing frequency, and, as the results suggest, the overall access latency decreases as the routing frequency increases, with commute-time based clustering maintaining its superior performance.

Figure 4-1. Average access latency vs $p_{gb}$ for $p_{bg} = 0.8$ and routing frequency =

The results in Figure 4-1 show the average access latency as a function of the probability of transitioning from the good state to the bad state, $p_{gb}$, for the two different clustering algorithms with $p_{bg} = 0.8$ at a fixed routing frequency. The results show that the commute-time based cache placement algorithm provides significantly better performance than cache placement based on Euclidean distance. For example, at $p_{gb} = 0.3$, the access latency for commute-time based cache placement is 20, whereas the access latency for distance-based cache placement is 130. For the values considered in this graph, commute-time cache placement produces a reduction in average access latency of at least 85%.

Figure 4-2. Average access latency vs $p_{gb}$ for $p_{bg} = 0.6$ and routing frequency =

The results in Figure 4-2 show the average access latency as a function of the probability of transitioning from the good state to the bad state, $p_{gb}$, for the two different clustering algorithms with $p_{bg} = 0.6$ at a fixed routing frequency. The results show that the commute-time based cache placement algorithm provides significantly better performance than cache placement based on Euclidean distance. For example, at $p_{gb} = 0.3$, the access latency for commute-time based cache placement is 100, whereas the access latency for distance-based cache placement is 550. For the values considered in this graph, commute-time cache placement produces a reduction in average access latency of at least 80%. Comparing this result with that of Figure 4-1, we see that due to an increase in the sojourn time in the bad state, i.e., due to a decrease in $p_{bg}$, the performance of both algorithms has degraded, although the commute-time based algorithm still performs better than the distance-based algorithm.

Figure 4-3. Average access latency vs $p_{gb}$ for $p_{bg} = 0.2$ and routing frequency =

The results in Figure 4-3 show the average access latency as a function of the probability of transitioning from the good state to the bad state, $p_{gb}$, for the two different clustering algorithms with $p_{bg} = 0.2$ at a fixed routing frequency. The results show that the commute-time based cache placement algorithm provides significantly better performance than cache placement based on Euclidean distance. For example, at $p_{gb} = 0.3$, the access latency for commute-time based cache placement is 200, whereas the access latency for distance-based cache placement is 800. For the values considered in this graph, commute-time cache placement produces a reduction in average access latency of at least 75%. Comparing this result with those of Figure 4-1 and Figure 4-2, we see that due to a further increase in the sojourn time in the bad state, i.e., due to a further decrease in $p_{bg}$, the performance of both algorithms has degraded, although the commute-time based algorithm still performs better than the distance-based algorithm.

Figure 4-4. Average access latency vs $p_{gb}$ for $p_{bg} = 0.8$ and routing frequency =

The results in Figure 4-4 show the average access latency as a function of the probability of transitioning from the good state to the bad state, $p_{gb}$, for the two different clustering algorithms with $p_{bg} = 0.8$ at a fixed routing frequency. The results show that the commute-time based cache placement algorithm provides significantly better performance than cache placement based on Euclidean distance. For example, at $p_{gb} = 0.3$, the access latency for commute-time based cache placement is 100,


More information

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:

More information

Assignment 5. Georgia Koloniari

Assignment 5. Georgia Koloniari Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last

More information

An Enhanced Algorithm to Find Dominating Set Nodes in Ad Hoc Wireless Networks

An Enhanced Algorithm to Find Dominating Set Nodes in Ad Hoc Wireless Networks Georgia State University ScholarWorks @ Georgia State University Computer Science Theses Department of Computer Science 12-4-2006 An Enhanced Algorithm to Find Dominating Set Nodes in Ad Hoc Wireless Networks

More information

Topology Enhancement in Wireless Multihop Networks: A Top-down Approach

Topology Enhancement in Wireless Multihop Networks: A Top-down Approach Topology Enhancement in Wireless Multihop Networks: A Top-down Approach Symeon Papavassiliou (joint work with Eleni Stai and Vasileios Karyotis) National Technical University of Athens (NTUA) School of

More information

Pattern Recognition Lecture Sequential Clustering

Pattern Recognition Lecture Sequential Clustering Pattern Recognition Lecture Prof. Dr. Marcin Grzegorzek Research Group for Pattern Recognition Institute for Vision and Graphics University of Siegen, Germany Pattern Recognition Chain patterns sensor

More information

Connection-Level Scheduling in Wireless Networks Using Only MAC-Layer Information

Connection-Level Scheduling in Wireless Networks Using Only MAC-Layer Information Connection-Level Scheduling in Wireless Networks Using Only MAC-Layer Information Javad Ghaderi, Tianxiong Ji and R. Srikant Coordinated Science Laboratory and Department of Electrical and Computer Engineering

More information

Chapter 5 (Week 9) The Network Layer ANDREW S. TANENBAUM COMPUTER NETWORKS FOURTH EDITION PP BLM431 Computer Networks Dr.

Chapter 5 (Week 9) The Network Layer ANDREW S. TANENBAUM COMPUTER NETWORKS FOURTH EDITION PP BLM431 Computer Networks Dr. Chapter 5 (Week 9) The Network Layer ANDREW S. TANENBAUM COMPUTER NETWORKS FOURTH EDITION PP. 343-396 1 5.1. NETWORK LAYER DESIGN ISSUES 5.2. ROUTING ALGORITHMS 5.3. CONGESTION CONTROL ALGORITHMS 5.4.

More information

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods

PAM algorithm. Types of Data in Cluster Analysis. A Categorization of Major Clustering Methods. Partitioning i Methods. Hierarchical Methods Whatis Cluster Analysis? Clustering Types of Data in Cluster Analysis Clustering part II A Categorization of Major Clustering Methods Partitioning i Methods Hierarchical Methods Partitioning i i Algorithms:

More information

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network

Improving the Data Scheduling Efficiency of the IEEE (d) Mesh Network Improving the Data Scheduling Efficiency of the IEEE 802.16(d) Mesh Network Shie-Yuan Wang Email: shieyuan@csie.nctu.edu.tw Chih-Che Lin Email: jclin@csie.nctu.edu.tw Ku-Han Fang Email: khfang@csie.nctu.edu.tw

More information

Delay Tolerant Networks

Delay Tolerant Networks Delay Tolerant Networks DEPARTMENT OF INFORMATICS & TELECOMMUNICATIONS NATIONAL AND KAPODISTRIAN UNIVERSITY OF ATHENS What is different? S A wireless network that is very sparse and partitioned disconnected

More information

Efficient Cluster Based Data Collection Using Mobile Data Collector for Wireless Sensor Network

Efficient Cluster Based Data Collection Using Mobile Data Collector for Wireless Sensor Network ISSN (e): 2250 3005 Volume, 06 Issue, 06 June 2016 International Journal of Computational Engineering Research (IJCER) Efficient Cluster Based Data Collection Using Mobile Data Collector for Wireless Sensor

More information

A Cross-Layer Design for Reducing Packet Loss Caused by Fading in a Mobile Ad Hoc Network

A Cross-Layer Design for Reducing Packet Loss Caused by Fading in a Mobile Ad Hoc Network Clemson University TigerPrints All Theses Theses 8-2017 A Cross-Layer Design for Reducing Packet Loss Caused by Fading in a Mobile Ad Hoc Network William Derek Johnson Clemson University Follow this and

More information

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups

More information

Kapitel 5: Mobile Ad Hoc Networks. Characteristics. Applications of Ad Hoc Networks. Wireless Communication. Wireless communication networks types

Kapitel 5: Mobile Ad Hoc Networks. Characteristics. Applications of Ad Hoc Networks. Wireless Communication. Wireless communication networks types Kapitel 5: Mobile Ad Hoc Networks Mobilkommunikation 2 WS 08/09 Wireless Communication Wireless communication networks types Infrastructure-based networks Infrastructureless networks Ad hoc networks Prof.

More information

Energy Efficient Data Gathering For Throughput Maximization with Multicast Protocol In Wireless Sensor Networks

Energy Efficient Data Gathering For Throughput Maximization with Multicast Protocol In Wireless Sensor Networks Energy Efficient Data Gathering For Throughput Maximization with Multicast Protocol In Wireless Sensor Networks S. Gokilarani 1, P. B. Pankajavalli 2 1 Research Scholar, Kongu Arts and Science College,

More information

A New Approach for Energy Efficient Routing in MANETs Using Multi Objective Genetic Algorithm

A New Approach for Energy Efficient Routing in MANETs Using Multi Objective Genetic Algorithm A New Approach for Energy Efficient in MANETs Using Multi Objective Genetic Algorithm Neha Agarwal, Neeraj Manglani Abstract Mobile ad hoc networks (MANET) are selfcreating networks They contain short

More information

Wzzard Sensing Platform Network Planning and Installation. Application Note

Wzzard Sensing Platform Network Planning and Installation. Application Note Wzzard Sensing Platform Network Planning and Installation Application Note 1 International Headquarters B&B Electronics Mfg. Co. Inc. 707 Dayton Road Ottawa, IL 61350 USA Phone (815) 433-5100 -- General

More information

Kapitel 4: Clustering

Kapitel 4: Clustering Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.

More information

Module 6 NP-Complete Problems and Heuristics

Module 6 NP-Complete Problems and Heuristics Module 6 NP-Complete Problems and Heuristics Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 97 E-mail: natarajan.meghanathan@jsums.edu Optimization vs. Decision

More information

Simulations of the quadrilateral-based localization

Simulations of the quadrilateral-based localization Simulations of the quadrilateral-based localization Cluster success rate v.s. node degree. Each plot represents a simulation run. 9/15/05 Jie Gao CSE590-fall05 1 Random deployment Poisson distribution

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster

More information

Zone-based Proactive Source Routing Protocol for Ad-hoc Networks

Zone-based Proactive Source Routing Protocol for Ad-hoc Networks 2014 IJSRSET Volume i Issue i Print ISSN : 2395-1990 Online ISSN : 2394-4099 Themed Section: Science Zone-based Proactive Source Routing Protocol for Ad-hoc Networks Dr.Sangheethaa.S 1, Dr. Arun Korath

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Link Scheduling in Multi-Transmit-Receive Wireless Networks

Link Scheduling in Multi-Transmit-Receive Wireless Networks Macau University of Science and Technology From the SelectedWorks of Hong-Ning Dai 2011 Link Scheduling in Multi-Transmit-Receive Wireless Networks Hong-Ning Dai, Macau University of Science and Technology

More information

MAC LAYER. Murat Demirbas SUNY Buffalo

MAC LAYER. Murat Demirbas SUNY Buffalo MAC LAYER Murat Demirbas SUNY Buffalo MAC categories Fixed assignment TDMA (Time Division), CDMA (Code division), FDMA (Frequency division) Unsuitable for dynamic, bursty traffic in wireless networks Random

More information

6 Randomized rounding of semidefinite programs

6 Randomized rounding of semidefinite programs 6 Randomized rounding of semidefinite programs We now turn to a new tool which gives substantially improved performance guarantees for some problems We now show how nonlinear programming relaxations can

More information

Effects of Sensor Nodes Mobility on Routing Energy Consumption Level and Performance of Wireless Sensor Networks

Effects of Sensor Nodes Mobility on Routing Energy Consumption Level and Performance of Wireless Sensor Networks Effects of Sensor Nodes Mobility on Routing Energy Consumption Level and Performance of Wireless Sensor Networks Mina Malekzadeh Golestan University Zohre Fereidooni Golestan University M.H. Shahrokh Abadi

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

Some Optimization Trade-offs in Wireless Network Coding

Some Optimization Trade-offs in Wireless Network Coding Some Optimization Trade-offs in Wireless Network Coding Yalin Evren Sagduyu and Anthony Ephremides Electrical and Computer Engineering Department and Institute for Systems Research University of Maryland,

More information

Data Caching under Number Constraint

Data Caching under Number Constraint 1 Data Caching under Number Constraint Himanshu Gupta and Bin Tang Abstract Caching can significantly improve the efficiency of information access in networks by reducing the access latency and bandwidth

More information

Ad hoc and Sensor Networks Chapter 10: Topology control

Ad hoc and Sensor Networks Chapter 10: Topology control Ad hoc and Sensor Networks Chapter 10: Topology control Holger Karl Computer Networks Group Universität Paderborn Goals of this chapter Networks can be too dense too many nodes in close (radio) vicinity

More information

University of Florida CISE department Gator Engineering. Clustering Part 2

University of Florida CISE department Gator Engineering. Clustering Part 2 Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical

More information

Fairness Example: high priority for nearby stations Optimality Efficiency overhead

Fairness Example: high priority for nearby stations Optimality Efficiency overhead Routing Requirements: Correctness Simplicity Robustness Under localized failures and overloads Stability React too slow or too fast Fairness Example: high priority for nearby stations Optimality Efficiency

More information

13 Sensor networks Gathering in an adversarial environment

13 Sensor networks Gathering in an adversarial environment 13 Sensor networks Wireless sensor systems have a broad range of civil and military applications such as controlling inventory in a warehouse or office complex, monitoring and disseminating traffic conditions,

More information

Module 6 NP-Complete Problems and Heuristics

Module 6 NP-Complete Problems and Heuristics Module 6 NP-Complete Problems and Heuristics Dr. Natarajan Meghanathan Professor of Computer Science Jackson State University Jackson, MS 397 E-mail: natarajan.meghanathan@jsums.edu Optimization vs. Decision

More information

6367(Print), ISSN (Online) Volume 4, Issue 2, March April (2013), IAEME & TECHNOLOGY (IJCET)

6367(Print), ISSN (Online) Volume 4, Issue 2, March April (2013), IAEME & TECHNOLOGY (IJCET) INTERNATIONAL International Journal of Computer JOURNAL Engineering OF COMPUTER and Technology ENGINEERING (IJCET), ISSN 0976- & TECHNOLOGY (IJCET) ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume 4,

More information

Social-Network Graphs

Social-Network Graphs Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities

More information

ECE 333: Introduction to Communication Networks Fall 2001

ECE 333: Introduction to Communication Networks Fall 2001 ECE : Introduction to Communication Networks Fall 00 Lecture : Routing and Addressing I Introduction to Routing/Addressing Lectures 9- described the main components of point-to-point networks, i.e. multiplexed

More information

Scalable overlay Networks

Scalable overlay Networks overlay Networks Dr. Samu Varjonen 1 Lectures MO 15.01. C122 Introduction. Exercises. Motivation. TH 18.01. DK117 Unstructured networks I MO 22.01. C122 Unstructured networks II TH 25.01. DK117 Bittorrent

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

WSN Routing Protocols

WSN Routing Protocols WSN Routing Protocols 1 Routing Challenges and Design Issues in WSNs 2 Overview The design of routing protocols in WSNs is influenced by many challenging factors. These factors must be overcome before

More information

A Heuristic Algorithm for Designing Logical Topologies in Packet Networks with Wavelength Routing

A Heuristic Algorithm for Designing Logical Topologies in Packet Networks with Wavelength Routing A Heuristic Algorithm for Designing Logical Topologies in Packet Networks with Wavelength Routing Mare Lole and Branko Mikac Department of Telecommunications Faculty of Electrical Engineering and Computing,

More information

A Survey - Energy Efficient Routing Protocols in MANET

A Survey - Energy Efficient Routing Protocols in MANET , pp. 163-168 http://dx.doi.org/10.14257/ijfgcn.2016.9.5.16 A Survey - Energy Efficient Routing Protocols in MANET Jyoti Upadhyaya and Nitin Manjhi Department of Computer Science, RGPV University Shriram

More information

Maximization of Time-to-first-failure for Multicasting in Wireless Networks: Optimal Solution

Maximization of Time-to-first-failure for Multicasting in Wireless Networks: Optimal Solution Arindam K. Das, Mohamed El-Sharkawi, Robert J. Marks, Payman Arabshahi and Andrew Gray, "Maximization of Time-to-First-Failure for Multicasting in Wireless Networks : Optimal Solution", Military Communications

More information

Spectral Clustering and Community Detection in Labeled Graphs

Spectral Clustering and Community Detection in Labeled Graphs Spectral Clustering and Community Detection in Labeled Graphs Brandon Fain, Stavros Sintos, Nisarg Raval Machine Learning (CompSci 571D / STA 561D) December 7, 2015 {btfain, nisarg, ssintos} at cs.duke.edu

More information

Collaborative filtering based on a random walk model on a graph

Collaborative filtering based on a random walk model on a graph Collaborative filtering based on a random walk model on a graph Marco Saerens, Francois Fouss, Alain Pirotte, Luh Yen, Pierre Dupont (UCL) Jean-Michel Renders (Xerox Research Europe) Some recent methods:

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Lesson 2 7 Graph Partitioning

Lesson 2 7 Graph Partitioning Lesson 2 7 Graph Partitioning The Graph Partitioning Problem Look at the problem from a different angle: Let s multiply a sparse matrix A by a vector X. Recall the duality between matrices and graphs:

More information

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic

Clustering. SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic Clustering is one of the fundamental and ubiquitous tasks in exploratory data analysis a first intuition about the

More information

Graph Theoretic Models for Ad hoc Wireless Networks

Graph Theoretic Models for Ad hoc Wireless Networks Graph Theoretic Models for Ad hoc Wireless Networks Prof. Srikrishnan Divakaran DA-IICT 10/4/2009 DA-IICT 1 Talk Outline Overview of Ad hoc Networks Design Issues in Modeling Ad hoc Networks Graph Theoretic

More information