Report on the paper "Summarization-based Mining Bipartite Graphs"


Annika Glauser, ETH Zürich, Spring 2015

[Figure: extract from the paper [1]]

Introduction

The paper "Summarization-based Mining Bipartite Graphs" introduces a new algorithm, SCMiner (Summarization-Compression Miner), which combines graph summarization, graph clustering, link prediction and the discovery of the hidden structure of a graph. The objective of graph summarization is to produce a compressed representation of the input graph. The aim of graph clustering is to group similar nodes of the graph together in clusters. Link prediction tries to predict missing or eventual future links of the graph, and by trying to discover the hidden structure of a graph, we want to say something about the structure of the underlying data. So the algorithm converts a large bipartite graph into a highly compact smaller graph, which should give an idea of the structure of the data, abstract the original graph, cluster its nodes and predict missing or future links. This makes the algorithm a useful tool for data mining.

For an illustrative example, look at the graphs below. The left graph is an input graph that corresponds to (a very small part of) data from an online movie rating site. The users are denoted by squares while the movies are represented by circles. If a user liked a movie, he or she is connected to it by an edge. In the graph on the right, the nodes that could be in the same cluster are circled together. And as the other users that liked the movie Pitch Perfect also liked The Devil Wears Prada, it might be predicted that Anna likes it too (denoted by the bold edge).

[Figure 0: A bipartite input graph]
[Figure 1: Graph with possible clusters and predicted edges]

The hidden structure of the data then might look like the graph in figure 2. The problem is that the clusters and predicted edges in figure 1 and the structure in figure 2 are just possible solutions, because a bipartite graph can be represented by a lot of different - not necessarily good - summarizations. As it can be proven that finding the globally optimal summarization is NP-hard, the algorithm SCMiner follows a heuristic approach and searches for a local optimum.

[Figure 2: A possible hidden structure]

Model

To be more formal than in the previous section: the input of the algorithm is a large bipartite graph G = (V_1, V_2, E), where V_1 and V_2 are node sets of type 1 and type 2 respectively, and E is the set of edges between them. The first part of the output is a summary graph G_S = (S_1, S_2, E') that contains two sets S_1 and S_2 of clusters of nodes of the corresponding type - called super nodes - and a set E' of edges between the super nodes. The second part of the output is an additional graph G_A = (V_1, V_2, E'') that contains the original node sets V_1 and V_2 and a set E'' of added or deleted edges between them that would be needed to restore the original graph G. The edges in E'' marked with a (+) sign have to be added to G_S in order to obtain G from it, and vice versa for the edges marked with a (-) sign.

To go with the previous example: the original graph is denoted by G = (V_1, V_2, E) where V_1 = {A, B, C, D, E} (the users) and V_2 = {P, Td, S, T} (the movies). The summary graph G_S = (S_1, S_2, E') consists of the four super nodes S_11 = {A, B, C}, S_12 = {D, E}, S_21 = {P, Td} and S_22 = {S, T}. The additional graph G_A consists of the deleted edge (C, S) (marked with a (+) sign) and the added edge (A, Td) (marked with a (-) sign).

[Figure 3: An example for the model]

In fact, the minus and plus signs in G_A can be omitted, as they can be derived by comparing G_A with G_S: if an edge of G_A is in G_S, the edge was added by the algorithm and is not part of the original graph G. Vice versa, if an edge in G_A doesn't appear in G_S, the original edge was deleted.

Data Compression

As already mentioned, there are a lot of different summarizations for one bipartite graph. For our example graph we could naturally just look at all the different summarizations and choose the best one. As the input normally is a lot bigger than the graph in figure 0, the algorithm instead tries to improve the summarization step by step. But how do we measure the goodness of a summarization? The Minimum Description Length (MDL) principle states that the more we can compress the data (the graph), the more we learn about its underlying structure. Therefore the goodness of a summarization is measurable by the shortness of its description length. Inspired by this principle, the authors propose the following coding scheme: they measure the coding cost CC(H) of a graph H = (V_1, V_2, E) by the lower bound of the coding cost of its compressed adjacency matrix A ∈ {0,1}^(|V_1| x |V_2|) with a_ij = 1 if (V_1i, V_2j) ∈ E, which is¹

$$CC(H) = |V_1| \cdot |V_2| \cdot H(A) \qquad (1)$$

where $H(A) = p_0(A)\log_2(1/p_0(A)) + p_1(A)\log_2(1/p_1(A))$ - the entropy of A - and p_0 and p_1 are the probabilities of finding 0 and 1 entries in the adjacency matrix A of H.

The additional graph G_A from the previously introduced model can be represented by a simple adjacency matrix A_GA ∈ {0,1}^(|V_1| x |V_2|). For G_S, however, we need in addition to the adjacency matrix A_GS ∈ {0,1}^(|S_1| x |S_2|) a list of the nodes and their corresponding super nodes. The coding cost of this list is

$$CC(list) = \sum_{i=1}^{2} \sum_{j=1}^{N_i} |S_{ij}| \log_2 \frac{|V_i|}{|S_{ij}|} \qquad (2)$$

where N_i is the number of super nodes of type i, |S_ij| the number of nodes in super node S_ij and |V_i| the number of nodes of type i. With this, the coding cost of a summarization G in the previously introduced model is²

$$CC(G) = CC(G_S) + CC(G_A) + CC(list) \qquad (3)$$

The goal of the algorithm is to find a summarization that minimizes (3), because according to the MDL principle, the solution should be optimal when the description length is minimal.

About (2): the information content of a certain event E can be measured by the function I(E) = I(p(E)) = log_2(1/p(E)), where p(E) is the probability of E. The unit of measure is bits, so in fact I(E) tells us how many bits we need to encode the event E. In (2) this event is "v ∈ V_i lies in S_ij". The probability of this event (when picking v ∈ V_i at random) is |S_ij|/|V_i|, and therefore the information content is I(v ∈ S_ij) = log_2(|V_i|/|S_ij|). To encode the list, we omit the names of the nodes and just concatenate the codes that correspond to the super nodes to which the nodes belong (the order of the nodes is given by the order of the nodes in the adjacency matrix of G_A). As there are |S_ij| nodes in S_ij, there are |S_ij| strings of log_2(|V_i|/|S_ij|) bits in the coding of the list. By summing this over all super nodes S_ij in the summarization, we get the number of bits required to encode the whole list.

About (1): the entropy H(A) is the average information content of a single entry of the adjacency matrix A, and therefore (1) is the total information content of the whole adjacency matrix.

¹ The corresponding formula in the paper has a minus sign, which is a typo - as I verified with the authors.
² The paper is rather inexact in this equation: in the model, the list is integrated in G_S, therefore the equation should be CC(G) = CC(G_S) + CC(G_A) with CC(G_S) = CC(A_GS) + CC(list).
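To make the coding scheme concrete, here is a minimal Python sketch of equations (1) and (2), applied to the identity summarization of the figure 0 graph (each node its own super node). The function names and data layout are my own choices for illustration, not from the paper.

import math

def entropy(matrix):
    # Binary entropy H(A) of a 0/1 matrix, in bits per entry.
    entries = [x for row in matrix for x in row]
    p1 = sum(entries) / len(entries)
    return sum(p * math.log2(1 / p) for p in (p1, 1 - p1) if p > 0)

def cc_matrix(matrix):
    # Eq. (1): coding cost |V_1| * |V_2| * H(A) of an adjacency matrix.
    return len(matrix) * len(matrix[0]) * entropy(matrix)

def cc_list(super_node_sizes, n_nodes):
    # Eq. (2), for one node type: sum of |S_ij| * log2(|V_i| / |S_ij|)
    # over the super nodes of this type; n_nodes is |V_i|.
    return sum(s * math.log2(n_nodes / s) for s in super_node_sizes)

# Figure 0 graph (rows A..E, columns P, Td, S, T), nothing merged yet:
A_GS = [[1, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
A_GA = [[0] * 4 for _ in range(5)]
cc = cc_matrix(A_GS) + cc_matrix(A_GA) + cc_list([1] * 5, 5) + cc_list([1] * 4, 4)
print(round(cc, 1))  # 39.6

The worked example in the next section goes through exactly this computation by hand.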

This was all very theoretical, so let's look at our previous example graph from figure 0, denoted by G. To make it comparable to other summarizations, we need to represent G by G_S, G_A and a list. As we haven't changed anything yet, A_GS (the adjacency matrix of G_S) is simply the adjacency matrix of G in {0,1}^(|V_1| x |V_2|), and A_GA ∈ {0,1}^(|V_1| x |V_2|) is the zero matrix. No nodes were grouped yet, so each node is in its own super node (rows A, ..., E; columns P, Td, S, T):

$$A_{G_S} = \begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ 1&1&1&0\\ 0&0&1&1\\ 0&0&1&1 \end{pmatrix} \qquad A_{G_A} = \mathbf{0}^{5 \times 4}$$

list: A:A, B:B, C:C, D:D, E:E, P:P, Td:Td, S:S, T:T

For the coding costs we have:

$$CC(G_S) = |S_1| \cdot |S_2| \cdot H(A_{G_S}) = 5 \cdot 4 \cdot \left(\tfrac{1}{2}\log_2(2) + \tfrac{1}{2}\log_2(2)\right) = 20$$

$$CC(G_A) = |V_1| \cdot |V_2| \cdot H(A_{G_A}) = 5 \cdot 4 \cdot \left(0 + 1 \cdot \log_2(1)\right) = 0$$

$$CC(list) = \sum_{i=1}^{2}\sum_{j=1}^{N_i} |S_{ij}| \log_2 \frac{|V_i|}{|S_{ij}|} = 5 \cdot 1 \cdot \log_2(5) + 4 \cdot 1 \cdot \log_2(4) \approx 19.6$$

Therefore CC(G) = CC(G_S) + CC(G_A) + CC(list) = 20 + 0 + 19.6 = 39.6.

Now let's say the output of the algorithm consists of the graphs on the right side of figure 3. Then we have (rows S_11, S_12 and columns S_21, S_22 for A_GS; the two entries of A_GA are the added edge (A, Td) and the deleted edge (C, S)):

$$A_{G_S} = \begin{pmatrix} 1&0\\ 0&1 \end{pmatrix} \qquad A_{G_A} = \begin{pmatrix} 0&1&0&0\\ 0&0&0&0\\ 0&0&1&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}$$

list: A:S_11, B:S_11, C:S_11, D:S_12, E:S_12, P:S_21, Td:S_21, S:S_22, T:S_22

The coding cost changes to:

$$CC(G_S) = 2 \cdot 2 \cdot \left(\tfrac{1}{2}\log_2(2) + \tfrac{1}{2}\log_2(2)\right) = 4$$

$$CC(G_A) = 5 \cdot 4 \cdot \left(\tfrac{1}{10}\log_2(10) + \tfrac{9}{10}\log_2\tfrac{10}{9}\right) \approx 9.4$$

$$CC(list) = 3\log_2\tfrac{5}{3} + 2\log_2\tfrac{5}{2} + 2\log_2\tfrac{4}{2} + 2\log_2\tfrac{4}{2} \approx 8.9$$

And CC(G) = CC(G_S) + CC(G_A) + CC(list) ≈ 22.2. This tells us that the summarization in figure 3 is a better summarization of the input graph G than G itself - which should have already been clear.
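Continuing the Python sketch from the previous section (same helper functions, still my own illustrative code rather than the paper's), the numbers above can be checked directly:

# Figure 3 summarization: 2x2 summary matrix and the two modified edges.
A_GS = [[1, 0],                  # S_11 - S_21
        [0, 1]]                  # S_12 - S_22
A_GA = [[0, 1, 0, 0],            # row A: the added edge (A, Td)
        [0, 0, 0, 0],
        [0, 0, 1, 0],            # row C: the deleted edge (C, S)
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
cc = (cc_matrix(A_GS)            # =  4
      + cc_matrix(A_GA)          # ~  9.4
      + cc_list([3, 2], 5)       # users:  |S_11| = 3, |S_12| = 2
      + cc_list([2, 2], 4))      # movies: |S_21| = 2, |S_22| = 2
print(round(cc, 1))              # 22.2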

Edge Modification

Imagine we have a group of nodes that we'd like to merge. For this, all nodes in the group need to have exactly the same link pattern; if this is not the case, we need to change their patterns to match each other. Let's look at the graph in figure 0 again and assume we want to merge A, B and C: they have the common neighbour P, which means that we don't have to change any link patterns towards this node. But Td is only connected to B and C, and S only has a connection to C. Now for each not common neighbour of the group there is the question: do we want to make the node into a common neighbour of the group - and therefore connect it to all nodes in the group to which it's not already connected? Or do we want to cut all ties between the node and the group - and therefore delete all edges between them? The answer depends on the cost of the operation, which is the number of edges that need to be added or deleted:

$$Cost_{remove} = \sum_{i=1}^{p} \begin{cases} |S_{1i}| \cdot |S_{2k}| & \text{if } S_{1i} \text{ links to } S_{2k}\\ 0 & \text{otherwise} \end{cases}$$

$$Cost_{add} = \sum_{i=1}^{p} \begin{cases} 0 & \text{if } S_{1i} \text{ links to } S_{2k}\\ |S_{1i}| \cdot |S_{2k}| & \text{otherwise} \end{cases}$$

where S_2k is the not common neighbour in question and S_11, ..., S_1p are the super nodes of the group that should be merged. In figure 4 we would either need to add the edge (A, Td) for Td, or delete the edges (B, Td) and (C, Td). So

$$Cost_{remove}(Td) = 2 > 1 = Cost_{add}(Td)$$

and we add the edge (A, Td), as it is cheaper. For S it's the other way round, and deleting (C, S) is cheaper than adding edges from S to A and B. The result is shown in figure 5. A special case occurs when the adding and the removing cost are the same; then it would be necessary to look further into the properties of the node to decide whether it should be made a common neighbour or not.

[Figure 4: Group of nodes to merge and its neighbours]
[Figure 5: Group with changed edges to its neighbours]

The routine ModifyEdge(group, G_S, G_A) takes as input such a group of nodes that should be merged, computes their not common neighbours and then removes or adds edges between each of the not common neighbours and the group according to the above cost functions, as sketched below. This is done by changing entries in A_GS and adding a 1 entry for each changed edge to A_GA.
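A small Python sketch of how the cost comparison inside ModifyEdge could look; the function and variable names are hypothetical, not from the paper:

def modify_edge_decision(group, neighbour, size, links):
    # Decide whether a not common neighbour should be connected to the
    # whole group ("add") or disconnected from it ("remove"), comparing
    # the number of edges each operation would change.
    cost_remove = sum(size[s] * size[neighbour]
                      for s in group if neighbour in links[s])
    cost_add = sum(size[s] * size[neighbour]
                   for s in group if neighbour not in links[s])
    # On a tie the text above suggests inspecting the node further;
    # this sketch simply prefers the cheaper operation.
    return "add" if cost_add < cost_remove else "remove"

# Merging {A, B, C} in the figure 0 graph (all super nodes still size 1):
size = {n: 1 for n in ("A", "B", "C", "Td", "S")}
links = {"A": {"P"}, "B": {"P", "Td"}, "C": {"P", "Td", "S"}}
print(modify_edge_decision(("A", "B", "C"), "Td", size, links))  # add
print(modify_edge_decision(("A", "B", "C"), "S", size, links))   # remove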

The Algorithm

So far we know how to model a summarization, how to calculate its cost and - given a group - how to change the edges and merge the group (the merging is a simple relabeling and shrinking of A_GS and changing of some names in the list). What's still missing is how to find these groups. Let's look again at the example of the online movie rating site: the aim is to group similar users and similar movies together. Two users are similar if they like the same movies, so their similarity could be defined as the number of movies they both like. But some users might have liked only 5 movies, and two of them that have 5 movies in common are very similar, while for users that have liked about 100 movies, 5 movies in common are not that much. Therefore the similarity of two users is defined as the fraction of their liked movies that both of them liked. More formally:

$$sim(S_{1i}, S_{1j}) = \frac{\sum_{k=1}^{n} |S_{2k}|}{\sum_{k=1}^{n+m} |S_{2k}|} \qquad (4)$$

where S_21, ..., S_2n denote the common neighbours of S_1i and S_1j, and S_2(n+1), ..., S_2(n+m) are the super nodes that are only connected to one of them (not common neighbours). This works analogously for super nodes of type 2. The similarity ranges from 0 to 1: if S_1i and S_1j have exactly the same neighbours, m is equal to zero, which makes the numerator and the denominator of the above equation the same and the resulting similarity equal to one. On the other hand, if they don't share any neighbours, n is equal to zero and therefore the numerator is zero too, which makes the similarity equal to zero. As the similarity is only non-zero if the nodes have a common neighbour and therefore are hop-2-neighbours of each other - meaning reachable from each other in two hops - the above similarity is called hop-2-similarity (hop2sim) in the paper; a small code sketch follows below.

For the example graph in figure 0 the similarities are as follows (figure 7):

sim(A, B) = 1/2     sim(P, Td) = 2/3
sim(A, C) = 1/3     sim(P, S)  = 1/5
sim(B, C) = 2/3     sim(Td, S) = 1/4
sim(C, D) = 1/4     sim(S, T)  = 2/3
sim(C, E) = 1/4
sim(D, E) = 1

[Figure 6: The input graph from figure 0]

Now, to specify which nodes should be merged, the algorithm uses a threshold th for the similarities. It starts at 1.0, so in figure 6 we would merge D and E. After doing this, the similarities look as follows (figure 8: the hop2sim's after merging D and E into the super node S_12):

sim(A, B) = 1/2        sim(P, Td) = 2/3
sim(A, C) = 1/3        sim(P, S)  = 1/5
sim(B, C) = 2/3        sim(Td, S) = 1/4
sim(C, S_12) = 1/4     sim(S, T)  = 2/3

As there are no nodes with similarity 1.0 anymore, the threshold has to be reduced by a reduction step ε in order to get other groups that can be merged. Combining all the seen steps results in the algorithm below.
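The hop-2 similarity of equation (4) is essentially a size-weighted Jaccard similarity over neighbour sets. A minimal Python sketch (with illustrative names of my choosing, not the paper's code):

def hop2sim(a, b, nb, size):
    # Eq. (4): sum of super node sizes over the common neighbours of a
    # and b, divided by the sum over all their neighbours (common or not).
    common = nb[a] & nb[b]
    if not common:  # not hop-2 neighbours of each other: similarity 0
        return 0.0
    return (sum(size[s] for s in common)
            / sum(size[s] for s in nb[a] | nb[b]))

# Figure 0, where every node is still a super node of size 1:
nb = {"A": {"P"}, "B": {"P", "Td"}, "C": {"P", "Td", "S"},
      "D": {"S", "T"}, "E": {"S", "T"}}
size = {s: 1 for s in ("P", "Td", "S", "T")}
print(hop2sim("A", "B", nb, size))  # 0.5
print(hop2sim("B", "C", nb, size))  # 0.666...
print(hop2sim("D", "E", nb, size))  # 1.0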

Algorithm SCMiner, extracted from [1], p. 1252

Input: bipartite graph G = (V, E), reduction step ε
Output: summary graph G_S, additional graph G_A

// Initialization
G_S = G, G_A = (V, ∅);
Compute mincc using Eq. (3) with G_S and G_A;
bestG_S = G_S, bestG_A = G_A;
Compute hop2sim for each S ∈ G_S using Eq. (4);
th = 1.0;
// Search for best summarization
while th > 0 do
    for each node S ∈ G_S do
        Get SN with S' ∈ SN if hop2sim(S, S') ≥ th;
    end for
    Combine the sets SN into non-overlapping groups allgroup;
    for each group ∈ allgroup do
        ModifyEdge(group, G_S, G_A);
        Merge nodes of G_S with the same link pattern;
        Compute cc using Eq. (3) with G_S and G_A;
        Record bestG_S, bestG_A and mincc if cc < mincc;
    end for
    if allgroup == ∅ then
        th = th - ε;
    else
        th = 1.0;
    end if
end while
return bestG_S, bestG_A;

The inputs are a bipartite graph G = (V_1, V_2, E), as stated before, and the step size ε. The output is the summarization of the graph G, represented by the summary graph G_S = (S_1, S_2, E') and the additional graph G_A = (V_1, V_2, E''). This summarization has the minimum coding cost subject to the proposed coding scheme. The algorithm first initializes G_S as G and G_A as empty, and sets this as the best solution (as long as no better one is found, it is the best). It then computes the minimum coding cost mincc and the hop2sim for each (super) node S ∈ G_S. Then it searches iteratively for groups with similarity at least a certain threshold th: it collects for each node S all hop-2-neighbours whose similarity reaches the threshold and saves them in the set SN. After doing that for all S ∈ G_S, it merges these sets; the result is a set of non-overlapping groups (see the sketch below). For each of these groups, edges possibly have to be added or removed with the ModifyEdge method. At the end of the ModifyEdge method, the hop2sim's of all not common neighbours have to be recomputed, as their edges and therefore their similarities might have changed. Once the nodes of the group have exactly the same link pattern, they can be merged into one super node. If the coding cost of this augmented graph is lower than that of the currently best summarization, it gets recorded as the best.

The threshold starts at 1.0, and if there is no group for this threshold, it gets iteratively reduced by ε. This makes sure that the nodes with higher similarity get merged first. After a group has been found for a threshold th, the threshold gets set back to 1.0, to again merge the following groups in order of similarity. Once the threshold reaches zero, the algorithm stops. The number of necessary iterations depends on the reduction step ε: with a large ε more groups of nodes get merged per iteration step than with a smaller ε, and therefore the threshold reaches zero faster. On the other hand, a larger ε might result in a less exact result. According to the authors, the best value for ε lies in [0.01, 0.1].
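The step "combine the sets SN into non-overlapping groups" merges all similarity pairs transitively. One way this could be implemented (my sketch, not the paper's code) is with a small union-find structure:

def combine_groups(pairs):
    # Merge the pairs (S, S') with hop2sim(S, S') >= th transitively
    # into non-overlapping groups, union-find style.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return [g for g in groups.values() if len(g) > 1]

# Iteration #3 of the example run below (th = 0.5):
pairs = [("A", "B"), ("B", "C"), ("P", "Td"), ("S", "T")]
print(combine_groups(pairs))
# three groups: {A, B, C}, {P, Td}, {S, T} (order may vary)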

Example Execution of the Algorithm SCMiner

Let's take the example graph of figure 0 as input graph and set ε = 0.5. Then the model and its representation look like this after the initialization:

Iteration #0, th = 1.0, mincc = 39.6

$$A_{G_S} = \begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ 1&1&1&0\\ 0&0&1&1\\ 0&0&1&1 \end{pmatrix} \qquad A_{G_A} = \mathbf{0}^{5 \times 4}$$

list: A:A, B:B, C:C, D:D, E:E, P:P, Td:Td, S:S, T:T

sim(A, B) = 1/2     sim(P, Td) = 2/3
sim(A, C) = 1/3     sim(P, S)  = 1/5
sim(B, C) = 2/3     sim(Td, S) = 1/4
sim(C, D) = 1/4     sim(S, T)  = 2/3
sim(C, E) = 1/4
sim(D, E) = 1

Iteration #1, th = 1.0, mincc = 39.6

As the threshold is 1.0, the only nodes whose similarity satisfies it are D and E. For merging them into the super node S_12 we neither need to modify any edges nor recalculate any similarities, but just substitute the name. The augmented model looks as follows (rows A, B, C, S_12):

$$A_{G_S} = \begin{pmatrix} 1&0&0&0\\ 1&1&0&0\\ 1&1&1&0\\ 0&0&1&1 \end{pmatrix} \qquad A_{G_A} = \mathbf{0}^{5 \times 4}$$

list: A:A, B:B, C:C, D:S_12, E:S_12, P:P, Td:Td, S:S, T:T

sim(A, B) = 1/2        sim(P, Td) = 2/3
sim(A, C) = 1/3        sim(P, S)  = 1/5
sim(B, C) = 2/3        sim(Td, S) = 1/4
sim(C, S_12) = 1/4     sim(S, T)  = 2/3

The cost for this is 33.6, so we record G_S, G_A and mincc. As we found a group in this iteration, the threshold gets set to 1.0 (where it already is).

Iteration #2, th = 1.0, mincc = 33.6

As there are no nodes with similarity one anymore, allgroup is the empty set and we have nothing to merge. At the end of the iteration we reduce the threshold by the reduction step ε.

Iteration #3, th = 0.5, mincc = 33.6

For the threshold 0.5 we find some node pairs that have a greater or equal similarity: (A, B), (B, C), (P, Td) and (S, T). Combining them gives us the three groups {A, B, C}, {P, Td} and {S, T}, as illustrated in figure 9.

[Figure 9: G_S with the marked pairs (left) and G_S with the previous pairs combined into non-overlapping groups (right)]

Iteration #3, group = {A, B, C}, mincc = 33.6

We start with the group {A, B, C}. As A, B and C don't have the same link pattern, we call the routine ModifyEdge({A, B, C}, G_S, G_A) and change the edges of the group as described in the section Edge Modification. After this we need to update the hop2sim's, and then we can merge the nodes into the super node S_11. The cost cc = 30.2 is smaller than mincc, therefore we record G_S, G_A and mincc again. Because there are still other groups, we don't change the threshold yet. The model now looks like this (rows S_11, S_12 for A_GS; rows A, ..., E for A_GA):

$$A_{G_S} = \begin{pmatrix} 1&1&0&0\\ 0&0&1&1 \end{pmatrix} \qquad A_{G_A} = \begin{pmatrix} 0&1&0&0\\ 0&0&0&0\\ 0&0&1&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}$$

list: A:S_11, B:S_11, C:S_11, D:S_12, E:S_12, P:P, Td:Td, S:S, T:T

sim(S_11, S_12) = 0/4 = 0     sim(P, Td) = 3/3 = 1
sim(P, S)  = 0/5 = 0          sim(Td, S) = 0/5 = 0
sim(S, T)  = 2/2 = 1

Iteration #3, group = {P, Td}, mincc = 30.2

Next we look at the group {P, Td}. Because of the edges we changed while processing the previous group, these two nodes now have the same link pattern and can be merged directly into the super node S_21. Therefore we only need to relabel and record G_S, G_A and mincc again, as the cost is cc = 26.2 (columns S_21, S, T; A_GA stays unchanged):

$$A_{G_S} = \begin{pmatrix} 1&0&0\\ 0&1&1 \end{pmatrix}$$

list: A:S_11, B:S_11, C:S_11, D:S_12, E:S_12, P:S_21, Td:S_21, S:S, T:T

sim(S_11, S_12) = 0     sim(S_21, S) = 0     sim(S, T) = 2/2 = 1

Iteration #3, group = {S, T}, mincc = 26.2

The group {S, T} can be processed analogously to the previous group: same link pattern, so we merge the two nodes directly into the super node S_22, then relabel and record G_S, G_A and mincc, because cc = 22.2 (columns S_21, S_22; A_GA still unchanged):

$$A_{G_S} = \begin{pmatrix} 1&0\\ 0&1 \end{pmatrix}$$

list: A:S_11, B:S_11, C:S_11, D:S_12, E:S_12, P:S_21, Td:S_21, S:S_22, T:S_22

sim(S_11, S_12) = 0     sim(S_21, S_22) = 0

After this, we set the threshold to 1.0 because we found a group in this iteration.

Iteration #4, th = 1.0, mincc = 22.2

As all similarities between the super nodes are zero, allgroup is the empty set, and we reduce the threshold.

Iteration #5, th = 0.5, mincc = 22.2

allgroup is still the empty set, and we reduce the threshold again.

Iteration #6, th = 0, mincc = 22.2

The threshold is zero, so we don't enter the while loop anymore, but return the recorded bestG_S and bestG_A.

Analysis

With N nodes and an average degree d_av of each vertex, the runtime complexity of computing the hop-2 similarities of all nodes is O(N * d_av^3): each of the N nodes has on average d_av neighbours of the other type, each of which in turn has on average d_av neighbours of the own type. For each of these hop-2 neighbours, the information about the common and not common neighbours has to be accessed; that is a minimum of d_av neighbours (if all neighbours are the same) and a maximum of 2 * d_av - 1 on average, and therefore needs time O(d_av). So:

N (nodes) x d_av (neighbour nodes of the other type) x d_av (neighbour nodes of the own type) x O(d_av) (time for computing one similarity) = O(N * d_av^3)

During the ModifyEdge method in SCMiner, not all similarities change and need to be recomputed, but only those of the nodes from which edges got deleted or to which edges got added. These are on average d_av many, so the N above can be replaced by d_av, making the cost of one merging step roughly O(d_av^4). The number of merging steps is affected by ε but is N on average, making the whole runtime complexity O(N * d_av^4).

As the runtime of the algorithm depends heavily on the (re)computation of the similarities, the best case for the above algorithm is when there is no noise in the input graph. That doesn't mean that the input data is faulty, but noise in the sense of edges that are in the input graph but need to be deleted or added to get to the output graph. If there are no such unnecessary edges, no edges have to be modified and therefore no similarities have to be recomputed. Additionally, the similarities would all be either one or zero, and all merges that need to be done would be done in the first iteration. The worst case, on the opposite end, is when each group to merge consists of the minimal two (super) nodes. This is the case if there are a lot of non-uniformly distributed noisy edges, which leads to merging the small groups of more similar nodes first and then gradually building the large super nodes of the output from these. This needs a lot of merging steps and therefore an awful lot of similarity computations. (Naturally this depends on the number of nodes that end up in one super node in the output: two nodes per super node is not as bad as a single super node containing all nodes in the output.)

Real World Examples

One type of example for the usage of the algorithm mentioned in the paper are websites - for rating movies or jokes, or for joining newsgroups - from which the providers want to collect data. Two other examples also mentioned are the data set WorldCities, which consists of the distribution of global service firms in the top world cities, and the reactions of proteins to drugs.

Sources

[1] Jing Feng, Xiao He, Bettina Konte, Christian Böhm, Claudia Plant. Summarization-based Mining Bipartite Graphs. In KDD, 2012.

[2] Hamed Hassani. Information Theory (lecture), ETH Zürich, spring semester 2015.
