SEMINAR: GRAPH-BASED METHODS FOR NLP
- Piers Houston
- 6 years ago
Transcription
1 SEMINAR: GRAPH-BASED METHODS FOR NLP
Organizational matters:
- The seminar takes place entirely in May.
- Seminar papers are due by 15 July (?).
- Guidance for the seminar talk and the paper is on the website.
- Tucan number for registration: se
2 Schedule
4 Motivation for graph representation
- Graphs are an intuitive and natural way to encode entities (e.g. language units) as nodes and their relations (e.g. similarities) as edges (directed or undirected).
- A feature-based representation can be transformed into a graph via a similarity measure.
- A graph cannot necessarily be transformed back into a feature representation (at least not a unique one). Think of, e.g., points in n-dimensional space.
- Graph isomorphism
5 Graph representations
- Adjacency matrix
- Adjacency list
- Additional information such as weights can be stored easily.
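The two representations named on this slide can be sketched as follows. This is a minimal illustration; the toy word graph and its weights are made up for the example and do not come from the slides.

```python
# Hypothetical toy graph: nodes are words, edge weights are similarities.
edges = [("cat", "dog", 0.8), ("dog", "wolf", 0.6), ("cat", "mouse", 0.3)]

nodes = sorted({n for u, v, _ in edges for n in (u, v)})
index = {n: i for i, n in enumerate(nodes)}

# Adjacency matrix: |V| x |V| entries, constant-time edge lookup.
matrix = [[0.0] * len(nodes) for _ in nodes]

# Adjacency list: one neighbor map per node, space proportional to |V| + |E|.
adj_list = {n: {} for n in nodes}

for u, v, w in edges:
    # The graph is undirected, so every edge is stored in both directions.
    matrix[index[u]][index[v]] = w
    matrix[index[v]][index[u]] = w
    adj_list[u][v] = w
    adj_list[v][u] = w
```

For the sparse graphs that are typical in NLP, the adjacency list is usually the more economical choice, while the matrix form is convenient for the matrix-based algorithms (PageRank, MCL) later in the deck.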
6 Motivation for graph representation
- There exist efficient algorithms that operate directly on graphs.
7 P = NP? Efficient algorithms?
8 Efficient Algorithms!
There are efficient (polynomial) algorithms for the exact solution of many problems on graphs, e.g.
- Graph traversal (DFS, shortest paths, max-capacity paths, ...)
- Optimal trees and branchings (MST, MAX-FOREST, MAX-BRANCHING, ...)
- Graph clustering (Min-Cut, Markov Clustering, Chinese Whispers, ...)
- Graph ranking (PageRank, random walks, Markov chain theory)
- Graph distances (local: paths, global: graph edit distance, ...)
- Flows on graphs (MAX-FLOW, MIN-COST FLOW, ...)
- Matching and assignment (Hungarian method, Edmonds' algorithm)
- many more
9 Efficient Algorithms!
There are efficient approximation algorithms and heuristics for the approximate solution of many graph problems, e.g.
- Subgraph problems (dense subgraphs, minors, ...)
- Optimal tour problems (TSP, PCTSP, VRP, ...)
- Steiner trees
- many more
There are simple heuristics that often yield quite good results, for example k-OPT for the Euclidean TSP.
10 Why efficiency is crucial
- Graphs are usually large-scale: in 2008, English Wikipedia had [...] articles* with [...] links between them.
- Graphs are usually dense and strongly connected: the largest strongly connected component of Wikipedia has [...] articles.
- Remember from the last lecture: graphs in NLP are usually scale-free and have the small-world property (high clustering coefficient).
→ Problem solutions often consider only small subgraphs (local neighborhoods), but an a priori partitioning is usually not possible (this yields small time complexity but full space complexity).
* by today there are almost 4 million articles
11 PageRank
- First-generation Google global ranking algorithm (1998)
- Measures the (query-independent) importance of a Web page based solely on the link structure.
- Assigns each node a numerical score between 0 and 1, its PageRank.
- Ranks Web pages by their PageRank values.
General idea:
- every page has a number of in-links (back links) and out-links (forward links)
- pages with more in-links are more important
- in-links from important pages are more important
12 PageRank
13 Definition of PageRank
- u: a web page, R(u) its PageRank
- F_u: set of pages u points to (forward links)
- B_u: set of pages that point to u (backward links)
- |F_u|: the number of links from u
- N: total number of pages
- d: damping factor, default d = 0.85
$$R(u) = d \sum_{v \in B_u} \frac{R(v)}{|F_v|} + \frac{1 - d}{N}$$
The equation is recursive, but it can be computed by starting with any set of ranks and iterating the computation until it converges.
Rank sink problem: a cycle of pages that accumulates rank within the cycle but never distributes rank outside.
Need damping: uniform rank distribution for all pages.
[Figure: example link graph with pages A, B, C, D and X]
14 Random Surfer Model
- When PageRank is normalized to sum to 1 over all pages, R(u) can be thought of as the probability that a random surfer looks at page u.
- Damping corresponds to "teleportation": with probability 1 - d (d being the damping factor), the random surfer is teleported to some other page instead of following a link.
[Figure: example link graph with pages A, B, C, D and X]
15 Computation of PageRank
Numeric: simulate a lot of random surfers. The power method of eigenvector computation:
  initialize all pages with the same rank
  repeat until convergence:
    for all pages u: compute R_{t+1}(u) on the basis of R_t(v)
    t := t + 1
input: matrix M of size N, error tolerance ε
output: eigenvector p
  p_0 = (1/N) 1; t = 0
  repeat
    t = t + 1
    p_t = M^T p_{t-1}
    δ = ||p_t - p_{t-1}||
  until δ < ε
  return p_t
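The iteration above can be sketched in plain Python. This is a minimal sketch, not the lecturer's implementation: the function name, the toy link graph, and the uniform redistribution of rank from pages without out-links are assumptions made for the example. The update rule is the damped PageRank recurrence with damping factor d.

```python
def pagerank(out_links, d=0.85, eps=1e-10, max_iter=1000):
    """Iterate R(u) = d * sum_{v in B_u} R(v) / |F_v| + (1 - d) / N."""
    pages = sorted(out_links)
    n = len(pages)
    rank = {u: 1.0 / n for u in pages}  # same initial rank for all pages
    for _ in range(max_iter):
        new_rank = {u: (1.0 - d) / n for u in pages}
        for v in pages:
            targets = out_links[v]
            if targets:
                share = d * rank[v] / len(targets)
                for u in targets:
                    new_rank[u] += share
            else:
                # Assumption: a page without out-links spreads its rank uniformly.
                share = d * rank[v] / n
                for u in pages:
                    new_rank[u] += share
        delta = sum(abs(new_rank[u] - rank[u]) for u in pages)
        rank = new_rank
        if delta < eps:  # converged
            break
    return rank


# Hypothetical four-page web: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

In this toy graph, page C collects links from A, B and D and therefore ends up with the highest rank, while D, which has no in-links, keeps only the teleportation share (1 - d)/N.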
16 LexRank: Application to Multi-Document Summarization
Multi-document summarization task:
1. identify important topics of the documents to be summarized
2. identify sentences belonging to a certain topic
3. from the sentences belonging to the same topic, select the ones that best describe the topic
4. concatenate sentences from different topics and make sure they fit together
Consider sub-problem 3:
- Input: sentences that talk about more or less the same thing
- Output: scores for those sentences that reflect how well a single sentence represents that topic
Solution idea: use measures on the sentence similarity graph
17 From Sentences to TF*IDF vectors
[Table: one row per sentence with TF counts for the words w_1 ... w_n, a DF row, and the resulting TF*IDF values; the TF*IDF row of the second sentence is its feature vector]
Example sentences:
- This is a sentence that talks about some topic
- And here is another sentence that talks about something slightly different.
- And here is yet another one of these notorious sentences
$$\mathrm{IDF}(w) = \log\left(\frac{\text{total number of sentences}}{\mathrm{DF}(w)}\right)$$
This is the same as the vector space model for Information Retrieval.
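A small sketch of the TF*IDF computation for the three example sentences on this slide. Lowercasing, whitespace tokenization and the helper name `tfidf_vector` are simplifying assumptions for the illustration.

```python
import math
from collections import Counter

sentences = [
    "this is a sentence that talks about some topic",
    "and here is another sentence that talks about something slightly different",
    "and here is yet another one of these notorious sentences",
]

tokenized = [s.split() for s in sentences]
vocab = sorted({w for tokens in tokenized for w in tokens})

# DF(w): number of sentences that contain w at least once.
df = Counter(w for tokens in tokenized for w in set(tokens))

# IDF(w) = log(total number of sentences / DF(w))
idf = {w: math.log(len(sentences) / df[w]) for w in vocab}


def tfidf_vector(tokens):
    """TF*IDF feature vector of one sentence over the shared vocabulary."""
    tf = Counter(tokens)
    return [tf[w] * idf[w] for w in vocab]


vectors = [tfidf_vector(tokens) for tokens in tokenized]
```

A word such as "is", which occurs in every sentence, gets IDF log(3/3) = 0 and so contributes nothing to any vector, while a word that occurs in only one sentence gets the maximal weight log(3).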
18 From TF*IDF vectors to sentence similarity graph
Sentence similarity graph:
- nodes: sentences
- edges: cosine similarity between sentence feature vectors
- Can apply a threshold on the similarity or use the similarity as edge weight
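The construction can be sketched as follows. The function names and the three toy vectors are illustrative assumptions; in practice the vectors would be the TF*IDF sentence vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_graph(vectors, threshold=0.0):
    """Return {(i, j): similarity} for all pairs i < j above the threshold."""
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim > threshold:
                edges[(i, j)] = sim  # the similarity doubles as the edge weight
    return edges


# Hypothetical feature vectors for three sentences.
vectors = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
graph = similarity_graph(vectors, threshold=0.1)
```

Here only sentences 0 and 1 share a non-zero dimension, so the graph contains the single edge (0, 1); sentence 2 stays isolated.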
19 Measures: Centroid, Degree and Centrality
Centroid
- Idea: select an "average" sentence.
- Compute the average point of the sentence vectors (centroid).
- For summarization, select the sentence that is most similar to the centroid.
Degree Centrality
- Idea: sentences that cover most of the content have a high node degree (number of edges). Since word overlap is responsible for edges, node degree measures word overlap with the overall set of sentences.
- For summarization, choose the sentence with the highest degree.
LexRank Centrality
- Idea: it does not suffice to be similar to many sentences; similarity to important sentences counts more.
- Normalize the sentence similarity adjacency matrix to make it a stochastic matrix.
- Run PageRank to obtain scores that are used for ranking the sentences.
- For summarization, choose the sentence with the highest score.
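Degree centrality and LexRank centrality can be contrasted on a small similarity matrix. This is a sketch under assumptions: the matrix values are invented, the diagonal holds self-similarity 1.0 (so no row sums to zero), and the damping factor 0.85 is carried over from the PageRank slides rather than prescribed here.

```python
def degree_centrality(sim, threshold):
    """Number of other sentences whose similarity exceeds the threshold."""
    n = len(sim)
    return [sum(1 for j in range(n) if j != i and sim[i][j] > threshold)
            for i in range(n)]


def lexrank(sim, d=0.85, eps=1e-10, max_iter=1000):
    """PageRank-style scores on the row-normalized similarity matrix."""
    n = len(sim)
    # Row-normalize so that every row is a probability distribution.
    m = [[value / sum(row) for value in row] for row in sim]
    p = [1.0 / n] * n
    for _ in range(max_iter):
        new_p = [(1.0 - d) / n + d * sum(m[j][i] * p[j] for j in range(n))
                 for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < eps:
            break
    return p


# Hypothetical symmetric cosine similarities between four sentences.
sim = [
    [1.0, 0.6, 0.5, 0.0],
    [0.6, 1.0, 0.4, 0.0],
    [0.5, 0.4, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]
degrees = degree_centrality(sim, threshold=0.2)
scores = lexrank(sim)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy matrix both measures agree on sentence 2; they can diverge on larger graphs, where a sentence with few but highly ranked neighbors may outrank one with many weak neighbors.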
20 Evaluation of graph-based multi-document summarization
Scores:
- ROUGE metric: similar to BLEU, computed between manual summaries and system summaries
- random baseline: select any sentence from the set by chance
- lead-based: select based on the position of the sentence within the document
→ LexRank is a simple method for getting high scores. It uses the whole structure of the graph, as opposed to Centroid or Degree. This technique also works well for single-document summarization.
21 TextRank for Keyword Extraction
Keyword extraction: find the most salient keywords for a document
Keyword extraction with PageRank:
- preprocess the document: identify adjectives and nouns as targets
- build the target co-occurrence graph: connect targets co-occurring within a window of 2-10 words
- apply PageRank to get ranking scores on the nodes
- select the highest-scoring keywords, possibly concatenating ADJ-NOUN-NOUN sequences if present in the text
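The pipeline above can be sketched in a few lines. Several simplifications are assumed: part-of-speech filtering is taken as already done (the input is a hypothetical list of candidate words in text order), the window is applied to this filtered list, edges are unweighted, and a fixed number of iterations replaces a convergence test.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking candidates that co-occur within `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, d=0.85, iterations=50):
    """PageRank on an undirected graph; every node has at least one neighbor."""
    n = len(graph)
    score = {w: 1.0 / n for w in graph}
    for _ in range(iterations):
        score = {
            w: (1.0 - d) / n + d * sum(score[v] / len(graph[v]) for v in graph[w])
            for w in graph
        }
    return score


# Hypothetical candidates (nouns and adjectives only), in text order.
candidates = ["linear", "constraints", "linear", "equations",
              "systems", "linear", "equations"]
graph = cooccurrence_graph(candidates, window=2)
scores = textrank(graph)
keywords = sorted(scores, key=scores.get, reverse=True)
```

With `window=2` only directly adjacent candidates are linked; larger windows give denser graphs. The final step of the slide, merging adjacent keywords into multi-word phrases, is left out of this sketch.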
22 Keyword Extraction Evaluation
- Comparison: a supervised system trained on manually assigned keywords, using frequency and contextual features
- Note that TextRank is unsupervised: no training necessary
23 Graph Clustering
- Task: find meaningful groups of nodes in a graph by cutting edges
- Intuition: connectedness within a cluster is higher than between clusters
- Many graph clustering algorithms find the number of clusters automatically
24 Clustering by Min-Cut / Max-Flow
MinCut algorithm: hierarchical top-down clustering
- compute the minimum cut: the set of edges with the smallest edge weight sum whose removal disconnects one set of nodes from another
- recursively apply to the components that got disconnected
Finding the minimum cut is equivalent to finding the maximum flow in a network.
Advantage: efficient. The fastest known algorithm has a per-cut complexity of O(|E| + log^3(|V|)).
Disadvantage: unbalanced cuts; when to stop?
25 Markov Chain Clustering
Clustering based on random walks: MCL is the parallel simulation of all possible random walks up to a finite length on a graph G.
Idea: a random walker on the graph is more likely to stay within the same cluster than to end up in a different cluster after a small number of steps.
Algorithm (convergence to a limit T can be shown):
- Add loops: transition matrix T = column-normalize(A_G + I)
- MCL process: alternate between
    T = T^t          // expansion: raise T to the power t
    T = inflate(T)   // inflation: increase contrast within columns by raising values to the power s (s > 0) and normalizing column-wise
- Interpret T as a clustering: use the strongest connection as label
Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May
26 Expansion step: simulate the random walk
- (Stochastic) adjacency matrix T: probabilities to walk from the node in the column to the node in the row in a single step.
- T^2: probabilities to walk from A to B in 2 steps.
[Figure: example matrices A_G, A_G with loops added, T, and T^2]
27 Inflation step: only keep attractors
[Figure: example columns squared (x^2) and then normalized]
- Inflate the differences within a column by taking the k-th power of each value, then normalize to ensure the stochastic property. k regulates the cluster sizes.
- Clustering: the highest entry in a column vector is the cluster label.
- Variants: could add small random noise to break ties.
- Optimization: only keep the K largest values, or only keep values over a threshold.
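Expansion and inflation can be put together in a compact sketch. It rests on stated assumptions: dense list-of-lists matrices, expansion fixed to squaring, inflation exponent 2, a fixed iteration count instead of a convergence test, and none of the pruning optimizations. The function names and the toy graph are invented for the illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=30):
    n = len(adjacency)
    # Add self-loops, then column-normalize: T = column-normalize(A_G + I).
    t = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(iterations):
        t = matmul(t, t)  # expansion: two-step random walk
        # Inflation: element-wise power, then restore the stochastic property.
        t = normalize_columns([[x ** inflation for x in row] for row in t])
    # Cluster label of node j: row index of the largest entry in column j.
    return [max(range(n), key=lambda i: t[i][j]) for j in range(n)]


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0.0] * 6 for _ in range(6)]
for u, v in pairs:
    adjacency[u][v] = adjacency[v][u] = 1.0

labels = mcl(adjacency)
```

On this graph the flow across the bridge edge dies out after a few iterations, so the two triangles receive different labels. Raising `inflation` generally yields smaller, more numerous clusters.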
28 Chinese Whispers Graph Clustering
- MCL: keep only a few strong neighbors
- Chinese Whispers: only propagate the strongest label in the neighborhood
  initialize:
    forall v_i in V: class(v_i) = i;
  while changes:
    forall v in V, randomized order:
      class(v) = highest ranked class in neighborhood of v;
- Nodes have a class and communicate it to their adjacent nodes.
- A node adopts the majority class in its neighbourhood.
- Nodes are processed in random order for some iterations.
- Node weighting schemes
[Figure: node A (label L1, deg=4) with neighbors B (L4, deg=2), C (L3, deg=3), D (L2, deg=1) and E (L3, deg=3)]
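The pseudocode above translates almost directly into Python. The sketch below makes a few assumptions that the slide leaves open: the graph is a dictionary of weighted neighbor maps, the "highest ranked class" is the one with the largest summed edge weight, a node keeps its current class when that class is among the tied winners, and the loop is capped at a fixed number of passes.

```python
import random
from collections import defaultdict


def chinese_whispers(graph, max_iterations=20, seed=0):
    """graph: {node: {neighbor: weight}}; returns {node: class label}."""
    rng = random.Random(seed)
    labels = {v: v for v in graph}  # every node starts in its own class
    nodes = list(graph)
    for _ in range(max_iterations):
        rng.shuffle(nodes)  # randomized processing order
        changed = False
        for v in nodes:
            if not graph[v]:
                continue  # isolated nodes keep their own class
            votes = defaultdict(float)
            for neighbor, weight in graph[v].items():
                votes[labels[neighbor]] += weight
            best = max(votes.values())
            winners = sorted(c for c, s in votes.items() if s == best)
            if labels[v] not in winners:
                labels[v] = rng.choice(winners)  # break ties at random
                changed = True
        if not changed:
            break
    return labels


# Hypothetical graph: two tight triangles joined by one weak edge (2-3).
weighted = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
            (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]
graph = {v: {} for v in range(6)}
for u, v, w in weighted:
    graph[u][v] = w
    graph[v][u] = w

labels = chinese_whispers(graph)
```

Because the bridge weight 0.1 can never outvote a triangle neighbor with weight 1.0, labels do not cross between the triangles and each triangle settles on a single class. Which node's initial label survives depends on the random order, so only the grouping, not the label values, is meaningful.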
29 Disambiguation using Resource Graphs
30 Disambiguation of Named Entities using Resource Graphs
- Wikipedia link graph
- (Shortest) paths are one possibility
31 Disambiguation of Named Entities using Resource Graphs
(Shortest) paths are one possibility. What else?
- maximum capacity paths (capacities needed, e.g. coherence, probabilities, ...)
- maximum flows (Attention: small-world graph! Path length must be bounded!)
- apply PageRank to weight nodes
Semantic enrichment: use the nodes on the paths / flows for enrichment to overcome the knowledge acquisition bottleneck
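One plausible reading of the shortest-path idea is to prefer the candidate entity that reaches a context entity in the fewest links, with the path length bounded because of the small-world property. The sketch below uses breadth-first search; the link graph, the page names and the bound of 4 are invented for the illustration and are not Wikipedia data.

```python
from collections import deque


def shortest_path(links, source, target, max_length=4):
    """Breadth-first search; returns a shortest path of at most
    max_length edges as a list of nodes, or None if there is none."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 >= max_length:
            continue  # bounded search: do not extend paths any further
        for nxt in links.get(node, ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical link graph with two candidate senses of "Jaguar".
links = {
    "Jaguar (car)": ["Automobile"],
    "Automobile": ["Engine"],
    "Jaguar (animal)": ["Felidae"],
    "Felidae": ["Predator"],
}

context = "Engine"
car_path = shortest_path(links, "Jaguar (car)", context)
animal_path = shortest_path(links, "Jaguar (animal)", context)
```

Here only the car sense reaches the context page, so it would be chosen. The intermediate nodes on the path ("Automobile") are the kind of material the slide proposes for semantic enrichment.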
32 Summary on Graph Methods in NLP
- A graph is a natural representation of entities and their relations.
- We can use well-known (efficient) graph algorithms to solve specific NLP problems.
- Taking the overall structure into account may improve some NLP tasks (enriching semantics).
- Graph clustering algorithms solve unsupervised NLP tasks without the need to specify the number of clusters.
- We can enrich information by walks on graphs.
Page rank computation HPC course project a.y. 2012-13 Compute efficient and scalable Pagerank 1 PageRank PageRank is a link analysis algorithm, named after Brin & Page [1], and used by the Google Internet
More informationNear Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri
Near Neighbor Search in High Dimensional Data (1) Dr. Anwar Alhenshiri Scene Completion Problem The Bare Data Approach High Dimensional Data Many real-world problems Web Search and Text Mining Billions
More informationOnline Graph Exploration
Distributed Computing Online Graph Exploration Semester thesis Simon Hungerbühler simonhu@ethz.ch Distributed Computing Group Computer Engineering and Networks Laboratory ETH Zürich Supervisors: Sebastian
More informationMondrian Mul+dimensional K Anonymity
Mondrian Mul+dimensional K Anonymity Kristen Lefevre, David J. DeWi
More informationGraph Theory and Applications
Graph Theory and Applications with Exercises and Problems Jean-Claude Fournier WILEY Table of Contents Introduction 17 Chapter 1. Basic Concepts 21 1.1 The origin of the graph concept 21 1.2 Definition
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 3 Parametric Distribu>ons We want model the probability
More informationGraph-Based Algorithms in NLP
Graph-based Representation Graph-based Algorithms in NLP Let G(V, E) be a weighted undirected graph V - set of nodes in the graph E - set of weighted edges Edge weights w(u, v) define a measure of pairwise
More informationTheory of Computing. Lecture 10 MAS 714 Hartmut Klauck
Theory of Computing Lecture 10 MAS 714 Hartmut Klauck Seven Bridges of Königsberg Can one take a walk that crosses each bridge exactly once? Seven Bridges of Königsberg Model as a graph Is there a path
More informationCS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu How to organize the Web? First try: Human curated Web directories Yahoo, DMOZ, LookSmart Second
More informationString Vector based KNN for Text Categorization
458 String Vector based KNN for Text Categorization Taeho Jo Department of Computer and Information Communication Engineering Hongik University Sejong, South Korea tjo018@hongik.ac.kr Abstract This research
More informationV4 Matrix algorithms and graph partitioning
V4 Matrix algorithms and graph partitioning - Community detection - Simple modularity maximization - Spectral modularity maximization - Division into more than two groups - Other algorithms for community
More informationCENTRALITIES. Carlo PICCARDI. DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy
CENTRALITIES Carlo PICCARDI DEIB - Department of Electronics, Information and Bioengineering Politecnico di Milano, Italy email carlo.piccardi@polimi.it http://home.deib.polimi.it/piccardi Carlo Piccardi
More informationInformation Retrieval Lecture 4: Web Search. Challenges of Web Search 2. Natural Language and Information Processing (NLIP) Group
Information Retrieval Lecture 4: Web Search Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group sht25@cl.cam.ac.uk (Lecture Notes after Stephen Clark)
More informationToday s topic CS347. Results list clustering example. Why cluster documents. Clustering documents. Lecture 8 May 7, 2001 Prabhakar Raghavan
Today s topic CS347 Clustering documents Lecture 8 May 7, 2001 Prabhakar Raghavan Why cluster documents Given a corpus, partition it into groups of related docs Recursively, can induce a tree of topics
More informationObjec&ves. Review. Directed Graphs: Strong Connec&vity Greedy Algorithms
Objec&ves Directed Graphs: Strong Connec&vity Greedy Algorithms Ø Interval Scheduling Feb 7, 2018 CSCI211 - Sprenkle 1 Review Compare and contrast directed and undirected graphs What is a topological ordering?
More informationCS6200 Informa.on Retrieval. David Smith College of Computer and Informa.on Science Northeastern University
CS6200 Informa.on Retrieval David Smith College of Computer and Informa.on Science Northeastern University Course Goals To help you to understand search engines, evaluate and compare them, and
More informationLecture 13 Segmentation and Scene Understanding Chris Choy, Ph.D. candidate Stanford Vision and Learning Lab (SVL)
Lecture 13 Segmentation and Scene Understanding Chris Choy, Ph.D. candidate Stanford Vision and Learning Lab (SVL) http://chrischoy.org Stanford CS231A 1 Understanding a Scene Objects Chairs, Cups, Tables,
More information