Vertex Nomination. Glen A. Coppersmith. Human Language Technology Center of Excellence, Johns Hopkins University. Improved fusion of content and context

1 Vertex Nomination: improved fusion of content and context. Glen A. Coppersmith, Human Language Technology Center of Excellence, Johns Hopkins University. Presented at Interface Symposium: May 17, 2012

3 Vertex Nomination: improved fusion of content and context. [Diagram labels: Human Language Content, Communications Graph]

4 Our data: Enron Corpus

5 Motivation and Problem Statement. We know the identities of a few fraudsters. We observe the content and the context of both fraudsters and non-fraudsters. We want to know the identities of more fraudsters. Inference Task: Nominate persons likely to be fraudsters.

6 Outline: Introduction; Method; Importance Sampling; Evaluation; Analytics; Content and Context; Fusions; Conclusions & Future Directions

9 Motivation and Problem Statement. We know the identities of a few fraudsters. [Graph attributed with human language] interesting and non-interesting. We want to know the identities of more fraudsters. Inference Task: Nominate persons likely to be fraudsters.

10 With loss of generality. Netflix [See Trevor's Keynote]. Genomics [Half the talks I've seen at IF12]. Noted similarities: Recommender Systems. We focus on those with a graph and those with human language.

12 What's already been done? Theory: (Nam) Lee & Priebe 2011; (Dominic) Lee & Priebe (submitted 2012). Simulations and Experiments: Marchette, Priebe, and Coppersmith (2011); Coppersmith & Priebe (submitted 2011). (Content + Context) > (Content Context). Human Language Technology and Graph Theory. Assumptions are valid for the Enron data.

13 What's already been done? (Content + Context) > (Content Context). Human Language Technology and Graph Theory. Assumptions are valid for the Enron data. Importance Sampling provides sensible partitions.

14 Assumptions. The fraudsters talk to each other more than expected of a random pair of people. The fraudsters talk about different things than expected of a random pair of people.

15 Fusion of Disparate Information. Some signal from the communications graph. Some signal from the human language content. How do you fuse them? A principled fusion would be nice. A useful fusion is more important: robust to real world problems, scalable to real world applications.

16 Questions for this talk. How should we fuse? (Both performance and scalability are important.) What kind of HLTs should we use? (Do they do different things?)

18 Mathematical Model: κ graph. $|V| = n$, $|M| = m$, $|M'| = m'$, $|V \setminus M| = n - m$. $p = [p_0, p_1]$, $s = [s_0, s_1]$, with $p_0 = s_0$ and $p_1 < s_1$. $\kappa(n, p, m, m', s)$. [Diagram labels: p, p, s, V\M, M]

19 Mathematical Model: κ graph. $|V| = n$, $|M| = m$, $|M'| = m'$, $|V \setminus M| = n - m$. $p = [p_0, p_1]$, $s = [s_0, s_1]$, with $p_0 = s_0$ and $p_1 < s_1$. $\kappa(n, p, m, m', s)$. [Diagram labels: p, p, s; V\M (innocents), M (fraudsters)]

20 Assumptions. The fraudsters talk to each other more than expected of a random pair of people: M is more dense than V\M. The fraudsters talk about different things than expected of a random pair of people: $p = [p_0, p_1]$ and $s = [s_0, s_1]$, with $p_0 = s_0$ and $p_1 < s_1$. [Diagram: V\M and M]

22 Our data: Enron Corpus but where are the fraudsters, Glen!?

23 Importance Sampling Procedure. Randomly partition Enron into M and V\M. Question assumptions: Density(M) > Density(V\M); Topic Distribution(M) ≠ Topic Distribution(V\M). Discard partitions that violate assumptions. Collect 5000 partitions. [Diagram: V\M and M]
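The procedure on this slide amounts to rejection sampling over random partitions. The following is a minimal sketch under stated assumptions, not the talk's implementation: `sample_partitions`, `internal_edges`, and `toy_accept` are hypothetical names, the graph is a toy, and the acceptance check is a stand-in for the talk's density and topic-distribution tests.

```python
import random

def sample_partitions(vertices, edges, m, accept, wanted, max_tries=100_000, seed=0):
    """Draw random size-m subsets M; keep those for which accept(M, rest, edges) is True."""
    rng = random.Random(seed)
    pool = sorted(vertices)
    kept = []
    for _ in range(max_tries):
        if len(kept) >= wanted:
            break
        M = frozenset(rng.sample(pool, m))
        rest = frozenset(pool) - M
        if accept(M, rest, edges):  # discard partitions that violate the assumptions
            kept.append(M)
    return kept

def internal_edges(group, edges):
    """Count edges with both endpoints inside the group."""
    return sum(1 for u, v in edges if u in group and v in group)

def toy_accept(M, rest, edges):
    # Stand-in check (valid for equal-sized groups only):
    # M must have more internal edges than the rest of the graph.
    return internal_edges(M, edges) > internal_edges(rest, edges)

# Toy undirected graph, each edge listed once (hypothetical data).
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
partitions = sample_partitions(vertices, edges, m=3, accept=toy_accept, wanted=5)
```

In the talk the same loop would run until 5000 partitions are collected, with both assumption checks in place of `toy_accept`.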

24 Importance Sampling. Hand-labels provided by Michael Berry, 2004

26 Importance Sampling M j.lovarato@enron e.haedicke@enron k.lay@enron r.shapiro@enron a.zipper@enron b.rogers@enron V\M

27 Testing Partitions: Density. $\rho(M) = \frac{\text{observed edges in } M}{\text{possible edges in } M}$, $\Delta\rho = \rho(M) - \rho(V \setminus M)$. The fraudsters talk to each other more than expected of a random pair of people. [Diagram: V\M and M]
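The density test on this slide can be sketched as follows. This is an illustration only: it assumes an undirected simple graph with each edge listed once, and the function names and toy data are hypothetical.

```python
def density(group, edges):
    """Observed edges inside the group divided by possible edges inside the group."""
    group = set(group)
    possible = len(group) * (len(group) - 1) // 2
    if possible == 0:
        return 0.0
    observed = sum(1 for u, v in edges if u in group and v in group)
    return observed / possible

def delta_rho(M, V, edges):
    """Density of M minus density of V\\M; positive when M is the denser side."""
    M = set(M)
    return density(M, edges) - density(set(V) - M, edges)

# Toy graph (hypothetical): a triangle among a, b, c and one edge d-e.
V = {"a", "b", "c", "d", "e", "f"}
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")]
gap = delta_rho({"a", "b", "c"}, V, edges)  # 3/3 - 1/3
```

A partition passes the density check when the gap is positive.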

28 Testing Partitions: Topic Distribution. Topic(M), Topic(V\M), and Δ Topic [values shown as figures]. The fraudsters talk about different things than expected of a random pair of people. [Diagram: V\M and M]

30 Method. $|M| = 10$, $|M'| = 5$, $|V \setminus M| = 179$. For each vertex $v$ in V\M we calculate each analytic. Rank vertices according to each analytic or fusion. Evaluate quality of ranked lists. [Diagram labels: M, M, V\M]

31 One experiment M j.lovarato@enron e.haedicke@enron k.lay@enron r.shapiro@enron a.zipper@enron b.rogers@enron V\M

33 Evaluation. Ranked lists evaluated by standard Information Retrieval measures. What is our inference task? Need to find all of them: Mean Average Precision (MAP). Need to find one more of them: Mean Reciprocal Rank (MRR). Can only examine k vertices: precision at k (p@k), k ∈ {5, 10}.
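The three measures can be sketched for a single ranked list as below; MAP and MRR are then the means of the first two over many experiments. These are the standard textbook definitions, and the function names and toy ranking are assumptions made for illustration.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at each position where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0 if none appears."""
    relevant = set(relevant)
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k items that are relevant."""
    relevant = set(relevant)
    return sum(1 for item in ranked[:k] if item in relevant) / k

# Toy ranked list (hypothetical): "a" and "b" are the held-out fraudsters.
ranked = ["a", "x", "b", "y", "z"]
held_out = {"a", "b"}
ap = average_precision(ranked, held_out)   # (1/1 + 2/3) / 2
rr = reciprocal_rank(ranked, held_out)     # first hit at position 1
p5 = precision_at_k(ranked, held_out, 5)   # 2 of the top 5
```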

35 Context Analytic. The fraudsters talk to each other more than expected of a random pair of people. Number of known fraudsters in 1-hop neighborhood of candidate vertex. [Diagram: M; j.lovarato@enron, e.haedicke@enron, k.lay@enron]
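The context analytic is a neighbor count, which could be computed along these lines. This is a sketch assuming an undirected edge list; `context_scores` and the vertex names are hypothetical.

```python
from collections import defaultdict

def context_scores(edges, known, candidates):
    """For each candidate, count known fraudsters among its 1-hop neighbors."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    known = set(known)
    return {c: len(neighbors[c] & known) for c in candidates}

# Toy graph (hypothetical): k1 and k2 are known fraudsters, c1..c3 are candidates.
edges = [("k1", "c1"), ("k2", "c1"), ("k1", "c2"), ("c2", "c3")]
scores = context_scores(edges, known={"k1", "k2"}, candidates=["c1", "c2", "c3"])
```

Candidates are then ranked by descending count.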

36 Content Analytics (HLTs). The fraudsters talk about different things than expected of a random pair of people. How similar is the content of each candidate vertex to the known fraudsters?

37 Training HLTs M j.lovarato@enron e.haedicke@enron k.lay@enron r.shapiro@enron a.zipper@enron b.rogers@enron V\M

39 Training HLTs M e.haedicke@enron j.lovarato@enron Interesting k.lay@enron r.shapiro@enron a.zipper@enron b.rogers@enron V\M

41 Training HLTs M e.haedicke@enron j.lovarato@enron Interesting k.lay@enron r.shapiro@enron Uninteresting a.zipper@enron b.rogers@enron V\M

42 HLT 1: Average Word Count Histogram. What proportion of the document $d_i$ is made up of word $w_j$? Each $d_i$ is represented as a probability vector $x_i$, with $|x_i| = |W|$, where $W$ is the set of word types in the corpus. Vector $I$ is the average of all interesting $x_i$. [Diagram label: Interesting]

43 HLT 1: Average Word Count Histogram. Score each $d_i$ by $1 - JS(x_i, I)$. 0.83 (High); 0.23 (Low). [Diagram label: Interesting]
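The word count histogram score can be sketched as follows, reading JS as the Jensen-Shannon divergence. The base-2 logarithm is an assumption made here so that the divergence lies in [0, 1] and the score 1 - JS does too; the function names and toy vocabulary are hypothetical.

```python
import math
from collections import Counter

def histogram(tokens, vocab):
    """Proportion of the document made up of each word type in vocab."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2), between 0 and 1."""
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

def average_vector(vectors):
    """Component-wise mean: the vector I built from the interesting documents."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def score(x, interesting):
    """1 - JS: high when the document resembles the interesting average."""
    return 1.0 - js_divergence(x, interesting)

vocab = ["gas", "power", "football"]  # toy vocabulary (hypothetical)
I = average_vector([histogram(["gas", "gas", "power"], vocab),
                    histogram(["gas", "power", "power"], vocab)])
similar = score(histogram(["gas", "power"], vocab), I)
different = score(histogram(["football", "football"], vocab), I)
```

A document with the same word proportions as $I$ scores 1, and one sharing no words with it scores 0.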

44 HLT 1: Average Word Count Histogram. Score all $d_i$ for each vertex (k.lay@enron). Average scores. [Diagram: Interesting; M; j.lovarato@enron, e.haedicke@enron, k.lay@enron]

48 HLT 1: Average Word Count Histogram. Score all $d_i$ for each vertex (k.lay@enron). Average scores: 0.72. [Diagram: Interesting; M; j.lovarato@enron, e.haedicke@enron, k.lay@enron]

49 HLT 2: Closest Word Count Histogram. Score all $d_i$ for each vertex (k.lay@enron). Score only by closest document: 0.72. [Diagram: Interesting; M; j.lovarato@enron, e.haedicke@enron, k.lay@enron]
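HLT 1 and HLT 2 differ only in how per-document scores are combined into one score per vertex. The sketch below assumes the per-document similarity scores already exist and reads "closest document" as the document with the highest similarity score; `vertex_scores` and the numbers are hypothetical.

```python
from statistics import mean

def vertex_scores(doc_scores_by_vertex, aggregate):
    """Collapse each vertex's per-document scores into a single score."""
    return {v: aggregate(s) for v, s in doc_scores_by_vertex.items() if s}

# Hypothetical per-document similarity scores for two candidate vertices.
doc_scores = {"v1": [0.9, 0.3, 0.3], "v2": [0.6, 0.6, 0.6]}

averaged = vertex_scores(doc_scores, mean)  # HLT 1: average over all documents
closest = vertex_scores(doc_scores, max)    # HLT 2: best single document only
```

In this toy case the two analytics rank the vertices in opposite orders: averaging favors `v2`, while the closest-document score favors `v1`, which has one strongly matching message.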

50 HLT 3: Compression Language Modeling. How well does a given message compress? Repeated sequences compress well. Novel sequences do not compress well.

51 HLT 3: Compression Language Modeling. Discriminative Model: which model fits better? [Diagram labels: Interesting, Red, Uninteresting]
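One way to illustrate the compression idea is to measure how many extra bytes a message costs when appended to each training corpus. The talk does not name its compressor; `zlib` is swapped in here purely for illustration, and the function names and toy corpora are hypothetical.

```python
import zlib

def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 text."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def extra_bytes(training_text, message):
    """Additional compressed bytes needed for the message given the training text."""
    return compressed_size(training_text + " " + message) - compressed_size(training_text)

def compression_score(message, interesting_text, uninteresting_text):
    """Positive when the message compresses better against the interesting corpus."""
    return extra_bytes(uninteresting_text, message) - extra_bytes(interesting_text, message)

# Toy corpora (hypothetical), repeated so the compressor has sequences to reuse.
interesting = "natural gas pipeline capacity billions of cubic feet " * 20
uninteresting = "football touchdown raiders game season tickets " * 20
message = "pipeline capacity in billions of cubic feet"
fit = compression_score(message, interesting, uninteresting)
```

A message that repeats sequences from one corpus adds few bytes to it, so the sign of `fit` indicates which of the two models fits better.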

52 HLT 3: Compression Language Modeling M e.haedicke@enron j.lovarato@enron Interesting k.lay@enron r.shapiro@enron Uninteresting a.zipper@enron b.rogers@enron V\M

53 HLT 4: Topic Modeling. Latent Dirichlet Allocation (à la Blei, Wallach, McCallum, Mimno, ...) [we use mallet]. Each document is a mixture of topics. Each topic is a probability distribution over $W$. mallet is available from http://mallet.cs.umass.edu/

54 HLT 4 : Topic Modeling j.lovarato@enron e.haedicke@enron k.lay@enron r.shapiro@enron a.zipper@enron b.rogers@enron

56 HLT 4 : Topic Modeling M j.lovarato@enron e.haedicke@enron k.lay@enron r.shapiro@enron a.zipper@enron b.rogers@enron V\M

59 HLT 4: Topic Modeling. Score all $d_i$ for each vertex (k.lay@enron). Sum the weight given to topics of interest. [Diagram: M; j.lovarato@enron, e.haedicke@enron, k.lay@enron]

62 HLT 4: Topic Modeling. Score all $d_i$ for each vertex (k.lay@enron). Sum the weight given to topics of interest: 0.12. [Diagram: M; j.lovarato@enron, e.haedicke@enron, k.lay@enron]
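Given per-document topic mixtures from a topic model, the score on this slide is a sum over selected topics. The sketch below assumes the mixtures and the set of topics of interest are already available; averaging the document scores into a vertex score is an assumption, since the slide does not say how documents are combined, and all names and numbers are hypothetical.

```python
def topic_score(topic_weights, topics_of_interest):
    """Sum of the document's mixture weight on the topics of interest."""
    return sum(topic_weights[t] for t in topics_of_interest)

def vertex_topic_score(docs, topics_of_interest):
    """Average document score for one vertex (the averaging is an assumption)."""
    scores = [topic_score(d, topics_of_interest) for d in docs]
    return sum(scores) / len(scores)

# Hypothetical topic mixtures for two documents over three topics.
docs = [[0.1, 0.7, 0.2], [0.5, 0.1, 0.4]]
per_doc = [topic_score(d, {1}) for d in docs]   # weight on topic 1 in each document
per_vertex = vertex_topic_score(docs, {1})
```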

63 Individual Analytic Performance. [Chart: MAP for each analytic: Context, Average WCH, Minimum WCH, Compression, Topic Modeling]

64 Individual Analytic Performance. [Same chart, with an asterisk (*) annotation]

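66 Linear Fusion, 2 Analytics. $F = \gamma \, \mathrm{HLT}_1 + (1 - \gamma) \, \mathrm{Context}$

67 Linear fusion, 2 Analytics. [Plot: Performance (MAP) as a function of γ, between Context and Content]

68 Linear fusion, 2 Analytics. Grid search over Content and Context. [Plot: Performance (MAP) as a function of γ, between Context and Content]

69 Linear Fusion. 2 Analytics: $F = \gamma \, \mathrm{HLT}_1 + (1 - \gamma) \, \mathrm{Context}$. Arbitrary number of Analytics: $F = \sum_i \gamma_i \, \mathrm{HLT}_i + \left(1 - \sum_i \gamma_i\right) \mathrm{Context}$. NB: Scores must be calibrated.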
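For two analytics, the linear fusion and a grid search over γ can be sketched as below. Min-max rescaling stands in for calibration, which the talk requires but does not specify; the evaluation function, names, and toy scores are likewise assumptions.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a simple stand-in for calibration."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {v: 0.0 for v in scores}
    return {v: (s - lo) / (hi - lo) for v, s in scores.items()}

def linear_fusion(content, context, gamma):
    """F = gamma * content + (1 - gamma) * context, per vertex."""
    return {v: gamma * content[v] + (1 - gamma) * context[v] for v in content}

def grid_search(content, context, evaluate, steps=10):
    """Try gamma = 0, 1/steps, ..., 1 and keep the first best-scoring value."""
    content, context = min_max(content), min_max(context)
    best_gamma, best_quality = None, float("-inf")
    for i in range(steps + 1):
        gamma = i / steps
        fused = linear_fusion(content, context, gamma)
        ranked = sorted(fused, key=lambda v: (-fused[v], v))  # ties broken by name
        quality = evaluate(ranked)
        if quality > best_quality:
            best_gamma, best_quality = gamma, quality
    return best_gamma, best_quality

# Toy scores (hypothetical): "b" and "c" are the held-out fraudsters.
content = {"a": 0.9, "b": 0.2, "c": 0.6, "d": 0.1}
context = {"a": 0, "b": 3, "c": 2, "d": 0}
fraudsters = {"b", "c"}
precision_at_2 = lambda ranked: len(set(ranked[:2]) & fraudsters) / 2
best_gamma, best_quality = grid_search(content, context, precision_at_2)
```

With more analytics the same search needs one nested loop per additional weight, which is where the cost grows.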

70 Gridsearch Linear Combinations. [Plots: MAP and MRR]

71 Rank Fusion. Fuse ranks instead of scores. Rank vertices by each analytic. Each vertex is represented by a vector of ranks. Fusion score is a function of that vector. Min(): one measure can damn you. Max(): one measure can save you. Median(): something in between.
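A minimal sketch of rank fusion follows. To match the slide's wording, it assumes ranks are oriented so that a larger rank means more fraudster-like; under that convention min() lets one low rank sink a vertex and max() lets one high rank lift it. The slide does not state its rank orientation, and the names and toy scores are hypothetical.

```python
from statistics import median

def ranks(scores):
    """Rank 1 = lowest score, rank n = highest score (ties broken by name)."""
    ordered = sorted(scores, key=lambda v: (scores[v], v))
    return {v: i for i, v in enumerate(ordered, start=1)}

def rank_fusion(analytics, combine):
    """Fuse each vertex's vector of ranks with `combine`; most suspicious first."""
    per_analytic = [ranks(s) for s in analytics]
    vertices = per_analytic[0].keys()
    fused = {v: combine([r[v] for r in per_analytic]) for v in vertices}
    return sorted(fused, key=lambda v: (-fused[v], v))

# Three hypothetical analytics scoring the same three vertices.
analytics = [
    {"a": 0.9, "b": 0.5, "c": 0.1},
    {"a": 0.8, "b": 0.6, "c": 0.2},
    {"a": 0.0, "b": 0.5, "c": 0.9},
]
by_min = rank_fusion(analytics, min)        # "a" is sunk by its one low rank
by_max = rank_fusion(analytics, max)        # "a" and "c" are lifted by one high rank
by_median = rank_fusion(analytics, median)  # something in between
```

Each analytic is sorted once, so the cost grows linearly with the number of analytics.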

72 Rank Fusion. [Chart: MAP for min, median, and max rank fusion, with 3 Analytics and with 5 Analytics]

75 Overall Comparisons. [Chart: MAP for Individual, Rank Fusions, and Gridsearch Linear Combinations]

77 [Chart: MAP for Individual, Rank Fusions, and Gridsearch Linear Combinations] $O(n)$ in number of Analytics.

78 [Chart: MAP for Individual, Rank Fusions, and Gridsearch Linear Combinations] $O(x^n)$ in number of Analytics!!
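The exponential growth can be made concrete by counting grid settings. This sketch reads x as the number of grid values tried per weight, which the slide does not define; `grid_settings` is a hypothetical helper that ignores any constraint on the sum of the weights.

```python
from itertools import product

def grid_settings(values_per_weight, n_weights):
    """Number of weight combinations in a full grid: values_per_weight ** n_weights."""
    return sum(1 for _ in product(range(values_per_weight), repeat=n_weights))

# With 10 grid values per weight, each extra analytic multiplies the work by 10.
counts = {n: grid_settings(10, n) for n in range(1, 5)}
```

Each setting requires fusing, ranking, and evaluating every experiment.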

79 Nevermind O(.) for a moment. [Chart: MAP for Individual, Rank Fusions, and Gridsearch Linear Combinations] ~11 minutes; ~13 days.

81 Linear fusion, 3 Analytics

82 Which HLTs should I use? Average WCH + Minimum WCH + Topic Modeling. [Plot labels: 0.1, Other HLTs, Compression, Context]

83 Which HLTs should I use? Need to find them all. Need to find only one more. Can only look at 5 or 10. [Plot labels: 0.1, Other HLTs, Compression, Context]

85 Questions. How should we fuse? For a small number of analytics: grid search linear. For a large number of analytics: rank fuse (or think harder). What kind of HLTs should we use? Depends on your inference task, of course. We can actually recommend what to use! Insight into relative strengths of HLTs exposed.

86 Future and Related Work. This is post-hoc fusion of scores or ranks; what about more native fusion? Different data and inference tasks: What movie should I watch? (Netflix) Who should I cite? (ACL) Who should I work with? (Github) Vote prediction (Congressional Bills)

87 Collaborators: Carey Priebe [JHU HLTCOE], Allen Gorin [JHU HLTCOE], Richard Cox [JHU HLTCOE], David Marchette [Navy], Youngser Park [JHU CIS], Minh Tang [JHU AMS], Michael Decerbo [Raytheon BBN], Hanna Wallach [UMass Amherst], Jim Mayfield [JHU APL/HLTCOE], Paul McNamee [JHU APL/HLTCOE], William Szewczyk [DoD]

89 Thank you. Coppersmith & Priebe (submitted): [arxiv.org/ ] Latest papers (often) available from glencoppersmith.com

91 Content of Importance Sample California Power.. Portland...San Diego..... WBI California Power.. Portland.. Portland..... bcf.. WBI Energy Services billions of cubic feet California Power.. San Diego..... WBI...bcf....

92 Content of Importance Sample Pro Football.. Raiders... touchdown.....implode 9/11.. 9/11.. New York.... tragedy

93 Context of Importance Sample

94 Context of Importance Sample. Observed $s_1 = 0.45$

95 Context of Importance Sample. Observed $p_1 = 0.13$

96 Context of Importance Sample. Observed $p_1 = 0.12$

97 Context of Importance Sample. Observed $p_1 = 0.13$; observed $s_1 = 0.45$; observed $p_1 = 0.12$

98 Content of Importance Sample Portland...San Diego..... WBI k.lay@enron.. Portland.. Portland..... bcf.. WBI Energy Services billions of cubic feet e.haedicke@enron San Diego..... WBI...bcf.... j.lovarato@enron

99 Comparing HLT Methods. [Scatter plot: points marked X, N, and +; axis labels MRR and S 1]

100 Importance Sampled Joint. [Plot: Δ Topic and Δρ]

101 Injection Importance. [Plot labels: P, P 1, S, S 1]

Vertex Nomina-on. Glen A. Coppersmith. Human Language Technology Center of Excellence Johns Hopkins University. fusion of content and context

Vertex Nomina-on. Glen A. Coppersmith. Human Language Technology Center of Excellence Johns Hopkins University. fusion of content and context Vertex Nomina-on fusion of content and context Glen A. Coppersmith Human Language Technology Center of Excellence Johns Hopkins University Presented at MIT GraphEx: April 16, 2013 Vertex Nomina-on fusion

More information

Fusion of Content and Context in Human Language Technology

Fusion of Content and Context in Human Language Technology Fusion of Content and Context in Human Language Technology Allen Gorin Human Language Technology Research Na4onal Security Agency Fort Meade, Maryland Graph Exploitation Workshop, August 2011 Report Documentation

More information

Anomaly Detection using Adaptive Fusion of Graph Features on a Time Series of Graphs

Anomaly Detection using Adaptive Fusion of Graph Features on a Time Series of Graphs Anomaly Detection using Adaptive Fusion of Graph Features on a Time Series of Graphs Youngser Park Carey E. Priebe Abdou Youssef Abstract It is known that fusion of information from graph features, compared

More information

Geometric verifica-on of matching

Geometric verifica-on of matching Geometric verifica-on of matching The correspondence problem The correspondence problem tries to figure out which parts of an image correspond to which parts of another image, a;er the camera has moved,

More information

3D shapes types and properties

3D shapes types and properties 3D shapes types and properties 1 How do 3D shapes differ from 2D shapes? Imagine you re giving an explana on to a younger child. What would you say and/or draw? Remember the surfaces of a 3D shape are

More information

The Open University s repository of research publications and other research outputs. Search Personalization with Embeddings

The Open University s repository of research publications and other research outputs. Search Personalization with Embeddings Open Research Online The Open University s repository of research publications and other research outputs Search Personalization with Embeddings Conference Item How to cite: Vu, Thanh; Nguyen, Dat Quoc;

More information

CSE 788 Mathema-cal and Algorithmic Founda-ons for Data Visualiza-on. Winter 2011 Han- Wei Shen

CSE 788 Mathema-cal and Algorithmic Founda-ons for Data Visualiza-on. Winter 2011 Han- Wei Shen CSE 788 Mathema-cal and Algorithmic Founda-ons for Data Visualiza-on Winter 2011 Han- Wei Shen Class Objec-ves Give you an overview of data visualiza-on research Focus more on scien-fic data Overview the

More information

Computing Similarity between Cultural Heritage Items using Multimodal Features

Computing Similarity between Cultural Heritage Items using Multimodal Features Computing Similarity between Cultural Heritage Items using Multimodal Features Nikolaos Aletras and Mark Stevenson Department of Computer Science, University of Sheffield Could the combination of textual

More information

Wide-area Dissemina-on under Strict Timeliness, Reliability, and Cost Constraints

Wide-area Dissemina-on under Strict Timeliness, Reliability, and Cost Constraints Wide-area Dissemina-on under Strict Timeliness, Reliability, and Cost Constraints Amy Babay, Emily Wagner, Yasamin Nazari, Michael Dinitz, and Yair Amir Distributed Systems and Networks Lab www.dsn.jhu.edu

More information

Unsupervised Rank Aggregation with Distance-Based Models

Unsupervised Rank Aggregation with Distance-Based Models Unsupervised Rank Aggregation with Distance-Based Models Alexandre Klementiev, Dan Roth, and Kevin Small University of Illinois at Urbana-Champaign Motivation Consider a panel of judges Each (independently)

More information

A Formal Approach to Score Normalization for Meta-search

A Formal Approach to Score Normalization for Meta-search A Formal Approach to Score Normalization for Meta-search R. Manmatha and H. Sever Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts Amherst, MA 01003

More information

Web Services in Ac-on. Mark Schroeder 2E Track

Web Services in Ac-on. Mark Schroeder 2E Track Web Services in Ac-on Mark Schroeder 2E Track FOR INFORMATION PURPOSES ONLY Terms of this presenta3on This presenta-on was based on current informa-on and resource alloca-ons as of April 2013 and is subject

More information

Inference and Representation

Inference and Representation Inference and Representation Rachel Hodos New York University Lecture 5, October 6, 2015 Rachel Hodos Lecture 5: Inference and Representation Today: Learning with hidden variables Outline: Unsupervised

More information

So#ware Specifica-ons. David Duncan March 23, 2012

So#ware Specifica-ons. David Duncan March 23, 2012 So#ware Specifica-ons David Duncan March 23, 2012 Execu-ve Summary Crea-ng a so#ware specifica-on is tradi-onally done via the So#ware Requirements Specifica-on (SRS) but there has been a growing movement

More information

Key-Value Stores: RiakKV

Key-Value Stores: RiakKV B4M36DS2, BE4M36DS2: Database Systems 2 h p://www.ksi.m.cuni.cz/~svoboda/courses/181-b4m36ds2/ Lecture 7 Key-Value Stores: RiakKV Mar n Svoboda mar n.svoboda@fel.cvut.cz 12. 11. 2018 Charles University,

More information

Privacy Breaches in Privacy-Preserving Data Mining

Privacy Breaches in Privacy-Preserving Data Mining 1 Privacy Breaches in Privacy-Preserving Data Mining Johannes Gehrke Department of Computer Science Cornell University Joint work with Sasha Evfimievski (Cornell), Ramakrishnan Srikant (IBM), and Rakesh

More information

Homework 1. Automa-c Test Genera-on. Automated Test Genera-on Idea: The Problem. Pre- & Post- Condi-ons 2/8/17. Due Thursday (Feb 16, 9 AM)

Homework 1. Automa-c Test Genera-on. Automated Test Genera-on Idea: The Problem. Pre- & Post- Condi-ons 2/8/17. Due Thursday (Feb 16, 9 AM) Homework 1 Due Thursday (Feb 16, 9 AM) Automa-c Test Genera-on Ques-ons? Automated Test Genera-on Idea: Automa-cally generate tests for sokware Why? Find bugs more quickly Conserve resources No need to

More information

NDBI040: Big Data Management and NoSQL Databases

NDBI040: Big Data Management and NoSQL Databases NDBI040: Big Data Management and NoSQL Databases h p://www.ksi.mff.cuni.cz/~svoboda/courses/171-ndbi040/ Prac cal Class 8 MongoDB Mar n Svoboda svoboda@ksi.mff.cuni.cz 5. 12. 2017 Charles University in

More information

OPTIONAL EXERCISE 1: CREATING A FUSION PROJECT PART A

OPTIONAL EXERCISE 1: CREATING A FUSION PROJECT PART A Exercise Objec ves In the previous exercises, you were provided a full Fusion LIDAR dataset. In this exercise, you will begin with raw LIDAR data and create a new Fusion project one that will be as complete

More information

Key-Value Stores: RiakKV

Key-Value Stores: RiakKV B4M36DS2: Database Systems 2 h p://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-b4m36ds2/ Lecture 4 Key-Value Stores: RiakKV Mar n Svoboda svoboda@ksi.mff.cuni.cz 24. 10. 2016 Charles University in Prague,

More information

FTTH-GPON OLT Emulator with integrated Network Analyser

FTTH-GPON OLT Emulator with integrated Network Analyser By WYZARTEL FTTH-GPON OLT Emulator with integrated Network Analyser Features OLT Emula on Emulates OLT func onality, allowing to build specific provi sioning models and configure OMCI enes individually

More information

Harnessing Publicly Available Factual Data in the Analytical Process

Harnessing Publicly Available Factual Data in the Analytical Process June 14, 2012 Harnessing Publicly Available Factual Data in the Analytical Process by Benson Margulies, CTO We put the World in the World Wide Web ABOUT BASIS TECHNOLOGY Basis Technology provides so ware

More information

Key-Value Stores: RiakKV

Key-Value Stores: RiakKV NDBI040: Big Data Management and NoSQL Databases h p://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Lecture 4 Key-Value Stores: RiakKV Mar n Svoboda svoboda@ksi.mff.cuni.cz 25. 10. 2016 Charles

More information

Exploiting Conversation Structure in Unsupervised Topic Segmentation for s

Exploiting Conversation Structure in Unsupervised Topic Segmentation for  s Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng University of British Columbia Vancouver, Canada EMNLP 2010 1

More information

Alexandre Barreto ITA/GMU. Paulo C.G. Costa GMU. Edgar Yano ITA

Alexandre Barreto ITA/GMU. Paulo C.G. Costa GMU. Edgar Yano ITA Alexandre Barreto ITA/GMU Paulo C.G. Costa GMU Edgar Yano ITA 1 failure in electric power system The New World Increasing automa on of processes and systems that are part of cri cal infrastructures. Society

More information

Tutorial on gene-c ancestry es-ma-on: How to use LASER. Chaolong Wang Sequence Analysis Workshop June University of Michigan

Tutorial on gene-c ancestry es-ma-on: How to use LASER. Chaolong Wang Sequence Analysis Workshop June University of Michigan Tutorial on gene-c ancestry es-ma-on: How to use LASER Chaolong Wang Sequence Analysis Workshop June 2014 @ University of Michigan LASER: Loca-ng Ancestry from SEquence Reads Main func:ons of the so

More information

CS47300: Web Information Search and Management

CS47300: Web Information Search and Management CS47300: Web Information Search and Management Prof. Chris Clifton 27 August 2018 Material adapted from course created by Dr. Luo Si, now leading Alibaba research group 1 AD-hoc IR: Basic Process Information

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Comparing Chord, CAN, and Pastry Overlay Networks for Resistance to DoS Attacks

Comparing Chord, CAN, and Pastry Overlay Networks for Resistance to DoS Attacks Comparing Chord, CAN, and Pastry Overlay Networks for Resistance to DoS Attacks Hakem Beitollahi Hakem.Beitollahi@esat.kuleuven.be Geert Deconinck Geert.Deconinck@esat.kuleuven.be Katholieke Universiteit

More information

Interactive Text Mining with Iterative Denoising

Interactive Text Mining with Iterative Denoising Interactive Text Mining with Iterative Denoising, PhD kegiles@vcu.edu www.people.vcu.edu/~kegiles Assistant Professor Department of Statistics and Operations Research Virginia Commonwealth University Interactive

More information

Modern Retrieval Evaluations. Hongning Wang

Modern Retrieval Evaluations. Hongning Wang Modern Retrieval Evaluations Hongning Wang CS@UVa What we have known about IR evaluations Three key elements for IR evaluation A document collection A test suite of information needs A set of relevance

More information

Random Graphs Based on Self-Exciting Messaging Activities

Random Graphs Based on Self-Exciting Messaging Activities Random Graphs Based on Self-Exciting Messaging Activities Nam H. Lee Tim S.T. Leung Carey E. Priebe October 31, 2011 Abstract This paper studies a problem of identifying an inhomogeneous interaction structure

More information

1. Assess the CATIA tools, workbenches, func ons, and opera ons used to create and edit 3D part models, products, and drawings.

1. Assess the CATIA tools, workbenches, func ons, and opera ons used to create and edit 3D part models, products, and drawings. 1 of 5 5/1/2014 2:38 PM MFGT 142 CATIA II Approval Date: 02/20/2014 Effec ve Term: Fall 2014 Department: MANUFACTURING TECHNOLOGY Division: Career Technical Educa on Units: 3.00 Grading Op on: Le er Grade

More information

B4M36DS2, BE4M36DS2: Database Systems 2

B4M36DS2, BE4M36DS2: Database Systems 2 B4M36DS2, BE4M36DS2: Database Systems 2 h p://www.ksi.mff.cuni.cz/~svoboda/courses/171-b4m36ds2/ Lecture 2 Data Formats Mar n Svoboda mar n.svoboda@fel.cvut.cz 9. 10. 2017 Charles University in Prague,

More information

Delip Rao. Aug Aug (IIT) Madras MS, Computer Science. CGPA 9.5/10 Thesis: Learning from Heterogenous Knowledge Sources

Delip Rao. Aug Aug (IIT) Madras MS, Computer Science. CGPA 9.5/10 Thesis: Learning from Heterogenous Knowledge Sources Contact Information 2 Over Ridge Court Apt 3731 Baltimore, MD 21210 Delip Rao (415) 975-1810 deliprao@gmail.com @deliprao http://www.cs.jhu.edu/~delip Education Johns Hopkins University Sept. 2006 present

More information

Patterns and Computer Vision

Patterns and Computer Vision Patterns and Computer Vision Michael Anderson, Bryan Catanzaro, Jike Chong, Katya Gonina, Kurt Keutzer, Tim Mattson, Mark Murphy, David Sheffield, Bor-Yiing Su, Narayanan Sundaram and the rest of the PALLAS

More information

K-Means and Gaussian Mixture Models

K-Means and Gaussian Mixture Models K-Means and Gaussian Mixture Models David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 43 K-Means Clustering Example: Old Faithful Geyser

More information

NDBI040: Big Data Management and NoSQL Databases. h p:// svoboda/courses/ ndbi040/

NDBI040: Big Data Management and NoSQL Databases. h p://  svoboda/courses/ ndbi040/ NDBI040: Big Data Management and NoSQL Databases h p://www.ksi.mff.cuni.cz/ svoboda/courses/2016-1-ndbi040/ Prac cal Class 2 Riak Key-Value Store Mar n Svoboda svoboda@ksi.mff.cuni.cz 25. 10. 2016 Charles

More information

h p://

h p:// B4M36DS2, BE4M36DS2: Database Systems 2 h p://www.ksi.mff.cuni.cz/~svoboda/courses/171-b4m36ds2/ Prac cal Class 7 Cassandra Mar n Svoboda mar n.svoboda@fel.cvut.cz 27. 11. 2017 Charles University in Prague,

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

Wireless Sensor Networks CS742

Wireless Sensor Networks CS742 Wireless Sensor Networks CS742 Outline Overview Environment Monitoring Medical application Data-dissemination schemes Media access control schemes Distributed algorithms for collaborative processing Architecture

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #7: En-ty/Rela-onal Model---Part 3
