Approximate Integration of Streaming data

1 Approximate Integration of Streaming Data. Michel de Rougemont, Guillaume Vimont. University Paris II & IRIF

2 Plan
1. Approximation for data warehouses: Boolean queries; analytic queries.
2. Streaming data warehouses: reservoir sampling; communities in graphs via uniform sampling; a good approximation for some random graphs.
3. Data integration for streams: compress streams into «good representations» with high probability; define the integration from these compressed forms; the «value» of the data increases with the integration.

3 1. OLAP Queries for a Data Warehouse. OLAP queries (analytic queries) are specified by a filter, dimensions, a measure, and an aggregation. Example: dimension: Channel; measure: Sentiment Analysis; aggregation: Sum. Data warehouse: tweets with sentiment analysis (measure in [0, 9]).

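4 Approximation of OLAP queries. Exact result: PBS 200 (2/5), CNN 300 (3/5). Approximate result: PBS 225, CNN 275. The result of a query is a distribution $\mu$; the approximate result is a distribution $\hat{\mu}$. Distance between the two distributions: $\|\mu - \hat{\mu}\|_1 = 0.1$, i.e. 10%.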
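
The 10% figure on this slide can be checked by hand. The sketch below is only an illustration: the helper names `normalize` and `l1_distance` are hypothetical, and it assumes that 225/275 is the approximate answer being compared against the exact 200/300.

```python
def normalize(counts):
    """Turn a dict of non-negative totals into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def l1_distance(mu, mu_hat):
    """Sum of absolute differences over the union of the two supports."""
    keys = set(mu) | set(mu_hat)
    return sum(abs(mu.get(k, 0.0) - mu_hat.get(k, 0.0)) for k in keys)


exact = normalize({"PBS": 200, "CNN": 300})    # 2/5 and 3/5
approx = normalize({"PBS": 225, "CNN": 275})   # 0.45 and 0.55
distance = l1_distance(exact, approx)          # |0.40-0.45| + |0.60-0.55|
```

Each channel is off by 0.05, so the two terms add up to roughly 0.1, which matches the 10% stated on the slide.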

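5 Uniform samples, weighted samples. Take $N$ uniform samples: $\Pr_\Omega[\,\|\mu - \hat{\mu}\|_1 < \varepsilon\,] > 1 - \delta$ for $N = O(\log(1/\delta) \cdot c/\varepsilon^2)$. Example: $\Pr_\Omega[\,\|\mu - \hat{\mu}\|_1 < 0.1\,] > 0.9$.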
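
To get a feel for the bound, the following sketch evaluates $c \cdot \log(1/\delta)/\varepsilon^2$ and draws that many uniform samples. Several things here are assumptions and not taken from the slides: the constant `c` defaults to 1, the logarithm is natural, the function names are hypothetical, and the estimate counts tuples per channel where the slides sum a measure.

```python
import math
import random
from collections import Counter


def sample_size(eps, delta, c=1.0):
    """Smallest integer N with N >= c * ln(1/delta) / eps^2 (c is a placeholder)."""
    return math.ceil(c * math.log(1.0 / delta) / eps ** 2)


def estimate_distribution(values, n, rng):
    """Draw n uniform samples (with replacement) and return their frequencies."""
    draws = rng.choices(values, k=n)
    counts = Counter(draws)
    return {key: count / n for key, count in counts.items()}


n = sample_size(eps=0.1, delta=0.1)            # 231 when c = 1
rng = random.Random(0)                         # fixed seed for repeatability
population = ["PBS"] * 200 + ["CNN"] * 300     # toy data, exact answer 2/5 and 3/5
mu_hat = estimate_distribution(population, n, rng)
```

The point of the bound is that `n` depends only on $\varepsilon$ and $\delta$, not on the size of the population.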

6 Approximation of Queries. Assume a large graph $G$ and a small random subgraph $\hat{G}$. Approximate $Q$ on $G$ by $\hat{Q}$ on $\hat{G}$:
1. Approximate an OLAP query $Q$ (a distribution): $\Pr_\Omega[\,\|Q_G - \hat{Q}_{\hat{G}}\|_1 < \varepsilon\,] > 1 - \delta$.
2. Approximate a graph property $Q()$: $\Pr_\Omega[\,Q()_G = \hat{Q}()_{\hat{G}}\,] > 1 - \delta$.
3. Approximate a graph search property $Q(x)$: $\Pr_\Omega[\,Q(x)_G \cap \hat{Q}(x)_{\hat{G}} \neq \emptyset\,] > 1 - \delta$.

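7 Refinement of Approximation. Approximate $Q$ on $G$ by $\hat{Q}$ on $\hat{G}$. Strict approximation of a graph property $Q()$: $\Pr_\Omega[\,Q()_G = \hat{Q}()_{\hat{G}}\,] > 1 - \delta$. Property tester, an $(\varepsilon, \delta)$-approximation: if $Q$ is true on $G$, then $\hat{Q}$ is true on $\hat{G}$; if $Q$ is $\varepsilon$-far from $G$, then $\Pr_\Omega[\,\hat{Q}()_{\hat{G}} \text{ is false}\,] > 1 - \delta$. The size of $\hat{G}$ depends only on $(\varepsilon, \delta)$. Note: $Q$ is $\varepsilon$-far from $G$ if $\mathrm{dist}(Q, G) > \varepsilon$.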

8 2. Streaming data. Assume a stream of tuples of $G$: $t_1, t_2, \dots, t_n$. We cannot store all the tuples, so we keep a subset $\hat{G}$ of size $N$. Tool: reservoir sampling, used to approximate an OLAP query $Q$. Reservoir sampling keeps $k$ elements of the stream, either with a uniform distribution or with a weighted distribution. Theorem 1: if $N > O(\log(1/\delta) \cdot c/\varepsilon^2)$, then $\Pr_\Omega[\,\|Q_G - \hat{Q}_{\hat{G}}\|_1 < \varepsilon\,] > 1 - \delta$.
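
The slides do not spell out the reservoir update rule. A minimal sketch of the standard uniform variant (often called Algorithm R) is given below for illustration; the function name and the `rng` parameter are assumptions, and the weighted variant mentioned on the slide is not covered.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill phase: the first k items are always kept.
            reservoir.append(item)
        else:
            # The item at position index (0-based) is kept with probability
            # k / (index + 1); randint has inclusive bounds.
            j = rng.randint(0, index)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=5, rng=random.Random(42))
```

Only the `k` slots of the reservoir are held in memory, whatever the length of the stream, which is what makes the approach usable when the tuples cannot all be stored.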

9 Streaming graph edges. Assume a stream of edges of $G$: $e_1, e_2, \dots, e_n$. We cannot store all the edges, so we keep a subgraph $\hat{G}$ and the «most recent» subgraph $\hat{G}_t$. Tools: a reservoir for $\hat{G}$ and a window reservoir for $\hat{G}_t$. Store: all the nodes (MySQL) and only the edges in the reservoirs. Complexity: avoid storing $O(n^2)$ edges.

10 Examples of Queries. Assume a stream of Twitter edges of $G$: $e_1, e_2, \dots, e_n$. We cannot store all the edges, so we keep a subgraph $\hat{G}$.
Boolean queries: $Q()$: is there one community (dense subgraph)? $Q()$: are there $k$ disjoint communities?
Search queries: $Q(x)$: $x$ is in a community (a largest dense subgraph $(V, E)$ such that $|E| > \alpha \cdot |V|^2$).
Analytic queries: $Q$: the distribution of the sizes of the communities.

11 Streaming graph edges. Assume a stream of edges of $G$: $e_1, e_2, \dots, e_n$. We cannot store all the edges, so we keep a subgraph $\hat{G}$ and the «most recent» subgraph $\hat{G}_t$. Search for a community $Q(x)$. Algorithm: keep the large connected components of $\hat{G}$; these give $\hat{Q}(x)_{\hat{G}}$. Goal: $\Pr_\Omega[\,Q(x)_G \cap \hat{Q}(x)_{\hat{G}} \neq \emptyset\,] > 1 - \delta$.
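
One possible way to extract the large connected components from the edges held in a reservoir is a union-find pass, sketched below. The slides do not say how the components are computed or what counts as «large», so the `min_size` threshold and both function names are assumptions made for this example.

```python
from collections import defaultdict


def large_components(edges, min_size):
    """Return the connected components (as node sets) with at least min_size nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v        # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [nodes for nodes in groups.values() if len(nodes) >= min_size]


def community_of(x, components):
    """Answer the search query: the stored component containing x, or None."""
    for nodes in components:
        if x in nodes:
            return nodes
    return None


reservoir_edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
components = large_components(reservoir_edges, min_size=3)
```

With these toy edges only the triangle on nodes 1, 2, 3 survives the size filter, so `community_of(2, components)` returns that set and `community_of(4, components)` returns `None`.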

12 Uniform Sampling: Nodes vs. Edges. (Figure panels: Nodes; Edges.) Edges witness the concentration. Equivalently, nodes could be chosen according to their degree distribution.

13 Random graphs. Erdős-Rényi: $G(n, p)$. Preferential attachment: $PA(n, m)$. Degree distribution, power law: $\Pr_\Omega[\,\mathrm{degree}(x) = i\,] = c/i^2$. Example: [15, 6, 4, 3, 3, 2] as a histogram. Concentration property: O( m/2) nodes of high degree. A set $S$ is concentrated if, for every $v$ in $S$, the majority of the edges of $v$ lie in $S$; such an $S$ is a community. We can build a graph with several communities of different sizes.
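
The definition of a concentrated set can be tested directly on a small edge list. The sketch below is one reading of that definition, with two assumptions: «majority» is taken as a strict majority of a node's incident edges, and a node of $S$ with no edges makes the test fail. The function name is hypothetical.

```python
from collections import defaultdict


def is_concentrated(edges, s):
    """True if every node of s has a strict majority of its edges inside s."""
    s = set(s)
    inside = defaultdict(int)   # edges of a node whose other endpoint is in s
    total = defaultdict(int)    # all edges of a node
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            total[a] += 1
            if b in s:
                inside[a] += 1
    return all(2 * inside[v] > total[v] for v in s)


# A triangle on 1, 2, 3 with one extra edge leaving node 3.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
triangle_ok = is_concentrated(edges, {1, 2, 3})   # node 3 has 2 of 3 edges inside
pair_ok = is_concentrated(edges, {3, 4})          # node 3 has 1 of 3 edges inside
```

Here the triangle passes the test while the pair {3, 4} does not, because most of node 3's edges leave that pair.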

14 Graph with 2 communities of the same size

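15 Uniform Sampling on the edges. (Figure: stream $G$, reservoirs $\hat{G}$ and $\hat{G}_t$, period $\lambda$, components $C_{1,i}$ and $C_{2,i}$.) Algorithm: keep the large connected components of $\hat{G}_t$ at times $\lambda, 2\lambda, \dots$ Theorem 2: if a graph $G$ follows a power law and has «a concentration property», then $\Pr_\Omega[\,Q(x)_G \cap \hat{Q}(x)_{\hat{G}} \neq \emptyset\,] > 1 - \delta$.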

16 3. Integration of Streaming Data. Several streams: $S_1, S_2, \dots$ For each stream $S_j$, we compress the stream to $(V, C_j)$, where $C_j = \bigcup_i C_{i,j}$ is the union of the large connected components. All the nodes are stored in a MySQL database.

17 2 streams

18 Integration of Streaming Data. How can we correlate two streams of edges? Store the large connected components of the reservoir, at discrete times, for each stream. Let $V_{C_1}$, $V_{C_2}$ be the nodes of the communities. Nodes correlation at time $t$: $|V_1 \cap V_2| / \max(|V_1|, |V_2|)$. Communities correlation at time $t$: $|V_{C_1} \cap V_{C_2}| / \max(|V_{C_1}|, |V_{C_2}|)$. (Plot over time $t$: Series 1, Series 2.)
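
The two correlation measures share the same shape, so a single helper covers both. This sketch assumes the transcription lost an intersection sign and the cardinality bars, i.e. that the measure is the size of the common node set divided by the size of the larger set; returning 0.0 for two empty sets is also an assumption, as is the function name.

```python
def correlation(nodes_a, nodes_b):
    """|A ∩ B| / max(|A|, |B|), defined as 0.0 when both sets are empty."""
    a, b = set(nodes_a), set(nodes_b)
    largest = max(len(a), len(b))
    if largest == 0:
        return 0.0
    return len(a & b) / largest


# Toy node sets of two streams at one time step (hypothetical data).
stream_1_nodes = {"u1", "u2", "u3", "u4"}
stream_2_nodes = {"u3", "u4", "u5"}
score = correlation(stream_1_nodes, stream_2_nodes)   # 2 common nodes / 4 = 0.5
```

Passing the community node sets in place of the full node sets gives the communities correlation; evaluating the function at each discrete time yields one curve per measure.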

19 Conclusion
1. Approximation of queries: Boolean, search, and analytic queries.
2. Streaming data: streaming tuples; streaming edges (dynamic community detection).
3. Data integration for streams: compress streams into «good representations» with high probability; define the integration from these compressed forms; the «value» of the data increases with the integration.


More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

SAT-CNF Is N P-complete

SAT-CNF Is N P-complete SAT-CNF Is N P-complete Rod Howell Kansas State University November 9, 2000 The purpose of this paper is to give a detailed presentation of an N P- completeness proof using the definition of N P given

More information

Pentagons vs. triangles

Pentagons vs. triangles Discrete Mathematics 308 (2008) 4332 4336 www.elsevier.com/locate/disc Pentagons vs. triangles Béla Bollobás a,b, Ervin Győri c,1 a Trinity College, Cambridge CB2 1TQ, UK b Department of Mathematical Sciences,

More information

Approximation slides 1. An optimal polynomial algorithm for the Vertex Cover and matching in Bipartite graphs

Approximation slides 1. An optimal polynomial algorithm for the Vertex Cover and matching in Bipartite graphs Approximation slides 1 An optimal polynomial algorithm for the Vertex Cover and matching in Bipartite graphs Approximation slides 2 Linear independence A collection of row vectors {v T i } are independent

More information

Partitioning Complete Multipartite Graphs by Monochromatic Trees

Partitioning Complete Multipartite Graphs by Monochromatic Trees Partitioning Complete Multipartite Graphs by Monochromatic Trees Atsushi Kaneko, M.Kano 1 and Kazuhiro Suzuki 1 1 Department of Computer and Information Sciences Ibaraki University, Hitachi 316-8511 Japan

More information

On the Approximability of Modularity Clustering

On the Approximability of Modularity Clustering On the Approximability of Modularity Clustering Newman s Community Finding Approach for Social Nets Bhaskar DasGupta Department of Computer Science University of Illinois at Chicago Chicago, IL 60607,

More information

Spatio-temporal Range Searching Over Compressed Kinetic Sensor Data. Sorelle A. Friedler Google Joint work with David M. Mount

Spatio-temporal Range Searching Over Compressed Kinetic Sensor Data. Sorelle A. Friedler Google Joint work with David M. Mount Spatio-temporal Range Searching Over Compressed Kinetic Sensor Data Sorelle A. Friedler Google Joint work with David M. Mount Motivation Kinetic data: data generated by moving objects Sensors collect data

More information

Logic and Discrete Mathematics. Section 2.5 Equivalence relations and partitions

Logic and Discrete Mathematics. Section 2.5 Equivalence relations and partitions Logic and Discrete Mathematics Section 2.5 Equivalence relations and partitions Slides version: January 2015 Equivalence relations Let X be a set and R X X a binary relation on X. We call R an equivalence

More information

Bitmap Index Partition Techniques for Continuous and High Cardinality Discrete Attributes

Bitmap Index Partition Techniques for Continuous and High Cardinality Discrete Attributes Bitmap Index Partition Techniques for Continuous and High Cardinality Discrete Attributes Songrit Maneewongvatana Department of Computer Engineering King s Mongkut s University of Technology, Thonburi,

More information

Robert Cowen and Stephen H. Hechler. Received June 4, 2003; revised June 18, 2003

Robert Cowen and Stephen H. Hechler. Received June 4, 2003; revised June 18, 2003 Scientiae Mathematicae Japonicae Online, Vol. 9, (2003), 9 15 9 G-FREE COLORABILITY AND THE BOOLEAN PRIME IDEAL THEOREM Robert Cowen and Stephen H. Hechler Received June 4, 2003; revised June 18, 2003

More information

Who to Select: Identifying Critical Sources in Social Sensing

Who to Select: Identifying Critical Sources in Social Sensing Who to Select: Identifying Critical Sources in Social Sensing Dong Wang, Nathan Vance, Chao Huang Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556 Abstract Social

More information

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and

Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 6 Outline. Unary Relational Operations: SELECT and Chapter 6 The Relational Algebra and Relational Calculus Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 6 Outline Unary Relational Operations: SELECT and PROJECT Relational

More information

Detection Theory for Graphs

Detection Theory for Graphs Detection Theory for Graphs Benjamin A. Miller, Nadya T. Bliss, Patrick J. Wolfe, and Michelle S. Beard Graphs are fast emerging as a common data structure used in many scientific and engineering fields.

More information

CMSC 380. Graph Terminology and Representation

CMSC 380. Graph Terminology and Representation CMSC 380 Graph Terminology and Representation GRAPH BASICS 2 Basic Graph Definitions n A graph G = (V,E) consists of a finite set of vertices, V, and a finite set of edges, E. n Each edge is a pair (v,w)

More information