CS 111: Program Design I Lecture 21: Network Analysis. Robert H. Sloan & Richard Warner University of Illinois at Chicago April 10, 2018



NETWORK ANALYSIS

Which displays a graph in the sense of graph/network analysis? A. Left, B. Right, C. Both

Graphs
A graph (or network) is a collection of nodes (aka vertices) and links (aka edges).
For the formally minded, a link is a pair of nodes. A CS 151 (Discrete Math) textbook might give a formal definition like "A graph is a finite set of nodes N together with a set L of pairs of nodes."
Default case: links are bidirectional (undirected). That is what we assume if we just see "graph" or "network".
Can also have directed links (e.g., a road network with one-way streets): "directed graph".
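To make the undirected/directed distinction concrete, here is a minimal sketch using networkx's Graph and DiGraph classes. The node names "A" and "B" are made up for illustration and are not from the lecture.

```python
import networkx as nx

# Undirected graph: a link between A and B can be walked both ways.
undirected = nx.Graph()
undirected.add_edge("A", "B")

# Directed graph: the link goes from A to B only (like a one-way street).
directed = nx.DiGraph()
directed.add_edge("A", "B")

undirected_back = undirected.has_edge("B", "A")
directed_back = directed.has_edge("B", "A")

print(undirected_back)  # the reverse direction exists in the undirected graph
print(directed_back)    # but not in the directed one
```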

Example
Network with 6 nodes and 7 links (undirected).
Links are: (6, 4), (4, 5), (1, 5), (1, 2), (2, 3), (2, 5), (3, 4)
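The example network can be written down directly as a list of pairs. The sketch below uses plain Python (no networkx yet) to recover the nodes from the links and to record each node's neighbors; the variable names are illustrative choices.

```python
# The seven undirected links from the example.
links = [(6, 4), (4, 5), (1, 5), (1, 2), (2, 3), (2, 5), (3, 4)]

# Every node here appears in at least one link, so the node set
# can be recovered from the links alone.
nodes = sorted({node for link in links for node in link})

# For an undirected link (a, b), b is a neighbor of a and a is a neighbor of b.
neighbors = {node: set() for node in nodes}
for a, b in links:
    neighbors[a].add(b)
    neighbors[b].add(a)

print(nodes)
print(len(links))
print(sorted(neighbors[5]))
```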

Examples
Social networks: people as nodes; friend = (undirected) link
Web pages (directed): page = node; link = link
    Web crawler: crawls around this network
Computer networks: nodes = computers; links = 2 computers that can communicate directly
Chicago El: nodes = stops; links between adjacent stops
Collection of phone calls: nodes = phone numbers; links = phone calls

Networkx
To work with graphs in Python, especially for network analysis: import networkx (as nx)
Learn more at: https://networkx.github.io/documentation/networkx-1.11/tutorial/index.html
I believe that Spyder will have given you version 1.11, which is almost identical to version 1.10. networkx is in the process of migrating to version 2.
networkx provides Graph as the basic data type, plus ways to add nodes and edges and do all sorts of things, including visualization.

Simple graph example
    import networkx
    g = networkx.Graph()  # Create an empty graph object
    # Add several nodes
    g.add_node("Alice")
    g.add_node("Bob")
    g.add_node("Charlie")
    g.number_of_nodes() → 3
    g.number_of_edges() → 0

Simple graph example continued
    # Add a single edge
    g.add_edge("Alice", "Bob")  # undirected
    >>> g.number_of_edges()
    1
    >>> g.nodes()
    ['Alice', 'Charlie', 'Bob']
    >>> g.edges()
    [('Alice', 'Bob')]

Drawing
networkx can do simple drawing (working with matplotlib.pyplot under the hood):
    >>> import matplotlib.pyplot as plt
    >>> networkx.draw_networkx(g)
    >>> plt.show()
This will produce a drawing of our graph (with label names!). (The tutorial says draw, but that is now deprecated.)

Drawing without node labels
    import networkx as nx
    nx.draw(g, with_labels=False)

Adding a bit to the graph
    # Add some more edges and nodes
    g.add_node("David")
    g.add_edges_from([("Alice", "Charlie"), ("Alice", "David")])
add_edges_from is a method whose argument should be a list of pairs of node names, each pair in ()s.
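Putting the slides so far together, the whole four-node graph can be built in one short script. This is a sketch that should run under both networkx 1.x and later versions; sorting the node list is an addition here so that the printed order does not depend on the version.

```python
import networkx as nx

g = nx.Graph()  # empty undirected graph

# Nodes can be added one at a time...
for name in ["Alice", "Bob", "Charlie", "David"]:
    g.add_node(name)

# ...and edges either singly or from a list of pairs.
g.add_edge("Alice", "Bob")
g.add_edges_from([("Alice", "Charlie"), ("Alice", "David")])

node_names = sorted(g.nodes())
edge_count = g.number_of_edges()

print(node_names)
print(edge_count)
```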

networkx.draw(g) (graph as updated)

Key graph statistic 1: Degree
Degree of a node = number of neighbors the node has.
Over the last 20-30 years, the range of degrees has been found to vary with the nature of the graph.
networkx has a graph method degree that gives us a dictionary with all the degree information for the graph.

degrees from networkx
    In [5]: d = networkx.degree(g)
    In [6]: d
    Out[6]: {'Alice': 3, 'Bob': 1, 'Charlie': 1, 'David': 1}
Will explain that dictionary data structure later. For now, it's something easy to give to our old friend pandas:

networkx degree data → pandas
    In [7]: degree_data = pandas.Series(networkx.degree(g))
    In [8]: degree_data
    Out[8]:
    Alice      3
    Bob        1
    Charlie    1
    David      1
And we know from earlier in the semester how to plot graphs from a pandas Series: a pandas Series has a method .plot()
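The degree-to-pandas step can be sketched as a self-contained script. One assumption beyond the slides: the degree result is wrapped in dict(), because networkx 2 and later return a view object rather than a plain dictionary, and dict() works with both.

```python
import networkx as nx
import pandas as pd

# Rebuild the four-node graph from the earlier slides.
g = nx.Graph()
g.add_edges_from([("Alice", "Bob"), ("Alice", "Charlie"), ("Alice", "David")])

# dict() is harmless on networkx 1.x (already a dict) and converts the
# degree view returned by newer versions into a dict.
degree_data = pd.Series(dict(nx.degree(g)))

print(degree_data)
```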

Plotting degree data
We want a histogram: a bar plot where the things on the x-axis have a specific meaningful order (e.g., numbers), as opposed to being categorical (e.g., names of justices).
    degree_data.plot(kind='hist')
(Unimportant side note: there is an abbreviation for that. Can write .hist() instead of .plot(kind='hist').)

To make plot of series look nice
pandas put in stuff automatically for some of the plots we did before from DataFrames, but it doesn't always. If need be:
    plt.xlabel('string I want to see below x axis')
    plt.ylabel('similarly for y')
    plt.title('string I want up top in title position')

Plot with some appropriate labels
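A labeled histogram along these lines might look like the sketch below. The label strings are illustrative choices, and the matplotlib.use("Agg") line is an assumption added so the script also runs without a display; in Spyder it can be dropped and plt.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering; not needed in Spyder
import matplotlib.pyplot as plt
import pandas as pd

# Degree data for the four-node graph, typed in directly.
degree_data = pd.Series({"Alice": 3, "Bob": 1, "Charlie": 1, "David": 1})

# Histogram of degrees, then labels added by hand.
degree_data.plot(kind="hist")
plt.xlabel("Degree")
plt.ylabel("Number of nodes")
plt.title("Degrees in the four-node graph")

axes = plt.gca()  # the axes the histogram was drawn on
```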

Remember what our graph looks like
[Figure: Alice in the center, linked to Bob, Charlie, and David]

Centrality
Alice is connected to everybody else; Bob, Charlie, and David are connected only to one node each (Alice).
Alice is obviously the most central.
Various centrality measures tell which nodes are most central. (Prof. Philip Yu of UIC CS was found to be the "most central" computer science author by one such measure.)

What is the maximum degree of a node in a graph with n nodes? A. n B. n - 1 C. 2 D. (n - 1) / 2 E. 42

One simple measure of centrality of a node
degree of node / maximum possible degree of any node in that node's graph
For an n-node graph: degree of node / (n - 1)
networkx will give us (a dictionary of) the centrality of every node in graph g:
    cent = networkx.degree_centrality(g)

For our little graph
    In [11]: cent = networkx.degree_centrality(g)
    In [12]: cent
    Out[12]: {'Alice': 1.0, 'Bob': 0.3333333333333333, 'Charlie': 0.3333333333333333, 'David': 0.3333333333333333}
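The numbers above can be checked by hand with the degree / (n - 1) formula. This is a plain-Python sketch with the degrees typed in directly, not what networkx does internally.

```python
# Degrees of the four-node graph.
degrees = {"Alice": 3, "Bob": 1, "Charlie": 1, "David": 1}

n = len(degrees)  # number of nodes in the graph

# Degree centrality: a node's degree divided by the largest degree
# any node could have, which is n - 1.
centrality = {node: degree / (n - 1) for node, degree in degrees.items()}

print(centrality)
```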

Path lengths
How many edges do we need to walk over to get from one node to another?
0 to get from a node to itself
1 to get to an immediate neighbor
> 1 to get to all other nodes
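One standard way to count these edges is breadth-first search: visit the start node, then its neighbors, then their neighbors, recording the distance at first visit. The sketch below is plain Python; the function name and the neighbors dictionary are illustrative and are not necessarily how networkx does it.

```python
from collections import deque

# Neighbors of each node in the four-node graph.
neighbors = {
    "Alice": ["Bob", "Charlie", "David"],
    "Bob": ["Alice"],
    "Charlie": ["Alice"],
    "David": ["Alice"],
}


def path_lengths_from(start, neighbors):
    """Return {node: number of edges on a shortest path from start}."""
    distance = {start: 0}  # 0 edges to get from a node to itself
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in distance:  # first visit gives the shortest length
                distance[other] = distance[node] + 1
                queue.append(other)
    return distance


from_bob = path_lengths_from("Bob", neighbors)
print(from_bob)
```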

Getting all path lengths
networkx has an operator for this; it gives a somewhat complex data structure back.
pandas to the rescue: it knows how to handle that data structure and turn it into a DataFrame, which we already know about:
    pandas.DataFrame(networkx.all_pairs_shortest_path_length(g))
(In networkx 2 and later, the call returns an iterator, so wrap it in dict() before handing it to pandas.)

All path lengths in our graph
    >>> p = pandas.DataFrame(networkx.all_pairs_shortest_path_length(g))
    >>> p
             Alice  Bob  Charlie  David
    Alice        0    1        1      1
    Bob          1    0        2      2
    Charlie      1    2        0      2
    David        1    2        2      0
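A self-contained version of the table above, as a sketch. Two things are added here as assumptions beyond the slide: dict() around the networkx call (needed on networkx 2 and later, harmless on 1.x) and sorting of rows and columns so the layout does not depend on library versions.

```python
import networkx as nx
import pandas as pd

# Rebuild the four-node graph.
g = nx.Graph()
g.add_edges_from([("Alice", "Bob"), ("Alice", "Charlie"), ("Alice", "David")])

# Map each node to a dict of {other node: shortest path length}.
lengths = dict(nx.all_pairs_shortest_path_length(g))

# Outer keys become columns, inner keys become rows.
p = pd.DataFrame(lengths).sort_index().sort_index(axis=1)

print(p)
```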

Another stat: Average shortest path length
networkx will calculate the average over all the shortest path lengths for you:
    # Get the average path length
    print(networkx.average_shortest_path_length(g))
    1.5
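The 1.5 can be reproduced by averaging over the six pairs of distinct nodes: three pairs involving Alice at distance 1 and three pairs among Bob, Charlie, and David at distance 2, so (3 + 6) / 6 = 1.5. A sketch of that check next to the networkx call:

```python
from itertools import combinations

import networkx as nx

# Rebuild the four-node graph.
g = nx.Graph()
g.add_edges_from([("Alice", "Bob"), ("Alice", "Charlie"), ("Alice", "David")])

# Every unordered pair of distinct nodes.
pairs = list(combinations(g.nodes(), 2))

# Sum the shortest path length over all pairs, then average.
total = sum(nx.shortest_path_length(g, a, b) for a, b in pairs)
manual_average = total / len(pairs)

library_average = nx.average_shortest_path_length(g)

print(manual_average, library_average)
```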

Our 4 node graph is kinda dull
Point is to apply these sorts of techniques to, e.g., graphs of various types of social networks with thousands to 1 billion+ nodes.
Our example data (real data):
    nodes = Twitter users
    edge = follows relationship (could be directed; could ignore direction)
    ~40,000 pairs of follower, followee
(This particular bit of Twitter was formed by a technique called snowball sampling, starting at Computational Legal Analytics.)

Large networks
Stored as text files: one line for each link, with the line containing the names (string or number) of the two nodes.
Notice that if we know all the links then we know what the nodes are.
Both comma and space are common delimiters between the two nodes of an edge in large network work. Both are, broadly speaking, CSV.
We'll use pandas to read these in.
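Reading such a file might look like the sketch below. Everything specific in it is hypothetical: the three sample lines, the column names "follower" and "followee", and the use of io.StringIO to stand in for a real file on disk. For a space-delimited file, pass sep=" " to read_csv.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical edge list: one link per line, two node names per line.
edge_text = "alice,bob\nalice,charlie\ncharlie,david\n"

# header=None because the file has no header row; the names are our own.
edges = pd.read_csv(io.StringIO(edge_text), header=None,
                    names=["follower", "followee"])

# Adding the edges also adds the nodes: the links determine the nodes.
g = nx.Graph()  # ignoring direction; nx.DiGraph() would keep it
g.add_edges_from(zip(edges["follower"], edges["followee"]))

print(g.number_of_nodes(), g.number_of_edges())
```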