Simulation of Spatially Embedded Networks

Carter T. Butts (1, 2), Ryan Acton (1), Zack Almquist (1)
(1) Department of Sociology; (2) Institute for Mathematical Behavioral Sciences
University of California, Irvine
Prepared for the August 25, 2009 UCI MURI AHM. This research was supported by ONR award N00014-08-1-1015.

Spatially Embedded Networks
- Simple idea: assign vertices to spatial locations
- Spatial embedding of G = (V, E)
  - Location function ℓ: V → S, where S is an abstract space
- Properties of S
  - Admits some distance, d
  - May or may not be continuous
  - May or may not be metric
  - May contain social dimensions ("Blau space") as well as physical ones
- For present purposes, take ℓ as given and fixed
  - Useful, but can be relaxed
(Data from Freeman et al., 1988)

An Inhomogeneous Bernoulli Family for Spatially Embedded Networks
- A simple family of models for spatially embedded social networks:

  $$\Pr(Y = y \mid d) = \prod_{\{i,j\}} B\left(Y_{ij} = y_{ij} \mid \mathcal{F}(d_{ij})\right)$$

  where $Y \in \{0,1\}^{N \times N}$, $d \in [0,\infty)^{N \times N}$, $\mathcal{F}: [0,\infty) \to [0,1]$, and $B$ is the Bernoulli pmf
- Special case of the inhomogeneous Bernoulli graph family with parameter matrix $\Phi_{ij} = \mathcal{F}(d_{ij})$
- Assumes that dependence among edges is absorbed by the distance structure, so edges are conditionally independent given $d$
- Related to the gravity models, i.e. $\mathbf{E}Y_{ij} = P(i)\,P(j)\,F(d_{ij})$, where $P$ is an interaction potential and $F$ is an impedance or spatial interaction function (SIF)
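As a rough illustration of the model above, the following sketch draws one undirected graph with independent edges whose probabilities depend only on distance. The function names and the power-law form of the SIF (with parameters `p_b`, `alpha`, `gamma`) are assumptions made for this example, not the fitted functions used in the talk; only standard-library Python is used, and all pairs are visited, so it is suitable for small N only.

```python
import math
import random

def power_law_sif(d, p_b=0.5, alpha=0.03, gamma=2.8):
    """Hypothetical SIF: F(d) = p_b / (1 + alpha * d) ** gamma, with values in [0, 1]."""
    return p_b / (1.0 + alpha * d) ** gamma

def draw_spatial_bernoulli_graph(coords, sif, rng):
    """Return a set of edges (i, j), i < j, each drawn independently with Pr = sif(d_ij)."""
    n = len(coords)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(coords[i], coords[j])  # Euclidean distance in the plane
            if rng.random() < sif(d_ij):
                edges.add((i, j))
    return edges

rng = random.Random(0)
# Illustrative vertex locations: 200 points placed uniformly in a 500 m square.
coords = [(rng.uniform(0, 500), rng.uniform(0, 500)) for _ in range(200)]
edges = draw_spatial_bernoulli_graph(coords, power_law_sif, rng)
print(len(edges), "edges among", len(coords), "vertices")
```

Because every pair is examined, the cost is O(N^2); this is the scaling problem that the region-based shortcuts mentioned later in the talk are meant to address.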

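Generalization to Curved Exponential Random Graph Models
- Increasingly widely used approach: ERG form
- Our likelihood can be rewritten as a curved ERG:

  $$\Pr(Y = y \mid \theta, d) \propto \exp\left(\eta(\theta, d)^{T} y\right), \qquad \eta_{ij}(\theta, d) = \operatorname{logit} \mathcal{F}(\theta, d_{ij})$$

  - Sufficient statistics are the edge indicators of $Y$; canonical parameters ($\eta$) are logits of marginal edge probabilities
  - $O(N^2)$ canonical parameters: computational savvy advised
- General curved model: space + other effects

  $$\Pr(Y = y \mid \theta_{\mathcal{F}}, \psi, d) \propto \exp\left(\psi^{T} t(y) + \sum_{\{i,j\}} \eta_{ij}(\theta_{\mathcal{F}}, d)\, y_{ij}\right)$$

  - Here $\theta_{\mathcal{F}}$ denotes the SIF parameters and $\psi$ the coefficients of the additional statistics $t$
  - Allows for integration of complex edge dependence, degree distribution constraints, other covariate effects, etc. (through $t$)
  - Can be used to control for social mechanisms when seeking spatial effects, or for spatial effects when seeking social mechanisms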
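A short worked derivation may help show why the Bernoulli product can be written in exponential-family form. The symbols $p_{ij}$ and $\eta_{ij}$ are notation assumed here for the marginal edge probability $\mathcal{F}(d_{ij})$ and its logit:

```latex
\begin{align*}
\Pr(Y = y \mid d)
  &= \prod_{\{i,j\}} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}} \\
  &= \Bigl[\prod_{\{i,j\}} (1 - p_{ij})\Bigr]
     \exp\Bigl(\sum_{\{i,j\}} y_{ij} \log \frac{p_{ij}}{1 - p_{ij}}\Bigr) \\
  &\propto \exp\Bigl(\sum_{\{i,j\}} \eta_{ij}\, y_{ij}\Bigr),
  \qquad \eta_{ij} = \operatorname{logit} p_{ij}.
\end{align*}
```

The bracketed product does not depend on $y$, so it is absorbed into the normalizing constant; what remains is linear in the edge indicators, with one canonical parameter per pair.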

Using Spatial Models for Detailed Network Simulation
- Start with GIS data on populations in space
  - Here, block-level information on metropolitan/micropolitan areas, as defined by the US census
- Draw individual positions from a point process, given GIS constraints
  - Attempt to approximate the distribution of individual residences within blocks
- Draw network from the spatial Bernoulli graph model, given individual positions
  - Requires a fitted SIF (obtained from prior data, or first principles)
- In practice, need to do clever things to make this scale
  - Subdivide space into regions; avoid simulating all pairs for regions with very low probability of positive tie volume
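One possible way to realize the "subdivide space into regions" idea is sketched below. It is an assumption-laden illustration, not the gridding system used by the authors: points are binned into square cells, and a pair of cells is kept only if an upper bound on its expected number of ties (cell counts times the SIF at the smallest possible distance between the cells) reaches a threshold. The bound is valid only for a non-increasing SIF, and the names `grid_cells`, `candidate_cell_pairs`, and `min_expected` are hypothetical.

```python
import math
from collections import defaultdict

def grid_cells(coords, cell_size):
    """Bin point indices into square cells keyed by integer (column, row)."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    return cells

def min_cell_distance(c1, c2, cell_size):
    """Smallest possible distance between a point in cell c1 and a point in cell c2."""
    dx = max(abs(c1[0] - c2[0]) - 1, 0) * cell_size
    dy = max(abs(c1[1] - c2[1]) - 1, 0) * cell_size
    return math.hypot(dx, dy)

def candidate_cell_pairs(cells, cell_size, sif, min_expected=1e-3):
    """Keep cell pairs whose upper bound on expected tie volume is at least min_expected."""
    keys = sorted(cells)
    pairs = []
    for pos, a in enumerate(keys):
        for b in keys[pos:]:
            bound = len(cells[a]) * len(cells[b]) * sif(min_cell_distance(a, b, cell_size))
            if bound >= min_expected:
                pairs.append((a, b))
    return pairs

# Two small clusters roughly 7 km apart; the SIF below is illustrative only.
coords = [(1.0, 1.0), (2.0, 2.0), (5000.0, 5000.0), (5001.0, 5001.0)]
cells = grid_cells(coords, cell_size=100.0)
pairs = candidate_cell_pairs(cells, 100.0, lambda d: 0.5 / (1.0 + d) ** 6.4)
print(pairs)
```

Only the retained cell pairs would then be passed to the pairwise edge draw. Dropping the remaining pairs makes the simulation approximate, with the threshold controlling how much tie volume is ignored.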

Placing People Within Blocks
- Challenge: placing individuals within census blocks
  - Want a simple, reasonably fast model which captures basic properties of residential settlement in a plausible way
  - Constraints: fixed total population, household size distribution
  - Needs to work with arbitrary regions, without requiring additional data
- Two simple approaches used here for planar coordinates
  - Uniform placement: household coordinates drawn as a Poisson process with constant intensity within each block; individual coordinates within a household drawn from a circular distribution centered on the household location, with maximum radius 5 m
  - Quasi-random placement: as per uniform placement, but household coordinates drawn from a two-dimensional Halton sequence (bases 2 and 3)
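The two placement schemes can be sketched as follows, under simplifying assumptions that are not from the talk: the block is an axis-aligned rectangle rather than an arbitrary polygon, the number of households is given, and household members are scattered uniformly over a disc of radius 5 m (the slides specify only a "circular distribution" with that maximum radius). With a fixed count, a constant-intensity Poisson process reduces to independent uniform draws, which is what the uniform branch does. All function names are hypothetical.

```python
import math
import random

def halton(index, base):
    """Element `index` (1-based) of the van der Corput sequence in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def place_households(n, bounds, rng=None):
    """Place n households in the rectangle bounds = (xmin, ymin, xmax, ymax).

    With rng=None, use the 2-D Halton sequence (bases 2 and 3);
    otherwise draw uniform coordinates from rng.
    """
    xmin, ymin, xmax, ymax = bounds
    points = []
    for k in range(1, n + 1):
        if rng is None:
            u, v = halton(k, 2), halton(k, 3)
        else:
            u, v = rng.random(), rng.random()
        points.append((xmin + u * (xmax - xmin), ymin + v * (ymax - ymin)))
    return points

def place_members(household_xy, size, rng, max_radius=5.0):
    """Scatter `size` individuals uniformly over a disc around the household (assumed form)."""
    members = []
    for _ in range(size):
        r = max_radius * math.sqrt(rng.random())  # sqrt gives uniform density over the disc
        angle = rng.uniform(0.0, 2.0 * math.pi)
        members.append((household_xy[0] + r * math.cos(angle),
                        household_xy[1] + r * math.sin(angle)))
    return members

rng = random.Random(1)
quasi = place_households(5, (0.0, 0.0, 100.0, 50.0))         # quasi-random placement
uniform = place_households(5, (0.0, 0.0, 100.0, 50.0), rng)  # uniform placement
family = place_members(quasi[0], 4, rng)
```

For real census blocks, candidate points falling outside the block polygon would have to be rejected, which is where point-in-polygon tests enter.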

Vertex Placement, Illustrated (sequence of seven figures)

Example: Lawrence, KS (two figures)

Artificial Elevation
- Issue: actual high-density blocks feature artificial elevation as an essential feature of the built environment
  - Effect: inhibits local interaction, relative to what would be observed if everyone resided in the x,y plane
- Crude solution: artificial elevation model
  - Assign households to planar coordinates in random order
  - When placing the ith household, note the number of other households within planar radius r (call this k)
  - Set the elevation of the ith household to αk meters
- Result: single-story housing predominates, but multistory configurations emerge in dense areas
- For our purposes, arbitrarily set r = 10 m, α = 4 m
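The elevation rule can be sketched in a few lines. One interpretation is assumed here: k counts only the households that have already been placed within planar radius r, which is what makes the random placement order matter. The function name `assign_elevations` is hypothetical, and the neighbor count is a brute-force scan.

```python
import math
import random

def assign_elevations(households, r=10.0, alpha=4.0, rng=None):
    """Return an elevation (meters) per household, in the input order.

    households: list of planar (x, y) coordinates.
    Households are visited in random order; each gets elevation alpha * k,
    where k is the number of previously placed households within radius r
    (an assumed reading of the rule in the slides).
    """
    order = list(range(len(households)))
    (rng or random).shuffle(order)
    placed = []
    z = [0.0] * len(households)
    for idx in order:
        x, y = households[idx]
        k = sum(1 for px, py in placed if math.hypot(px - x, py - y) <= r)
        z[idx] = alpha * k
        placed.append((x, y))
    return z

# Three households within 10 m of one another, plus one isolated household.
households = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1000.0, 1000.0)]
z = assign_elevations(households, rng=random.Random(1))
print(z)
```

In this toy case the three clustered households receive elevations 0, 4, and 8 m in some order, while the isolated one stays at ground level.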

Artificial Elevation, Illustrated (sequence of five figures)

Artificial Elevation in Lawrence, KS Models (figure)

Choosing an SIF: Two Test Relations
- Festinger et al. (1950) - Social Friendship
  - Collected in a post-WWII housing project during 1946-47
  - Subjects asked to name the "three people you most see socially"
- Freeman et al. (1988) - Face-to-Face Interaction
  - Collected on a southern California beach
  - 54 subjects observed for 60 hours over a 30-day period
  - Mean distance between actors and minutes interacting reported
- For each data set, the numbers of possible and observed edges at each distance were used to infer the SIF
  - Bayesian inhomogeneous Bernoulli graph model, SIF selected by Bayes factor

(Figure) Declines as approximately d^-2.8

(Figure) Declines as approximately d^-6.4

Geographical Cases
- In progress: examine a range of functional settlements, stratified by total population and spatial area
  - Started with all US micropolitan and metropolitan areas (as defined by the US census)
  - For population, consider approximate size strata of 1,000, 10,000, 100,000, and 1,000,000
  - For area, consider log area strata of 9, 10, 11, and 12
  - For each combination of conditions, sought the settlement with minimum least squares deviation from the desired population/land area
- Network simulation performed on each settlement, using each coordinate generation model and each relation
(Figure) US Settlement Distribution by Log Population, Land Area
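The stratified selection step could look like the sketch below. Several details are assumptions rather than facts from the talk: deviations are measured on a log10 scale for population, the log area is taken as a precomputed field (the slides do not state the base or units), and both terms are weighted equally. The settlement records are made-up placeholders, not real census data.

```python
import math

def closest_settlement(settlements, target_pop, target_log_area):
    """Pick the settlement minimizing squared deviation from the target stratum.

    Deviation is measured in (log10 population, log area) space (assumed scale).
    """
    def squared_deviation(s):
        return ((math.log10(s["pop"]) - math.log10(target_pop)) ** 2
                + (s["log_area"] - target_log_area) ** 2)
    return min(settlements, key=squared_deviation)

# Placeholder records for illustration only.
settlements = [
    {"name": "A", "pop": 1200, "log_area": 9.3},
    {"name": "B", "pop": 9000, "log_area": 10.1},
    {"name": "C", "pop": 110000, "log_area": 10.9},
]

# One selected settlement per (population, log area) stratum combination.
design = {
    (pop, log_area): closest_settlement(settlements, pop, log_area)["name"]
    for pop in (1_000, 10_000, 100_000, 1_000_000)
    for log_area in (9, 10, 11, 12)
}
print(design[(10_000, 10)])
```

With only three placeholder records, the same settlement is chosen for many strata; with the full list of metropolitan and micropolitan areas, each of the 16 combinations would typically map to a different place.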

(Figure) Golden Valley, MT: Festinger Net, Uniform Model

(Figure) Spatial Distribution of Degree: Navajo, AZ

(Figure) Spatial Distribution of Degree: Cookeville, TN

(Figure) Spatial Distribution of Core Number: Navajo, AZ

(Figure) Spatial Distribution of Core Number: Cookeville, TN

(Figure) Spatial Coverage of Robustly Connected Groups

Initial Impressions from the Simulation Studies (So Far)
- Clear impact of spatial clustering on network structure
  - Spatial clustering raises density only slightly, but small changes have large threshold effects
  - Cluster spatial extent and proximity are also important (not just local intensity): cross-cluster ties can sustain connectivity
  - Spatial heterogeneity has real consequences
- Choice of point placement algorithm does make a difference
  - The Halton sequence tends to suppress clustering and degree by maximizing the minimum distance between households; uniform placement generates "clumps" that elevate local tie volume
  - Effect should be largest for short-range ties, light-tailed SIFs, or properties based on local clustering; weakest for long-range interaction

Some Ongoing Questions
- How to further improve performance?
  - Gridding system is often good, but requires tuning (and fails in places like Honolulu)
  - Going to try R-trees; other suggestions?
  - Network objects are getting very big; may need to think of ways to segment and partially load on use (not so easy)
- Closely related problem (much work done here, but not shown!): estimating tie volumes between regions
  - MC quadrature using Halton sequences turned out to be expensive; a fast quasi-random strategy would be nice
  - Point-in-polygon calculations are the current bottleneck
  - Now using R-trees to speed performance; seems to work well, but is there a better way?
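To make the tie-volume problem concrete, the sketch below estimates the expected number of ties between two regions by quasi-Monte Carlo quadrature with a four-dimensional Halton sequence (bases 2, 3, 5, 7). It is a simplified illustration, not the authors' implementation: both regions are assumed to be axis-aligned rectangles with uniformly distributed populations, distances are planar, and the SIF shown is a hypothetical power law.

```python
import math

def halton(index, base):
    """Element `index` (1-based) of the van der Corput sequence in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

def expected_tie_volume(rect_a, n_a, rect_b, n_b, sif, n_points=2000):
    """Estimate n_a * n_b * E[sif(d)] for uniform points in two rectangles.

    Each rectangle is (xmin, ymin, xmax, ymax); n_a and n_b are population counts.
    """
    total = 0.0
    for k in range(1, n_points + 1):
        u = [halton(k, base) for base in (2, 3, 5, 7)]
        xa = rect_a[0] + u[0] * (rect_a[2] - rect_a[0])
        ya = rect_a[1] + u[1] * (rect_a[3] - rect_a[1])
        xb = rect_b[0] + u[2] * (rect_b[2] - rect_b[0])
        yb = rect_b[1] + u[3] * (rect_b[3] - rect_b[1])
        total += sif(math.hypot(xa - xb, ya - yb))
    return n_a * n_b * total / n_points

def power_sif(d):
    """Hypothetical SIF used only for this example."""
    return 0.5 / (1.0 + 0.03 * d) ** 2.8

block = (0.0, 0.0, 100.0, 100.0)
near = expected_tie_volume(block, 50, (100.0, 0.0, 200.0, 100.0), 40, power_sif)
far = expected_tie_volume(block, 50, (1000.0, 0.0, 1100.0, 100.0), 40, power_sif)
print(near, far)
```

The cost grows with the number of quadrature points per region pair and with the number of pairs, which is consistent with the remark that this approach turned out to be expensive. For real block polygons, each sampled point would also need a point-in-polygon test.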

Summary
- We now have tools for simulating spatially embedded networks on fairly large scales
  - Currently works up to around 1e6 nodes
  - Can use real geography; limited to projections for the exact network (for now), but tie volumes are on the WGS ellipsoid
  - Assembling a large collection of test locations, networks
- Interesting opportunities going forward
  - More performance enhancements
  - Using testbed sims to evaluate algorithm performance
  - Applying the modeling strategy to simulation of latent space models, fast approximate prediction (e.g., tie volumes between groups for block models), etc.