Week 7 Picturing Network. Vahe and Bethany


Freeman (2005) - Graphic Techniques for Exploring Social Network Data
The two main goals of analyzing social network data are the identification of cohesive groups and the identification of social positions. Moreno (1932) was the first to use visual images to show the patterning of linkages among actors.

Multidimensional scaling (MDS)
Moreno developed procedures for placing points that reveal structural features of the data. The problem was that his procedures were ad hoc rather than systematic. Current systematic procedures attempt to produce an image that preserves the existing pattern in the data.
1. Multidimensional scaling (MDS)
MDS uses a search procedure to find the optimal locations at which to place the points. Optimal locations are either:
- Those that come closest to reproducing the pattern of the original N-dimensional social proximities in the data matrix, or
- Those that come closest to reproducing the order of the original N-dimensional social proximities in the data matrix (but not necessarily the exact magnitudes)


Singular value decomposition (SVD)
2. Singular value decomposition (SVD)
SVD is based on an algebraic procedure that transforms the N original variables into N new variables ordered from largest to smallest, according to the percent of variance (patterning) in the original data that is associated with each. The produced image will only be useful if a one-, two-, or three-dimensional solution is associated with virtually all of the variance contained in the original data. There are different ways that SVD solutions are produced. One of the better-known approaches is principal components analysis (PCA).
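The share of variance carried by each new variable can be read off the singular values. The following is a minimal sketch, assuming NumPy is available and using a small made-up data matrix (the numbers and variable names are illustrative, not from the readings):

```python
import numpy as np

# Hypothetical case-by-variable data: columns 1 and 2 move together,
# column 3 is small noise.
data = np.array([
    [2.0, 1.9, 0.1],
    [4.0, 4.1, -0.1],
    [6.0, 5.8, 0.0],
    [8.0, 8.2, 0.1],
    [10.0, 9.9, -0.1],
])

# Center each column so that squared singular values measure variance.
centered = data - data.mean(axis=0)

# Singular values are returned in descending order.
singular_values = np.linalg.svd(centered, compute_uv=False)

# Proportion of total variance associated with each new variable.
variance_share = singular_values ** 2 / np.sum(singular_values ** 2)
print(variance_share)
```

If the first one to three entries of `variance_share` add up to nearly 1, a low-dimensional picture loses little of the patterning in the data.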

Singular value decomposition (SVD) Using either MDS or SVD, we can determine whether a data set has interesting structural properties according to the degree to which the produced image departs from a disk-shaped image. This is the first step in the exploratory analysis of the network data.

Graphs as aids for exploring social network data
Example #1: Exploratory research
What variable determines how employees choose a coworker to spend time with outside of work? What would you look at in order to answer this question?


Graphs as aids for exploring social network data
Example #2: A priori hypothesis
Hypothesis: Athletes in individual sports (tennis, golf, archery) confide in teammates more than athletes in team sports (basketball, football, baseball). What would you look for in this image?

Scott (1991) Dimensions & Displays Can you think of any limitations in this display of relational data?

Scott (1991) Dimensions & Displays One common technique has been to construct the sociogram around the circumference of a circle so that the pattern of lines becomes more visible. Any problems with this?

Metric Multidimensional scaling (MDS)
Metric MDS produces a much closer map of the space in which the network is embedded by converting graph theoretical measures and expressing them as metric measures. Metric refers to any model of space and distance in which there are known and determinate relations among its properties. A metric framework allows for the measurement of distances and directions after a configuration of points and lines has been made into a metric map (not graph theory path distance, but something closer to Euclidean distance).
Steps:
- Produce a case-by-case proximity matrix from graph theoretical measures by converting relational data into correlation coefficients. (The conversion is necessary because relational data do not conform to the assumptions of a Euclidean metric.)
- Metric MDS aims to produce a configuration in which the pattern of metric distances corresponds to the pattern of proximities (based on either similarities or dissimilarities).
The construction of a simple physical map gives good insight into the outcome of a metric MDS of relational data.
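The physical-map analogy can be made concrete: given only the distances between a few places, metric MDS should recover a map whose inter-point distances match them. The following is a minimal sketch of classical (Torgerson) scaling, assuming NumPy; the function name `classical_mds` and the town coordinates are hypothetical, and this is one common way to do metric MDS rather than the specific routine used in the readings.

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) scaling of a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    # Eigen-decompose and keep the n_dims largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical "towns" on a flat map, used only to build a distance matrix.
towns = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0], [1.0, 2.0]])
true_dist = np.linalg.norm(towns[:, None, :] - towns[None, :, :], axis=-1)

coords = classical_mds(true_dist, n_dims=2)
map_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(np.allclose(true_dist, map_dist))
```

The recovered coordinates may be rotated or reflected relative to the original map; only the distances between points are preserved, which is why the axes of an MDS plot need interpretation.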

Principal Components Analysis (PCA)
PCA definition?
PCA Steps:
1) Convert a case-by-variable matrix of attribute data into a variable-by-variable matrix of correlations.
2) Search the variable-by-variable matrix for those variables that are highly correlated with one another.
3) Replace this identified set of highly correlated variables with the 1st principal component.
- The principal component is constructed based on the correlation between the identified set of highly correlated variables.
4) Repeat step 2 with the restriction that the next set of highly correlated variables not be correlated with the first identified set.
5) Replace this identified set of highly correlated variables with the 2nd principal component.
6) Repeat steps 2 and 3 until all the principal components have been identified.
- The initial variable-by-variable correlation matrix has now been converted to a variable-by-component correlation matrix in which the cell entries are the component loadings computed for each variable against each component.

Principal Components Analysis (PCA) PCA attempts to identify a set of uncorrelated principal components that collectively account for all of the variance in the data. The 1st principal component stands for the most highly correlated set of variables and accounts for the largest amount of variance in the data. The independence of principal components means that they can be drawn at right angles to each other. Diagrams with more than 3 principal components cannot be drawn. The goal of the researcher is to identify the smallest number of principal components that are meaningful and explain most of the variance in the data. The principal components are used as axes on a scatter diagram and the loadings are used to plot the position of each variable within these axes. The scatter of this diagram can be rotated (if needed) to produce a positioning of the configuration that gives the best alignment with the main axes. Any rotated solution results in a new variable-by-component correlation matrix that contains a revised set of loadings.
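In practice the components and loadings are usually obtained by an eigen-decomposition of the correlation matrix rather than by a literal search for correlated sets. The following is a minimal sketch of that standard route, assuming NumPy and synthetic data built from two hypothetical underlying factors:

```python
import numpy as np

# Synthetic attribute data: variables 1-2 share one factor, 3-4 another.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
data = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=50),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=50),
])

# Step 1: variable-by-variable correlation matrix.
corr = np.corrcoef(data, rowvar=False)

# Components are eigenvectors of the correlation matrix, ordered by
# the amount of variance (eigenvalue) each accounts for.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals = np.clip(eigvals[order], 0.0, None)
eigvecs = eigvecs[:, order]

# Variable-by-component loadings: correlation of each variable
# with each component.
loadings = eigvecs * np.sqrt(eigvals)
explained = eigvals / eigvals.sum()

print(np.round(explained, 3))
print(np.round(loadings[:, :2], 2))
```

Plotting the first two columns of `loadings` against each other gives the scatter diagram of variables described above; here two components should account for nearly all of the variance because the data were built from two factors.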

Non-metric Multidimensional scaling (MDS) Metric MDS has a number of limitations. Many relational data sets are binary in form and cannot be directly used to measure proximity. Researchers may make unwarranted assumptions from binary (nominal) data. The use of ratio or interval measurement may not be appropriate. Example: Two companies with four directors in common may not be twice as close to one another as two companies with two directors in common. Non-metric MDS requires a symmetrical adjacency matrix. The procedure takes only rank order into account: the data are treated as measures at the ordinal level. Non-metric MDS seeks a solution in which the rank ordering of the distances is the same as the rank ordering of the original values.

Non-metric Multidimensional scaling (MDS)
Steps:
1) Sort cell values of the original matrix into descending order.
2) Construct a new matrix by replacing original values with rank scores.
3) Construct a matrix of Euclidean distances that preserves the rank ordering of the original data.
The Euclidean distances matrix is calculated by an iterative procedure that compares the rank order of the distances with that of the proximity rank matrix. This matrix can be used to draw a metric scatter plot similar to those produced in PCA.
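Steps 1 and 2 can be sketched in a few lines. The example below assumes a hypothetical matrix of shared directors between four companies and gives tied values the same rank (one of several possible tie conventions); the helper name `rank_matrix` is illustrative.

```python
from itertools import combinations

def rank_matrix(proximity):
    """Replace off-diagonal values with rank scores (1 = largest value)."""
    n = len(proximity)
    pairs = list(combinations(range(n), 2))
    # Step 1: sort the distinct cell values into descending order.
    distinct = sorted({proximity[i][j] for i, j in pairs}, reverse=True)
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    # Step 2: build a new symmetric matrix holding the rank scores.
    ranks = [[0] * n for _ in range(n)]
    for i, j in pairs:
        ranks[i][j] = ranks[j][i] = rank_of[proximity[i][j]]
    return ranks

# Hypothetical symmetric matrix: number of directors two companies share.
shared_directors = [
    [0, 4, 2, 0],
    [4, 0, 1, 2],
    [2, 1, 0, 3],
    [0, 2, 3, 0],
]

ranks = rank_matrix(shared_directors)
for row in ranks:
    print(row)
```

Only these ranks feed into step 3, so four shared directors counts as "closer than" two shared directors without being treated as twice as close.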

Determining the number of dimensions as an acceptable solution A researcher must undertake a number of different analyses in order to discover which of the various solutions (# of dimensions) provides the best overall fit with the original data. Option #1 Stress Test! Stress measures the average spread of points around the line of good fit in the Shepard diagram. Does the definition of stress remind you of something?

Determining the number of dimensions as an acceptable solution Option #1 Stress Test! Plotting the stress values for each solution produces the diagram below. Does this diagram remind you of something?

Determining the number of dimensions as an acceptable solution
Option #2 Absolute value of stress
The absolute value of stress is another way to determine the fit of your solution and the acceptable number of dimensions. According to Kruskal (1964):
- Stress values of less than or equal to 5% indicate good fit
- Stress values around 10% indicate fair fit
- Stress values around 20% indicate poor fit
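To make the benchmarks concrete, here is a minimal sketch of Kruskal's stress-1 formula, the square root of the summed squared differences between map distances and fitted distances, divided by the summed squared map distances. The cut-off used to separate "fair" from "poor" (15%, the midpoint between the 10% and 20% benchmarks) is an assumption made for illustration, since the guidelines above only give approximate anchor values.

```python
import math

def kruskal_stress1(distances, disparities):
    """Stress-1: sqrt(sum((d - d_hat)^2) / sum(d^2)) over all pairs."""
    numerator = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)

def fit_label(stress):
    """Rough label; the 0.15 boundary is an assumed midpoint, not Kruskal's."""
    if stress <= 0.05:
        return "good"
    if stress <= 0.15:
        return "fair"
    return "poor"

# Hypothetical map distances and fitted distances for two pairs of points.
stress = kruskal_stress1([3.0, 4.0], [3.0, 3.0])
print(round(stress, 3), fit_label(stress))

# A perfect fit has zero stress.
print(kruskal_stress1([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```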

Determining the number of dimensions as an acceptable solution Option #3 Interpretation of dimensions The number of adequate dimensions will often be more than two. How do we draw the final configuration on a flat sheet of paper? The most common solution is to display on paper successive two-dimensional slices through the configuration. For example, in a three-dimensional solution, the configuration can be represented on paper as three separate 2D views of the overall configuration: D1 & D2, D2 & D3, D1 & D3. We can then plot the configuration of points from an MDS program within the space defined by the number of dimensions.
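Enumerating the two-dimensional slices is a matter of taking every pair of dimensions. The following is a minimal sketch, assuming a hypothetical list of three-dimensional coordinates such as an MDS program might output; the function name `two_d_slices` is illustrative.

```python
from itertools import combinations

def two_d_slices(coords, labels=None):
    """Return one 2D view of the configuration per pair of dimensions."""
    n_dims = len(coords[0])
    labels = labels or [f"D{k + 1}" for k in range(n_dims)]
    slices = {}
    for a, b in combinations(range(n_dims), 2):
        # Keep only coordinates a and b of every point.
        slices[(labels[a], labels[b])] = [(p[a], p[b]) for p in coords]
    return slices

# Hypothetical 3D configuration of three points.
points_3d = [(0.1, 0.5, -0.2), (0.4, -0.3, 0.7), (-0.6, 0.2, 0.0)]

views = two_d_slices(points_3d)
for pair, projected in views.items():
    print(pair, projected)
```

Each entry of `views` can be drawn as its own scatter plot. A k-dimensional solution yields k(k-1)/2 such views, which is one reason solutions with many dimensions become hard to read.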

Determining the number of dimensions as an acceptable solution Option #3 Interpretation of dimensions The process of interpretation concerns two main questions. 1) What is the meaning of the dimensions? (Rotation helps in this interpretation) 2) What is the significance of the way points are arranged? (Identify clusters and use the dimensions to give some meaning to their position in relation to one another)

The Analysis and Interpretation of Multivariate Data for Social Scientists, David Bartholomew
Multidimensional scaling (MDS)
- Distance is the prime concept
- Input data is in the form of a distance matrix representing the distance between each pair of objects
- Output will be expressed in terms of distance
- There is a strong subjective element in using this method

Typical problems
- Often a degree of arbitrariness in the definition of distance
- Little idea how many dimensions would be necessary in order to reproduce, even approximately, the given distances between objects of interest
- Need measures of similarity between columns of the data matrix instead of the rows in order to carry out an MDS analysis on variables

Key Terms/Definitions
- Classical multidimensional scaling: The distances on the map are in the same metric (scale of measurement) as the original δ_ij's.
- Ordinal (non-metric) multidimensional scaling: Produces a map on which the distances have the right rank order; appropriate if the distances come from subjective similarity ratings.
- Stress: A goodness-of-fit measure used to judge how many dimensions are necessary. Values of stress close to zero indicate that the MDS solution is a good fit.
- Disparities: Fitted distances (d̂_ij's) constructed by a smoothing process, usually least-squares monotonic regression, that is normally necessary in ordinal MDS. The smoothing puts the d̂_ij's in the same rank order as the δ_ij's.
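Least-squares monotonic regression is commonly computed with the pool-adjacent-violators idea: walk through the map distances in order of increasing dissimilarity and average any neighbours that are out of order. The following is a minimal sketch under that assumption; the function name and the example values are hypothetical.

```python
def monotone_disparities(distances_in_rank_order):
    """Pool-adjacent-violators fit of a non-decreasing sequence.

    Input: map distances listed in order of increasing dissimilarity.
    Output: disparities, the closest non-decreasing sequence in the
    least-squares sense.
    """
    blocks = []  # each block holds [sum of values, number of values]
    for d in distances_in_rank_order:
        blocks.append([d, 1])
        # Merge backwards while the previous block mean exceeds this one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Map distances for four pairs, ordered by increasing dissimilarity.
# The second and third are out of rank order, so they get averaged.
disparities = monotone_disparities([1.0, 3.0, 2.0, 4.0])
print(disparities)  # [1.0, 2.5, 2.5, 4.0]
```

Stress then measures how far the map distances are from these fitted values, so a configuration whose distances are already in the right rank order has zero stress.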

Assessing the fit in MDS As the number of dimensions increases, the stress decreases; thus there is a trade-off between improving fit and reducing the interpretability of the solution. Overall, MDS is a useful method: even when there is a rather poor fit, some meaning can still be extracted from the analysis. Investigate as many solutions as possible by adding dimensions to discover meaningful groupings.

The Effects of Spatial Arrangement on Judgments and Errors in Interpreting Graphs Cathleen McGrath, Jim Blythe, and David Krackhardt

The number of subgroups that individuals perceive in a graph may be influenced by the spatial clustering of nodes in the arrangement. 'Good' spatial arrangements should not leave the viewer with the impression that all nodes have the same values for prominence or bridging when in fact they vary on these dimensions. The 'best' spatial arrangement for a social network may often depend on the information that the arrangement is intended to convey.

New Directions in the Study of Community Elites, Edward Laumann & Franz Pappi
Conflict is an endemic, necessary feature of any community decision-making apparatus, posing the fundamental functional problem of integration for such structures, that is, the problem of establishing binding priorities among competing goals. Unit of analysis: the individual actor in a particular kind of social position, found by reputation and nominations from well-informed community members or community influentials. A list of 55 individuals was compiled based on interviews with 46 people. The list was ranked by the number of mentions of having influence to create an average influence status: the lower the number, the higher the average influence status.

Findings The analysis can predict how given issues will be resolved by determining the functionally specialized structure likely to be activated by a given functional issue. It can also assess the likelihood of a sector being divided on an issue by examining the relative spread or clustering of personnel in a particular institutional sector in the spatial solutions and their locations with respect to the central integrative core. If there is significant sectoral or integrative differentiation, the winning coalition can be predicted as the one favorably located relative to the integrative core and including a higher average level of reputed community influence.