Statistical graphics in analysis Multivariable data in PCP & scatter plot matrix Paula Ahonen-Rainio Maa-123.3530 Visual Analysis in GIS 11.11.2015
Topics today. YOUR REPORTS OF A-2: Thematic maps with charts for analysis; some reminders. LECTURE/YOUR EXAMPLES IN A-2: Guidelines for statistical graphics, to avoid misleading presentation of data: line, column and bar charts; do you really need pie charts?; histogram and box plot. LECTURE (continues in A-3): Basic methods for multivariable data: scatter plot matrix; PCP (parallel coordinates plot).
Visual analysis of large amounts of data. The aim: pattern discovery (trends; correlation and relationships; detection of irregularities, incl. outliers) for exploration or confirmation of hypotheses. The capacity of maps and charts may be limiting. Preprocessing of data is often necessary, e.g. transformations, classification, clustering.
Limitations of maps in analysis: Limited capacity for multiple variables. Intuitiveness of the interpretation is both a strength and a weakness (cf. preattentive vs. attentive perception; dominance of large sizes vs. importance of areas). Our visual system compares rather than measures. Influence of projection in large areas. Tools for interaction are necessary: interaction adds capacity remarkably. Other?
Notice: In addition to thematic mapping, maps should provide a sufficient and suitable geographic reference for locating thematic phenomena and for revealing the reasons behind them, such as differences and discontinuities in geography! E.g. spatial variation of rainfall is partly dependent on the variations of elevation. E.g. variation of voting behaviour (pre-election voting) is partly dependent on the habitation density, i.e. the distances to voting places. E.g. variation of voting results is partly dependent on the variations in economic activities of land use (agricultural vs. industrial vs. urban regions).
Statistical graphics. Role of statistical graphics in visual analysis: presentation; preparation for analysis; also exploration? How to make a proper chart: Books by Edward Tufte. Wainer, H. (1984) How to Display Data Badly. The American Statistician, vol 38, no 2. http://users.stat.umn.edu/~sandy/courses/8801/handouts/04.tabular/wainer1984.pdf Presentation of statistical data, paper print by V. Kuusela, Statistics Finland (available during the sessions).
Avoid the typical mistakes in statistical graphics! Study the basics and check the following: Whether you present magnitudes or trends. Continuity is interpreted from the horizontal axis. The principal difference between a line chart and a column chart. The principal difference between a column chart and a bar chart*. Problems with the interpretation of pie charts. (* column chart: vertical bars; bar chart: horizontal bars.) A title is as essential an element in a chart as it is in a map.
Tips for proper graphics: study these! Watch the video and/or browse the slides: D. Taylor, Introduction to Data Visualization, on YouTube https://www.youtube.com/watch?v=xigjtudgxyy Snapshots of the video: http://www.slideshare.net/prooffreader/introduction-to-data-visualization-41067274 Look at these examples of D. Taylor (slide #) [time in the video]: Do not cut columns (18-19) [6:10-]; Golden ratio, slope and aspect ratio (20-23) [7:70-10:05]; Stacked bar charts (34) [12:55-14:25]; Keep in mind your audience (42) [17:45-19:00]; Histogram (48-50) [21:15-]; Data to ink ratio by E. Tufte (51) [22:50-]; Pie chart (53-55) [23:40-]; Jobs of a data visualization (56) [25:10-]; Problem with logarithmic scale (57) [26:20-]; First impression [28:00-28:30].
What we can learn from your examples. This is a learning session, so please don't get upset if we criticise your graphics...
Mapping of data to coordinates, for multivariable data. Variables on coordinates: all data objects in the same display; requires metric values, so nominal and ordinal values need to be transformed to metric (see the example in the prereading about PCP); scaling of axes is critical and may require preprocessing (some tools allow scaling interactively during the analysis). Basic methods: Scatter plot (3D scatter plot, scatter plot matrix); Parallel coordinates plot (PCP) (3D parallel coordinates, radial coordinates).
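The transformation and scaling steps named on this slide can be sketched as follows. This is a minimal illustration only: the function names, the category labels and the numbers are hypothetical, and treating ordinal ranks as equally spaced is an assumption that may not suit every variable.

```python
def encode_ordinal(values, order):
    """Map ordinal labels to ranks 0, 1, 2, ... following `order` (assumes equal spacing)."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


def min_max_scale(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical data: one ordinal and one metric variable for four data objects.
quality = ["low", "high", "medium", "low"]
income = [18000, 42000, 30000, 24000]

quality_metric = encode_ordinal(quality, order=["low", "medium", "high"])
quality_axis = min_max_scale(quality_metric)  # positions on a common 0..1 axis
income_axis = min_max_scale(income)
```

After scaling, both variables share the same 0..1 range, so neither dominates the display merely because of its unit.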
Scatter plot (fi: parvikuvio, hajontakuva). Dependency between two variables. Mathematically: correlation coefficient (from +1 through 0 to -1); regression. Visually: in a scatter plot for bivariate analysis; independent and dependent variable; intuitive interpretation: y depends on x. [Figure: scatter plots of y against x showing a proportional relationship, an inverse relationship, and no correlation.] Patterns may be less clear.
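As a small worked example of the mathematical side, the sketch below computes the Pearson correlation coefficient for three hypothetical data sets. The function name and the numbers are made up for illustration; the formula is the standard covariance divided by the product of the standard deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long number sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
proportional = [2, 4, 6, 8, 10]   # y grows with x: r close to +1
inverse = [10, 8, 6, 4, 2]        # y falls as x grows: r close to -1
peaked = [1, 2, 3, 2, 1]          # rises then falls: r close to 0

r_proportional = pearson_r(x, proportional)
r_inverse = pearson_r(x, inverse)
r_peaked = pearson_r(x, peaked)
```

The last case has a clear shape but a coefficient near zero, which is one reason to look at the scatter plot and not only at the number.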
Scatter plot. If we have multiple variables, how are they correlated? What if there is local correlation instead of the global one? How to detect the dependent subsets? Scatter plot for bivariate analysis; scatter plot matrix for multiple variables.
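The difference between local and global correlation can be illustrated with a small made-up data set. In the sketch below, the two subsets `group_a` and `group_b` are hypothetical: each is perfectly positively correlated on its own, while the coefficient for all objects together is negative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long number sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical subsets of (x, y) pairs.
group_a = [(1, 5), (2, 6), (3, 7), (4, 8)]
group_b = [(5, 1), (6, 2), (7, 3), (8, 4)]


def correlation(pairs):
    """Split (x, y) pairs into columns and correlate them."""
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)


local_a = correlation(group_a)             # within subset A
local_b = correlation(group_b)             # within subset B
global_r = correlation(group_a + group_b)  # all objects together
```

A single global coefficient would hide both subsets here. In a scatter plot the two groups are visible as separate rising bands.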
Example: Scatter plot matrix. Three species of iris distinguished by three colours; linear relationship; clustering. http://support.sas.com/
Example: Scatterplot matrix added with In this multiform matrix only the elements above the diagonal are scatterplots. So, only one setting of independent/dependent variables is displayed. Histograms occupy the diagonal. The below-diagonal space displays bivariable spacefill visualisations. In a spacefill each grid square or pixel represents one data object. The colour of a square represents one of the two attributes and the order of the squares (e.g. a scanline) is according to the second attribute. This technique solves the problem of overprinting in a scatterplot. A spacefill can be used to visually estimate the strength of the relationship between the two displayed attributes. If the attributes are strongly correlated, there is a relatively regular and smooth transition from the lightest to the darkest colour from bottom to top (or from top to bottom). A weaker correlation produces a scattered pattern. A random pattern means that there is no correlation between the respective attributes.
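The spacefill idea described above can be sketched in a few lines: order the data objects by one attribute, lay them out as a scanline, and let each cell carry the value of the other attribute (which a real tool would map to a colour). The function name, the grid width and the data are assumptions for illustration, and the rows here run from top to bottom.

```python
def spacefill_grid(order_attr, colour_attr, width):
    """Return grid rows of `colour_attr` values, with cells ordered by `order_attr`."""
    pairs = sorted(zip(order_attr, colour_attr), key=lambda pair: pair[0])
    ordered = [colour for _, colour in pairs]
    # Cut the ordered sequence into rows of `width` cells (scanline layout).
    return [ordered[i:i + width] for i in range(0, len(ordered), width)]


# Hypothetical attributes for six data objects.
attr_a = [5, 1, 4, 2, 3, 6]
strongly_related = [50, 10, 40, 20, 30, 60]   # rises together with attr_a
weakly_related = [30, 60, 10, 50, 20, 40]     # no clear relation to attr_a

smooth = spacefill_grid(attr_a, strongly_related, width=3)
scattered = spacefill_grid(attr_a, weakly_related, width=3)
```

With the strongly related attribute the cell values increase steadily along the scanline, which would show as a smooth light-to-dark transition. With the weakly related attribute the values jump around.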
Parallel coordinates plot (PCP). An axis for each variable; all axes are displayed in parallel. Several axes; ordering of axes interactively (in the example only two variables, to demonstrate the idea and compare the depiction with a scatterplot). Each data object is represented by a polyline that intersects each axis according to the value of the respective variable. [Figure: data objects A and B shown in a scatterplot and in a PCP.]
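How a data object becomes a polyline can be sketched as below. The objects A and B, the variable names `x` and `y` and the axis ranges are hypothetical values chosen for illustration. Each vertex is a pair (axis index, position along that axis scaled to 0..1).

```python
def to_polyline(record, axes, ranges):
    """Return the polyline vertices (axis_index, scaled_position) for one data object.

    Assumes each axis range has max > min.
    """
    vertices = []
    for index, name in enumerate(axes):
        lo, hi = ranges[name]
        position = (record[name] - lo) / (hi - lo)
        vertices.append((index, position))
    return vertices


axes = ["x", "y"]                            # left-to-right order of the parallel axes
ranges = {"x": (0, 10), "y": (0, 10)}        # min and max shown on each axis

object_a = {"x": 2, "y": 8}
object_b = {"x": 6, "y": 4}

polyline_a = to_polyline(object_a, axes, ranges)
polyline_b = to_polyline(object_b, axes, ranges)
```

In a scatterplot A and B would be two points. Here they are two line segments that cross between the axes, because A is lower than B on `x` but higher on `y`.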
Example: PCP
PCP. What we perceive: a polyline; the crossing of a polyline with an axis; the pattern of lines between two adjacent axes. Therefore, interactivity is necessary for the analysis: reordering of the axes changes the patterns.
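One way to see why the axis order matters is to count how many pairs of polylines cross between adjacent axes. The sketch below is an illustration under assumptions: the variable names and values are invented, and counting crossings is just one simple measure of the pattern, not a method prescribed by the lecture.

```python
from itertools import combinations


def count_crossings(left, right):
    """Count pairs of objects whose order on the left axis is reversed on the right axis."""
    return sum(
        1
        for i, j in combinations(range(len(left)), 2)
        if (left[i] - left[j]) * (right[i] - right[j]) < 0
    )


def total_crossings(data, axis_order):
    """Sum the crossings over every pair of adjacent axes in the given order."""
    return sum(
        count_crossings(data[a], data[b])
        for a, b in zip(axis_order, axis_order[1:])
    )


# Hypothetical variables for four data objects.
data = {
    "income": [1, 2, 3, 4],
    "unemployment": [4, 3, 2, 1],   # inversely related to income
    "education": [1, 2, 3, 5],      # rises with income
}

order_1 = total_crossings(data, ["income", "unemployment", "education"])
order_2 = total_crossings(data, ["income", "education", "unemployment"])
```

The same data gives a different picture depending on which axes are neighbours: parallel lines between directly related variables, an X-shaped bundle between inversely related ones.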
Brushing for focus. The counties with the highest percentages of college graduates have been highlighted. S. Few: Multivariate Analysis Using Parallel Coordinates (see the prereading)
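Brushing amounts to selecting the data objects whose value on one axis falls inside a chosen range and drawing only those with emphasis. A minimal sketch, with hypothetical county names, percentages and brush limits:

```python
def brush(records, variable, low, high):
    """Return one True/False flag per record: True if its value lies within [low, high]."""
    return [low <= record[variable] <= high for record in records]


# Hypothetical counties with the percentage of college graduates.
counties = [
    {"name": "County 1", "college_pct": 12.0},
    {"name": "County 2", "college_pct": 41.5},
    {"name": "County 3", "college_pct": 25.3},
    {"name": "County 4", "college_pct": 38.0},
]

# Brush the upper part of the college_pct axis.
selected = brush(counties, "college_pct", low=35.0, high=100.0)
highlighted = [c["name"] for c, flag in zip(counties, selected) if flag]
```

A plotting tool would then draw the polylines flagged `True` in a strong colour and the rest in a muted one, across all axes.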
Coping with too many data objects: Colour coding by classification, by one variable only. The three views of the data are linked by colour: in the upper view (PCP), the data objects are classified by their value of the rightmost variable and colour-coded accordingly. 3D scatter plot. MacEachren et al. GeoVISTA www.psu.edu/geovista
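Colour coding by classification can be sketched as follows. Equal-interval classes are used here only as one simple choice, since the slide does not say which classification the example uses. The function name, the values and the colour codes are hypothetical.

```python
def classify_equal_interval(values, n_classes):
    """Assign each value a class index 0..n_classes-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    classes = []
    for v in values:
        index = int((v - lo) / width) if width > 0 else 0
        classes.append(min(index, n_classes - 1))  # the maximum falls in the top class
    return classes


# Hypothetical values of the rightmost PCP variable and a light-to-dark palette.
rightmost = [2, 5, 9, 11, 14, 20]
palette = ["#fee8c8", "#fdbb84", "#e34a33"]

class_of = classify_equal_interval(rightmost, n_classes=len(palette))
colour_of = [palette[c] for c in class_of]
```

Using the same `colour_of` list in every linked view is what ties the displays together: an object keeps its colour in the PCP, the scatter plot and the map.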
Coping with too many data objects: Clustering & mean values Chen & MacEachren (2008) Resolution Control in Multivariate Spatial Analysis. CaJ 45:4, pp. 261-273.
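The reduction step can be sketched like this: once each data object has a cluster label, one mean polyline per cluster replaces the individual lines. The labels below are simply given. How they are produced (the clustering method itself) is outside this sketch, and the data are hypothetical.

```python
from collections import defaultdict


def cluster_means(records, labels):
    """Return {label: [mean of each variable]} for records grouped by cluster label."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[label].append(record)
    return {
        label: [sum(column) / len(column) for column in zip(*members)]
        for label, members in groups.items()
    }


# Hypothetical data objects with two variables each, and assumed cluster labels.
records = [(1, 10), (3, 14), (8, 2), (10, 4)]
labels = [0, 0, 1, 1]

means = cluster_means(records, labels)
```

Each entry of `means` can be drawn as a single polyline, so four lines become two here. With thousands of objects the reduction in overplotting is correspondingly larger.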
Example: Categorical values in PCP A parabox (another name for parallel coordinates) graph from Advizor Solutions Source: S.Few "Multivariate Analysis Using Parallel Coordinates"