An Experiment in Visual Clustering Using Star Glyph Displays

by Hanna Kazhamiaka

A Research Paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master of Mathematics in Statistics

Waterloo, Ontario, Canada
September 30, 2011

Contents

1 Introduction
2 Theory and Methods
  2.1 Graph Traversal Algorithms Applied to Star Glyph Displays
  2.2 Tree Distances Metric
3 Experimental Design
  3.1 Hypothesis
  3.2 Data Sets
  3.3 Protocol
  3.4 Interface
4 Results
  4.1 Data Description
  4.2 Analysis
    4.2.1 Exploratory Analysis
    4.2.2 Models
    4.2.3 Comparing Standard Deviations
  4.3 Conclusions
Acknowledgments
References

Abstract

A star glyph plot is a visualization technique often employed for the task of clustering high-dimensional data points. The order in which the data variables are assigned to the axes affects the shape of a star glyph; changing this ordering may result in a different clustering outcome for the same data. To reduce the order-dependence, a sequence for axes assignment in which all pairs of variables appear adjacently is suggested. An experiment is performed to compare subjects' performance when clustering with the ordinary star glyph plot and when working with the improved version. The analysis and results of this study are presented in this paper.

1 Introduction

The ordering of components in a statistical graphical display often requires some consideration on the part of the statistician. Choosing an alternate ordering may change certain aspects of a graphical display, thus revealing or hiding patterns, trends, anomalies and relations in the data. A star glyph plot is one such display which depends on the ordering of data variables.

A star glyph plot is a tool used to visualize, and then cluster, multidimensional data points. A visual representation is created for each data point in the form of a star-shaped glyph. Every variable (or dimension) in the data is assigned to an axis emanating from the origin of the glyph; this is where the ordering of variables has an effect. Typically, this assignment follows the natural order of variables found in the data. Changing the order of variables as they are assigned to the axes will change the shape of the glyph; this effect is explained in more detail in Section 2.1. Clustering is done through a visual inspection of the glyphs' attributes, such as shape and size, and a grouping of like glyphs.

It is of interest to remove this dependence on order from the star glyph plot, so as not to distort patterns in the data and to allow for a more accurate visual clustering. A method to achieve this goal by creating glyphs in which each pair of variables appears adjacently is suggested by Hurley and Oldford [1]. Eulerian tours and Hamiltonian decompositions of complete graphs are used to generate such orderings of variables. The resulting ordering is used in the assignment of variables to axes. A star glyph display created in this manner is thought to reduce the order effect and to produce a clustering closer to the true one found in the data.

Whether or not such a display is an improvement over a standard star glyph plot is the question investigated in this paper. For quantitative evidence, a study was conducted to assess users' performance when working with different types of star glyph plots. Subjects were presented with several glyph displays of the same data set, where the ordering of variables was different for each display. Some glyph displays were produced by a standard assignment of variables to the axes, referred to as an ordinary glyph sequence; other displays were created by ensuring that all pairs of variables appear adjacently once. The subjects were asked to cluster each glyph display. The metric used for subjects' performance was the distance from the target clustering; this measure is described in Section 2.2. Section 3 outlines the experimental protocol.

Experimental results showed that using an ordering generated by an Eulerian tour yields a more reliable visual clustering of the data. The variation around the target clustering was smaller when subjects worked with star glyphs with such an ordering. There were no significant differences in accuracy, or proximity to the target, between the different types of orderings. Data analysis and conclusions are found in Section 4.

2 Theory and Methods

2.1 Graph Traversal Algorithms Applied to Star Glyph Displays

In an ordinary star glyph display, each variable is assigned a radius emanating from an origin, values are plotted on the radii to determine length, and lines connecting them are drawn to form a polygon. Clustering of star glyphs is done through a visual inspection of their shapes. The shape of a star glyph, however, depends largely on the ordering of the variables assigned to the axes. The same data point can look very different if the ordering of variables is changed, as illustrated in Figure 1. Changes in glyph shapes may in turn lead to a different clustering of the data points.

Figure 1: Star glyph plots for six data points. The plot on the right was created by reordering the variables.

To lessen the order-dependence effect, Hurley and Oldford [1] suggest using a sequence for axes assignment in which all pairs of variables appear adjacently. Such a sequence can be generated through graph traversal algorithms [1]. Applying graph theory results is a convenient way to formalize the procedure for glyph construction.

The method is as follows. A complete graph is formed, where each variable in the data set is assigned to a node. From the definition of completeness, each pair of nodes in the graph is connected by an edge. An arrangement of all pairings of nodes is obtained by finding an Eulerian tour, or a Hamiltonian decomposition, of the graph. An Eulerian tour of a graph is a closed path which visits every edge of the graph exactly once. It is easy to see that such a traversal generates an ordering on the nodes in which each pair appears adjacently.

It may be the case that the Eulerian tour is composed of the cycles of a Hamiltonian decomposition of the graph, which is the setup in the second method suggested for constructing glyphs. A Hamiltonian decomposition is a series of edge-distinct Hamiltonian cycles (closed paths which visit each node exactly once), the union of which is the complete graph. Joining these Hamiltonian cycles at the same node results in an Eulerian tour and produces the desired arrangement of variables.

In summary, there are two methods to generate a sequence in which all pairs of variables appear adjacently: by means of an Eulerian tour or a Hamiltonian decomposition of a complete graph. These two sequences, along with the ordinary glyph sequence, produce three different star glyph representations of the same data.

2.2 Tree Distances Metric

With three different star glyph representations presumably leading to different clusterings, a means of comparing them is necessary. An intuitive way to assess the quality of a clustering is to compare its closeness to the true grouping of the data. Using a tree-based distance measure developed by Oldford and Zhou [2], a distance from one clustering to the target clustering can be obtained. A clustering tree is created for each clustering outcome; the procedure is outlined below.

1. The entire set of observations to be clustered is assigned to the root node of the tree.
2. The root has at least two branches; its children are mutually exclusive subsets of the data, that is, the clusters found.

The distance measure takes two such clustering trees, transforms each of them into a vector, and finds the Euclidean distance between the two vectors.

This method is especially effective for hierarchical clustering, where each layer of the tree is a consecutive split of the data into subgroups. This, however, is not relevant for the experiment discussed here, since hierarchical visual clustering using star glyphs is a time-consuming and laborious procedure. Each clustering tree resulting from an outcome of this experiment had two layers: one for the root node, and the second representing the exact clustering indicated by the subject.

To test the performance of the three glyph sequences discussed in the previous section (an ordinary sequence and the ones produced by an Eulerian tour and a Hamiltonian decomposition), the same data set must be clustered three times, once for each sequence. For each clustering, a tree is created and its distance to the true clustering tree is calculated. This produces a set of distances recorded as the response variable in the study.
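As a rough illustration of the pairwise-adjacent sequences from Section 2.1, the sketch below builds a Hamiltonian decomposition of a complete graph and joins the cycles at a shared node to obtain an Eulerian tour. It is not the procedure of [1]: the zigzag (Walecki) construction, the function names and the restriction to an odd number of variables are all assumptions made for this example. With an even number of variables every node of the complete graph has odd degree, so no Eulerian tour exists without additional handling that is not covered here.

```python
from itertools import combinations


def hamiltonian_decomposition(n):
    """Split the complete graph K_n (n odd) into (n - 1) // 2 edge-disjoint
    Hamiltonian cycles using a zigzag (Walecki) construction. Assumed
    construction, chosen for illustration only."""
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch only handles odd n >= 3")
    ring = n - 1   # nodes 0 .. ring-1 are placed on a circle
    hub = n - 1    # the remaining node closes every cycle
    # Zigzag path 0, 1, -1, 2, -2, ... taken modulo the ring size.
    zigzag = [((k + 1) // 2 if k % 2 else -(k // 2)) % ring for k in range(ring)]
    # Rotating the zigzag gives edge-disjoint paths; the hub closes each one.
    return [[hub] + [(v + shift) % ring for v in zigzag]
            for shift in range(ring // 2)]


def eulerian_sequence(n):
    """Join the Hamiltonian cycles at the hub; the result is read cyclically."""
    return [node for cycle in hamiltonian_decomposition(n) for node in cycle]


def adjacent_pairs(sequence):
    """Unordered pairs of neighbours in a closed (cyclic) sequence."""
    closed = sequence + sequence[:1]
    return [frozenset(pair) for pair in zip(closed, closed[1:])]


variables = 7  # dimensionality of the artificial data set
tour = eulerian_sequence(variables)
pairs = adjacent_pairs(tour)
all_pairs = {frozenset(p) for p in combinations(range(variables), 2)}
print(tour)
print(len(pairs) == len(all_pairs) and set(pairs) == all_pairs)
```

Each entry of `tour` is the index of a variable; assigning the variables to glyph axes in this order places every pair of variables on neighbouring axes exactly once, at the cost of using 21 axes instead of 7.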

3 Experimental Design

3.1 Hypothesis

The objective of the study was to compare the effectiveness of three different techniques for arranging star glyph radii. Since the main purpose of star glyph plots is to facilitate visual clustering, their assessment naturally involved human participants, who were asked to perform visual clustering tasks on a computer screen. The response measured was the distance between the true clustering of the data and the one produced by the subject. Of interest were the differences in distances using an ordinary glyph sequence versus one produced by an Eulerian tour or a Hamiltonian decomposition.

3.2 Data Sets

Participants were presented with star glyph representations of two data sets. These were carefully selected to be reasonably characteristic of data sets one might want to cluster in practice. Many considerations were addressed with regard to the suitability of data sets, such as number of observations, dimensionality, and number of clusters. For the purposes of the experiment, the data sets had to be small enough that subjects could finish the task in a reasonable amount of time, but large enough to mimic the qualities of a data set found in practice. Selected data sets contained [...] points.

The optimal dimensionality of the data points was another issue. Star glyphs are considered ineffective for very high-dimensional data; as the number of radii increases, the patterns in glyph shapes become difficult to identify. For small dimensions, the ordinary glyph sequence is too similar to one in which all pairs of variables appear adjacently, and the star glyph shapes look almost the same. Six- and seven-dimensional data was used in this study; this was thought to be optimal. Both data sets contained four true clusters of different sizes.

Ideally, to have greater confidence in the effect of changing glyph shapes, the subjects would be tested with more than two data sets (the more, the better). This kind of experiment, however, is very difficult to set up in practice. Trial runs of the experiment indicated that it took subjects [...] minutes on average to group six star glyph plots consisting of [...] points. Adding another data set would increase the time required to complete the experiment, and could affect the participants' performance. It was assumed that performance would get worse towards the end of the experiment, as participants could become tired and careless with the task. A reasonable expected completion time was chosen as 45 minutes. Based on these considerations, the decision was made to use two data sets in the experiment.

Participants first worked with an artificial data set consisting of 50 points, forming 4 clusters in a 7-dimensional space. This data consisted of observations randomly generated from four Gaussian distributions with varying means. One such data set is displayed in Figure 2. The true grouping of the data points corresponded to the four different distributions from which the points were generated. The sizes of clusters were assigned randomly.

The second data set used was a subset of a real data set giving birth rates, death rates, life expectancies, and Gross National Product for 97 countries in the year [...]. The annotated data set, referred to as the Poverty data, can be found at [...]. A group indicator for each country was given in the file, and is based on geographic location as well as general economic factors (type of economy, first-world, third-world, etc.). To meet time constraints, several groups, as well as individual records which contained missing data, were removed from the data set. The total number of observations was reduced to 67, each belonging to one of four groups. The data set contained 6 variables. The star glyph plot of the resulting data is displayed in Figure 3.

Figure 2: Artificial data set.

Figure 3: Poverty data set.

3.3 Protocol

The experiment was divided into two parts: a demonstration of the interface, and the participants' tasks. It was conducted on a computer interface programmed to facilitate clustering tasks. Participants were first shown a demo of the interface functions using a test data set. After they had had a chance to familiarize themselves with the interface, the actual experiment began. The instructions given were to group the star glyphs based on perceived similarity and to indicate the chosen grouping by brushing each cluster in a unique colour. Participants were shown a total of 6 displays, each having either 50 or 67 star glyphs neatly arranged in a grid pattern. The first 3 were star glyph representations of an artificial data set, the latter 3 of a real data set. For each data set, the order in which participants were shown plots with ordinary, Eulerian tour or Hamiltonian decomposition glyph sequences was randomized. Upon completion of the entire experiment, the participants received remuneration in the form of a $10 gift certificate.

This study was reviewed and received ethics clearance through the Office of Research Ethics at the University of Waterloo. The subjects recruited for this study were undergraduate and graduate students at the University of Waterloo, with a mathematics, science, or engineering background. Almost all had no prior exposure to star glyph plots. A total of 32 students participated in the study.

3.4 Interface

The graphical user interface for this experiment was written in R by Adrian Waddell. The program takes a data set, location parameters for the glyphs and a glyph sequence as input, and produces a window displaying the corresponding star glyph plot. The user is able to move the glyphs around in the window, and brush them in different colours. Participants typically arranged the glyphs in groups at different corners of the window, then brushed each group in a unique colour. Snapshots of the window can be found in Figure 4 and Figure 5.

Figure 4: Experimental interface.

Figure 5: After clustering.
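To make the inputs of such a display concrete (an observation, a glyph location and a glyph sequence), the following sketch computes the polygon vertices of a single star glyph. It is not the R program used in the experiment; the function name `star_glyph_vertices`, the evenly spaced axes and the assumption that every value has already been scaled to the interval [0, 1] are choices made only for this illustration.

```python
import math


def star_glyph_vertices(values, sequence, center=(0.0, 0.0), radius=1.0):
    """Polygon vertices of one star glyph (hypothetical helper).

    values   -- one observation, each entry assumed scaled to [0, 1]
    sequence -- variable indices in the order they are assigned to the axes;
                an index may repeat, as it does in an Eulerian sequence
    center   -- location of the glyph in the window
    radius   -- length of an axis whose value equals 1
    """
    cx, cy = center
    step = 2 * math.pi / len(sequence)  # axes assumed evenly spaced
    return [
        (cx + radius * values[var] * math.cos(i * step),
         cy + radius * values[var] * math.sin(i * step))
        for i, var in enumerate(sequence)
    ]


point = [0.2, 0.9, 0.4, 0.7]
natural = star_glyph_vertices(point, [0, 1, 2, 3])
reordered = star_glyph_vertices(point, [0, 2, 1, 3])
print(natural)
print(reordered)
```

The two vertex lists describe the same observation, yet they trace different polygons, which is the order effect discussed in Section 2.1.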

4 Results

4.1 Data Description

The raw data collected from the experiment consisted of two files for each participant: one file containing the results of clustering the artificial data set, and the other containing the results of clustering the real data set. Each file was a data frame in which every row corresponded to an observation from the data set, and the columns indicated the colour it was brushed for each of the three glyph sequences. An indicator for the true group that each observation belonged to was also appended to the file.

The data processing stage involved calculating the distance from each of the three clustering outcomes to the true clustering of the data. This was done by applying the tree distance methods discussed in Section 2.2. The resulting data set was a list of distance measures, with indicators of the data set that was clustered (real or artificial), the glyph sequence used, and the subject ID for each record. This formatted data was used for the analysis stage.

4.2 Analysis

The experiment was analysed as a randomized block design with two treatment factors and one blocking factor. The two treatment factors were Data Set and Glyph Sequence. The first took two levels: artificial Gaussian data or real Poverty data. The second took three levels: an ordinary sequence in which variables appear in their natural order, a sequence produced by an Eulerian tour on the complete graph of variables, or one produced by a Hamiltonian decomposition of the same graph. The effect of the Glyph Sequence factor was of primary interest, as per the study objectives.

4.2.1 Exploratory Analysis

This section outlines some of the results of an initial exploratory analysis of the data. Before a strict model was imposed on the data, it was viewed and analysed via graphical tools.

One of the goals of an exploratory investigation is outlier detection. A boxplot of the distances grouped by subject is displayed in Figure 6. Subjects 8, 16 and 17 are identified as outliers, due to their lower medians and the greater variation in their scores relative to the distance scores of the other participants. Note that a lower score is better in the context of this experiment. Since the response variable is distance from the true clustering, a small distance implies that the subject's clustering was very close to the target one. Observations collected from subjects 8, 16 and 17 were removed from the data set used for further analysis, so as not to skew the results. Possible reasons for their exceptional performance may be failure to follow instructions, or prior experience with similar tasks.

Figure 6: Boxplots of distances grouped by subject. Subjects 8, 16 and 17 are identified as outliers.

Next, the data is summarized across subjects and grouped by glyph sequence. Boxplots of distances by glyph sequence can be found in Figure 7. This allows for a visual comparison of median and spread for clustering distances obtained by using the three different types of glyph sequence. Inspection of these plots shows only small differences in the medians for the three different glyph sequences; the median distances for the Eulerian and Hamiltonian glyph sequences are slightly lower than that of the ordinary glyph sequence. A larger effect is seen in the spread of the data. The interquartile range for values associated with an ordinary glyph sequence is much larger than the range of the other two. In addition, note that the higher end of the interquartile range is the same for all glyph sequences, but the lower end is much lower for the ordinary one. This suggests that it is more common to see a smaller distance to the true clustering when using the ordinary glyph sequence.

Figure 7: Boxplots of distances grouped by glyph sequence.

Another useful graphical summary is a comparison of the distribution of distances across the two data sets. Figure 8 presents two boxplots, both grouped by glyph sequence: one for the artificial Gaussian data, the other for the real Poverty data set. The most noticeable difference between the two data sets is the range of values. This, however, should be attributed to the nature of the data sets; the artificial data set was randomly generated each time the experiment was conducted, whereas the real data set remained the same throughout. As a result, each participant was presented with a unique artificial data set, and all participants worked with the same real data set. Some of the variation in distance scores with the artificial data sets can be explained by the natural variation between the data sets themselves.

Figure 8: Boxplots of distances grouped by glyph sequence for the artificial and real data sets.

For the real data set, the results can be displayed in an MDS plot, as shown in Figure 9. An average clustering tree is created for each glyph sequence by combining the clustering outcomes across all subjects. Multidimensional scaling techniques are applied to the distances between the average clustering trees to produce a visualization. The Euclidean distances between the points on the MDS plot are close to the distances between trees, which allows for comparison of the performance of the three glyph sequences. Note that it would not make sense to produce such a plot for the artificial Gaussian data, since the data set, and hence its true clustering, was different for each subject. Combining the clustering outcomes across subjects is not appropriate in this case.

Figure 9: An MDS plot of distances between average clustering trees for the real data set.

The points associated with the three ordering methods appear almost equally far from the true clustering in the plot, suggesting that no glyph sequence performs better than the others in terms of accuracy.
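The exact vector encoding behind the tree distance of [2] is not reproduced in this paper. Purely as an illustration, the sketch below assumes that a two-layer clustering tree is encoded as a vector of pairwise co-membership indicators (1 if two observations share a cluster, 0 otherwise), and that an average tree is the element-wise mean of such vectors across subjects. Both the encoding and the helper names are assumptions, and the labels are made up; only the final step, a Euclidean distance between vectors, is taken from Section 2.2.

```python
import math
from itertools import combinations


def comembership_vector(labels):
    """Assumed encoding: one entry per pair of observations,
    1.0 if both were put in the same cluster, else 0.0."""
    return [1.0 if labels[i] == labels[j] else 0.0
            for i, j in combinations(range(len(labels)), 2)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_vector(vectors):
    """Element-wise mean, standing in for an 'average clustering tree'."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Made-up labels for five observations; not data from the study.
truth = ["a", "a", "b", "b", "c"]
subject_1 = ["red", "red", "blue", "blue", "blue"]
subject_2 = ["red", "red", "red", "blue", "green"]

target = comembership_vector(truth)
vectors = [comembership_vector(s) for s in (subject_1, subject_2)]
distances = [euclidean(v, target) for v in vectors]
average_distance = euclidean(average_vector(vectors), target)
print(distances, average_distance)
```

Because only co-membership matters, the brushing colours chosen by a subject do not need to match the names of the true groups.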

The graphical representations of the experimental data suggest that the impact of glyph sequence on the accuracy of clustering is not large. To validate this assumption, a formal model is fit to the data in the next section.

4.2.2 Models

A standard linear regression model was fit to the experimental data. The response variable was Distance, and the explanatory variables were the blocking factor Subject and the treatment factors Data Set and Glyph Sequence. No significant interactions between treatment factors were found, and thus no interaction term is included in the model. The model can be written as follows:

$$Y_{ijk} = \mu + S_i + D_j + G_k + \epsilon_{ijk}$$

where $S_i$ is the subject effect, $D_j$ is the data set effect, and $G_k$ is the glyph sequence effect. The indices $i = 1, \dots, 29$ correspond to the 29 subjects whose results were used in the analysis, $j = 1, 2$ indexes the data set factor (artificial, Poverty), and $k = 1, 2, 3$ is associated with the glyph sequence used for the clustering.

The results of the model are found in the table below. No treatment effects are found to be significant, as can be seen from the corresponding p-values. This confirms the results suggested by the plots in the exploratory analysis section: glyph sequence does not have a significant impact on the accuracy of clustering performance.

Effect       Df   SS   MS   F-value   P-value
Subject
Data Set
Glyphs
Residuals

4.2.3 Comparing Standard Deviations

The boxplots found in Figure 7 and Figure 8 in Section 4.2.1 suggest that there are differences in the sample standard deviations for the three different glyph sequences. This is of interest, because a smaller deviation around the target is indicative of a more reliable visual clustering method. To validate this hypothesis, a one-sided F-test was performed for each pair of glyph sequences, at the 5% significance level. Results of these tests are displayed in the table below, where the subscripts O, E and H denote the ordinary, Eulerian and Hamiltonian glyph sequences. The standard deviation for Eulerian glyph sequences is found to be smaller than that of ordinary and Hamiltonian sequences; there is no significant difference between ordinary and Hamiltonian glyph sequences.

$H_0$                     $H_a$                     F-value   P-value
$\sigma_H = \sigma_O$     $\sigma_H < \sigma_O$
$\sigma_E = \sigma_O$     $\sigma_E < \sigma_O$
$\sigma_E = \sigma_H$     $\sigma_E < \sigma_H$
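The test statistic behind such a one-sided comparison is the ratio of two sample variances. The sketch below computes it for two made-up samples; the function name `variance_ratio` and the numbers are hypothetical and are not the study's data, and the sketch assumes the distances behave like independent, roughly normal samples. Turning the statistic into a p-value would additionally require the upper tail of the F distribution with the returned degrees of freedom, which is omitted here.

```python
from statistics import variance


def variance_ratio(larger, smaller):
    """F statistic and degrees of freedom for H0: equal variances
    against Ha: the variance of `smaller` is below that of `larger`."""
    f_value = variance(larger) / variance(smaller)  # sample variances (n - 1)
    return f_value, (len(larger) - 1, len(smaller) - 1)


# Made-up distances for illustration only; not the experimental results.
ordinary = [2.0, 4.0, 6.0, 8.0, 10.0]
eulerian = [5.0, 6.0, 6.0, 7.0, 6.0]

f_value, dof = variance_ratio(ordinary, eulerian)
print(f_value, dof)
```

A large ratio relative to the F distribution with these degrees of freedom would favour the alternative that the second sequence has the smaller spread.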

4.3 Conclusions

Based on an exploration of the experimental data through graphical aids, as well as formal analysis of the results, a significant difference in precision, but not in accuracy, was found between the performances of the three different glyph sequences. A glyph sequence produced by an Eulerian tour of a complete graph on all of the variables yields a more precise visual clustering than one created by a Hamiltonian decomposition of the same graph, or an ordinary glyph sequence where all variables appear once, in their natural order.

Although the average distance from the true clustering did not differ across the three glyph sequences, the smaller variation around the true clustering found for Eulerian sequences is a significant improvement on the ordinary method for creating star glyphs. A more reliable technique for visual clustering is one which is less likely to result in clusterings far from the true one that is inherent in the data. Thus, using a glyph sequence obtained by an Eulerian tour results in a gain in precision, and leads to a more reliable visual clustering.

It is important to keep in mind that the experiments described in this paper are sensitive to many factors, including the test subjects' backgrounds and the characteristics of the data sets used. Perhaps targeting only subjects with experience in data visualization and previous exposure to star glyph plots would render different results. With regard to the data sets, the difficulty lies in finding those which are appropriate for evaluating clustering methods.

Clusters in the data are intrinsically somewhat arbitrary: it is up to the researcher to define what constitutes a cluster in any given data set. This lack of a clear ground truth against which a technique can be evaluated introduces subjectivity and affects the reliability of results. A way to counter that is to test subjects on a variety of data sets.

Acknowledgments

I would like to thank my supervisor Professor Wayne Oldford for his guidance and support in the writing of this research essay, and Adrian Waddell for his patience and assistance with the programming aspect of this project.

References

[1] Hurley, C.B. and Oldford, R.W. (2010). Pairwise Display of High-Dimensional Information via Eulerian Tours and Hamiltonian Decompositions. Journal of Computational and Graphical Statistics, 19.

[2] Reference for Tree Distances


More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:

More information

Response to API 1163 and Its Impact on Pipeline Integrity Management

Response to API 1163 and Its Impact on Pipeline Integrity Management ECNDT 2 - Tu.2.7.1 Response to API 3 and Its Impact on Pipeline Integrity Management Munendra S TOMAR, Martin FINGERHUT; RTD Quality Services, USA Abstract. Knowing the accuracy and reliability of ILI

More information

Data Mining: Exploring Data

Data Mining: Exploring Data Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar But we start with a brief discussion of the Friedman article and the relationship between Data

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Downloaded from

Downloaded from UNIT 2 WHAT IS STATISTICS? Researchers deal with a large amount of data and have to draw dependable conclusions on the basis of data collected for the purpose. Statistics help the researchers in making

More information

IQC monitoring in laboratory networks

IQC monitoring in laboratory networks IQC for Networked Analysers Background and instructions for use IQC monitoring in laboratory networks Modern Laboratories continue to produce large quantities of internal quality control data (IQC) despite

More information
