Distance Based Methods. Lecture 13 August 9, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2


2 Today's Lecture Measures of distance (similarity and dissimilarity) Multivariate methods that use distances (dissimilarities) and similarities Hierarchical clustering algorithms Multidimensional scaling K-means clustering

3 Clustering Definitions Clustering objects (or observations) can provide detail regarding the nature and structure of data Clustering is distinct from classification in terminology Classification pertains to a known number of groups, with the objective being to assign new observations to these groups Cluster analysis, by contrast, makes no assumptions concerning the number of groups or the group structure

4 Magnitude of the Problem Clustering algorithms make use of measures of similarity (or, alternatively, dissimilarity) to define and group variables or observations Clustering presents a host of technical problems For a reasonably sized data set with n objects (either variables or individuals), the number of ways of grouping n objects into k groups is: (1/k!) Σ_{j=0}^{k} (-1)^j C(k, j) (k - j)^n For example, there are about 4.7 × 10^13 (over 46 trillion) ways that 25 objects can be clustered into 4 groups; which solution is best?
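The count in this formula is the Stirling number of the second kind; a quick pure-Python check (the function name is my own) confirms the magnitude for 25 objects in 4 groups:

```python
from math import comb, factorial

def n_groupings(n, k):
    """Number of ways to sort n objects into k non-empty groups
    (the Stirling number of the second kind)."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n
               for j in range(k + 1)) // factorial(k)

print(n_groupings(25, 4))  # 46771289738810 -- about 4.7 x 10^13
```

With that many candidate groupings, the best solution clearly cannot be found by enumeration.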

5 Numerical Problems In theory, one way to find the best solution is to try each possible grouping of all of the objects an optimization process called integer programming It is difficult, if not impossible, to do such a method given the state of today's computers (although computers are catching up with such problems) Rather than using such brute-force methods, a set of heuristics has been developed to allow for fast clustering of objects into groups Such methods are called heuristics because they do not guarantee that the solution will be optimal (best), only that the solution will be better than most

6 Clustering Heuristic Inputs The inputs into clustering heuristics are in the form of measures of similarities or dissimilarities The result of the heuristic depends in large part on the measure of similarity/dissimilarity used by the procedure Because of this, we will start this lecture by examining measures of distance (used synonymously with similarity/dissimilarity)

7 Measures of Distance Care must be taken with choosing the metric by which similarity is quantified Important considerations include: The nature of the variables (e.g., discrete, continuous, binary) Scales of measurement (nominal, ordinal, interval, or ratio) The nature of the matter under study

8 Clustering Variables vs. Clustering Observations When variables are to be clustered, oft used measures of similarity include correlation coefficients (or similar measures for noncontinuous variables) When observations are to be clustered, distance metrics are often used

9 Euclidean Distance Euclidean distance is a frequent choice of a distance metric: d(x, y) = sqrt[ (x1 - y1)^2 + (x2 - y2)^2 + ... + (xp - yp)^2 ] This distance is often chosen because it represents an understandable metric But sometimes it may not be the best choice

10 Euclidean Distance? Imagine I wanted to know how many miles it was from my old house in Sacramento to Lombard Street in San Francisco Knowing how far it was on a straight line would not do me too much good, particularly with the number of one-way streets that exist in San Francisco

11 Other Distance Metrics Other popular distance metrics include the Minkowski metric: d(x, y) = [ Σ_{i=1}^{p} |x_i - y_i|^m ]^{1/m} The key to this metric is the choice of m: If m = 2, this provides the Euclidean distance If m = 1, this provides the city-block distance
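The metric is a one-liner in code; a small sketch (function name my own) showing both special cases of m:

```python
def minkowski(x, y, m):
    """Minkowski distance between vectors x and y with exponent m."""
    return sum(abs(xi - yi) ** m for xi, yi in zip(x, y)) ** (1.0 / m)

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski(x, y, 2))  # 5.0  (m = 2: Euclidean)
print(minkowski(x, y, 1))  # 7.0  (m = 1: city-block)
```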

12 Preferred Distance Properties It is often desirable to use a distance metric that meets the following properties: d(P, Q) = d(Q, P) d(P, Q) > 0 if P ≠ Q d(P, Q) = 0 if P = Q d(P, Q) ≤ d(P, R) + d(R, Q) The Euclidean and Minkowski metrics satisfy these properties

13 Triangle Inequality The fourth property is called the triangle inequality, which often gets violated by non-routine measures of distance This inequality can be shown with a triangle having vertices P, Q, and R (with lines representing Euclidean distances): d(P, R) + d(R, Q) ≥ d(P, Q)

14 Binary Variables In the case of binary-valued variables (variables that have a 0/1 coding), many other distance metrics may be defined The squared Euclidean distance provides a count of the number of mismatched observations: here, d(i,k) = 2 This count is sometimes called the Hamming distance
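A minimal sketch of the mismatch count; the two 0/1 vectors are illustrative inventions of my own, chosen so that they reproduce the slide's d(i,k) = 2:

```python
def hamming(a, b):
    """Number of mismatched positions between two 0/1 vectors
    (equal to the squared Euclidean distance for binary data)."""
    return sum(ai != bi for ai, bi in zip(a, b))

i = (1, 0, 1, 1, 0)  # hypothetical observation i
k = (1, 1, 1, 0, 0)  # hypothetical observation k
print(hamming(i, k))  # 2
```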

15 Other Binary Distance Measures There are a number of other ways to define the distance between a set of binary variables Most of these measures reflect the varied importance placed on differing cells in a 2 x 2 table

16 General Distance Measure Properties Use of measures of distance that are monotonic in their ordering of object distances will provide identical results from clustering heuristics Many times this will only be an issue if the distance measure is for binary variables or the distance measure does not satisfy the triangle inequality

17 Hierarchical Clustering Methods Because of the large size of possible clustering solutions, searching through all combinations is not feasible Hierarchical clustering techniques proceed by taking a set of objects and grouping a set at a time Two types of hierarchical clustering methods exist: Agglomerative hierarchical methods Divisive hierarchical methods

18 Agglomerative Clustering Methods Agglomerative clustering methods start first with the individual objects Initially, each object is its own cluster The most similar objects are then grouped together into a single cluster (with two objects) We will find that what we mean by similar will change depending on the method

19 Agglomerative Clustering Methods The remaining steps involve merging the clusters according to the similarity or dissimilarity of the objects within the cluster to those outside of the cluster The method concludes when all objects are part of a single cluster

20 Divisive Clustering Methods Divisive hierarchical methods work in the opposite direction, beginning with a single, n-object sized cluster The large cluster is then divided into two subgroups where the objects in opposing groups are relatively distant from each other

21 Divisive Clustering Methods The process continues similarly until there are as many clusters as there are objects While divisive clustering is briefly discussed in the chapter, we will not discuss it in detail in this class

22 To Summarize So you can see that we have this idea of steps At each step two clusters combine to form one (Agglomerative) OR At each step a cluster is divided into two new clusters (Divisive)

23 Methods for Viewing Clusters As you could imagine, when we consider the methods for hierarchical clustering, there are a large number of clusters that are formed sequentially One of the most frequently used tools to view the clusters (and level at which they were formed) is the dendrogram A dendrogram is a graph that describes the differing hierarchical clusters, and the distance at which each is formed

24 Example Data Set #1 To demonstrate several of the hierarchical clustering methods, an example data set is used Data come from a 1991 study by the economic research department of the Union Bank of Switzerland representing economic conditions of 48 cities around the world Three variables were collected: Average working hours for 12 occupations Price of 112 goods and services excluding rent Index of net hourly earnings in 12 occupations

25 Example of Dendrogram Here, for example, Amsterdam and Brussels were combined to form a group This is an example of an agglomerative cluster, where cities start off in their own group and are then combined Where the lines connect is where those two previous groups were joined Notice that we do not specify the groups, but if we know how many we want, we simply go to the step where there are that many groups

26 Similarity? So, we mentioned that: The most similar objects are then grouped together into a single cluster (with two objects) So the next question is how do we measure similarity between clusters More specifically, how do we redefine it when a cluster contains a combination of old clusters We find that there are several ways to define similar and each way defines a new method of clustering

27 Agglomerative Methods Next we discuss several different ways to complete agglomerative hierarchical clustering: Single Linkage Complete Linkage Average Linkage Centroid Median Ward's Method

28 Agglomerative Methods In describing these I will give the definition of how we define similar when clusters are combined. And for the first two I will give detailed examples.

29 Example Distance Matrix The example will be based on the five-object distance matrix below (entries inferred from the worked calculations that follow):

       1    2    3    4    5
  1    0
  2    9    0
  3    3    7    0
  4    6    5    9    0
  5   11   10    2    8    0

30 Single Linkage The single linkage method of clustering involves combining clusters by finding the nearest neighbor: the distance between two clusters is the distance between their closest pair of observations Notice that this is equal to the minimum distance of any observation in one cluster to any observation in the other cluster

31 Single Linkage So the distance between any two clusters is: D(A, B) = min d(y_i, y_j) for all y_i in A and y_j in B Notice any other distance is longer So how would we do this using our distance matrix?

32 Single Linkage Example The first step in the process is to determine the two elements with the smallest distance, and combine them into a single cluster Here, the two objects that are most similar are objects 3 and 5; we will now combine these into a new cluster, and compute the distance from that cluster to the remaining clusters (objects) via the single linkage rule

33 Single Linkage Example The shaded rows/columns are the portions of the table with the distances of objects 3 and 5 to the remaining objects Our rule says that the distance of our new cluster (35) to each remaining object is equal to the minimum of the two corresponding values In equation form, our new distances are: d(35),1 = min{d31, d51} = min{3, 11} = 3 d(35),2 = min{d32, d52} = min{7, 10} = 7 d(35),4 = min{d34, d54} = min{9, 8} = 8

34 Single Linkage Example Using the distance values, we now consolidate our table so that (35) is now a single row/column The distance from the (35) cluster to the remaining objects is given below: d(35)1 = min{d31, d51} = min{3, 11} = 3 d(35)2 = min{d32, d52} = min{7, 10} = 7 d(35)4 = min{d34, d54} = min{9, 8} = 8

35 Single Linkage Example We now repeat the process, by finding the smallest distance within the set of remaining clusters The smallest distance is between object 1 and cluster (35) Therefore, object 1 joins cluster (35), creating cluster (135) The distance from cluster (135) to the other clusters is then computed: d(135)2 = min{d(35)2, d12} = min{7, 9} = 7 d(135)4 = min{d(35)4, d14} = min{8, 6} = 6

36 Single Linkage Example Using the distance values, we now consolidate our table so that (135) is now a single row/column The distance from the (135) cluster to the remaining objects is given below: d(135)2 = min{d(35)2, d12} = min{7, 9} = 7 d(135)4 = min{d(35)4, d14} = min{8, 6} = 6

37 Single Linkage Example We now repeat the process, by finding the smallest distance within the set of remaining clusters The smallest distance is between object 2 and object 4 These two objects will be joined to form cluster (24) The distance from (24) to (135) is then computed: d(135)(24) = min{d(135)2, d(135)4} = min{7, 6} = 6 The final cluster (12345) is formed at a distance of 6
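The whole single linkage sequence above can be sketched in a few lines of Python; the distance matrix values here are inferred from the worked calculations (e.g., d31 = 3, d51 = 11, d24 = 5, d35 = 2), and all names are my own:

```python
# Five-object distance matrix inferred from the worked example.
D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
     (2, 3): 7, (2, 4): 5, (2, 5): 10,
     (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(a, b):
    """Object-to-object distance, regardless of argument order."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def single_linkage(objects):
    """Repeatedly merge the two clusters whose closest pair of
    members is nearest; record each merge and its distance."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        link = lambda a, b: min(d(x, y) for x in a for y in b)
        a, b = min(((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:]),
                   key=lambda pair: link(*pair))
        merges.append((sorted(a | b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

for members, height in single_linkage([1, 2, 3, 4, 5]):
    print(members, height)  # (35) at 2, (135) at 3, (24) at 5, (12345) at 6
```

The printed merge order and heights reproduce the worked example: (35) at 2, (135) at 3, (24) at 5, and the final cluster (12345) at 6.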

38 The Dendrogram For example, here is where 3 and 5 joined

39 Complete Linkage The complete linkage method of clustering involves combining clusters by finding the farthest neighbor: the distance between two clusters is the distance between their farthest pair of observations This ensures that all objects in a cluster are within some maximum distance of each other

40 Complete Linkage So the distance between any two clusters is: D(A, B) = max d(y_i, y_j) for all y_i in A and y_j in B Notice any other distance is shorter So how would we do this using our distance matrix?

41 Complete Linkage Example To demonstrate complete linkage in action, consider the five-object distance matrix The first step in the process is to determine the two elements with the smallest distance, and combine them into a single cluster Here, the two objects that are most similar are objects 3 and 5 We will now combine these into a new cluster, and compute the distance from that cluster to the remaining clusters (objects) via the complete linkage rule

42 Complete Linkage Example The shaded rows/columns are the portions of the table with the distances from an object to object 3 or object 5 The distance from the (35) cluster to the remaining objects is given below: d(35)1 = max{d31, d51} = max{3, 11} = 11 d(35)2 = max{d32, d52} = max{7, 10} = 10 d(35)4 = max{d34, d54} = max{9, 8} = 9

43 Complete Linkage Example Using the distance values, we now consolidate our table so that (35) is now a single row/column Notice our new computed distances with (35): d(35)1 = max{d31, d51} = max{3, 11} = 11 d(35)2 = max{d32, d52} = max{7, 10} = 10 d(35)4 = max{d34, d54} = max{9, 8} = 9

44 Complete Linkage Example We now repeat the process, by finding the smallest distance within the set of remaining clusters The smallest distance is between object 2 and object 4, so we use our rule to combine them into cluster (24) The distance from cluster (24) to the other clusters is then computed: d(24)(35) = max{d2(35), d4(35)} = max{10, 9} = 10 d(24)1 = max{d21, d41} = max{9, 6} = 9

45 Complete Linkage Example Using the distance values, we now consolidate our table so that (24) is now a single row/column The distance from the (24) cluster to the remaining objects is given below: d(24)(35) = max{d2(35), d4(35)} = max{10, 9} = 10 d(24)1 = max{d21, d41} = max{9, 6} = 9

46 Complete Linkage Example We now repeat the process, by finding the smallest distance within the set of remaining clusters The smallest distance is between cluster (24) and object 1 These will be joined to form cluster (124) The distance from (124) to (35) is then computed: d(35)(124) = max{d1(35), d(24)(35)} = max{11, 10} = 11 The final cluster (12345) is formed at a distance of 11
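The complete linkage sequence can be checked the same way in Python; the distance values are again inferred from the worked example, and names are my own:

```python
# Five-object distance matrix inferred from the worked example.
D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
     (2, 3): 7, (2, 4): 5, (2, 5): 10,
     (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(a, b):
    """Object-to-object distance, regardless of argument order."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def complete_linkage(objects):
    """Merge the pair of clusters whose farthest pair of members is
    nearest; the merge height is that maximum distance."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        link = lambda a, b: max(d(x, y) for x in a for y in b)
        a, b = min(((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:]),
                   key=lambda pair: link(*pair))
        merges.append((sorted(a | b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

for members, height in complete_linkage([1, 2, 3, 4, 5]):
    print(members, height)  # (35) at 2, (24) at 5, (124) at 9, (12345) at 11
```

Note how the farthest-neighbor rule changes the merge order relative to single linkage: object 1 now joins (24) rather than (35), and the final merge happens at 11 instead of 6.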

47 The Dendrogram

48 Average Linkage The average linkage method proceeds similarly to the single and complete linkage methods, with the exception that the distance between two clusters is now represented by the average distance between all pairs of objects, one from each cluster In practice, the average linkage method will often produce results very similar to the complete linkage method

49 So To summarize we have explained three of the methods to combine groups Notice that once things are in the same group they cannot be separated Also, as before with factor analysis the method you use is largely up to you

50 Example Distance Matrix To further demonstrate the agglomerative methods, imagine we had the following distance matrix for five objects:

51 Linkage Methods for Agglomerative Clustering We will discuss three techniques for agglomerative clustering: 1. Single linkage 2. Complete linkage 3. Average linkage Each of these techniques provides an assignment rule that determines when an observation becomes part of a cluster

52 Cluster Analysis in SAS To perform hierarchical clustering in SAS, proc cluster is used Prior to using proc cluster, the data must be converted from the raw observations into a distance matrix SAS provides a built-in macro to create distance matrices that are then used as input into the proc cluster routine To produce dendrograms in SAS, proc tree is used

53 Distance Macro Prior to specifying the distance macro, a pair of %include statements must be placed (preferably at the beginning of the syntax file): %include 'C:\Program Files\SAS Institute\SAS\V8\stat\sample\xmacro.sas'; %include 'C:\Program Files\SAS Institute\SAS\V8\stat\sample\distnew.sas'; The generic syntax for the distance-creating macro is to use %distance() all arguments are optional %distance(data=cities, id=city, options=nomiss, out=distcity, shape=square, method=euclid, var=work--salary); For more information on the distance macro, go to:

54 Distance Macro Example %include 'C:\Program Files\SAS Institute\SAS\V8\stat\sample\xmacro.sas'; %include 'C:\Program Files\SAS Institute\SAS\V8\stat\sample\distnew.sas'; title '1991 City Data'; data cities; input city $ work price salary; cards; Amsterdam Athens ; %distance(data=cities, id=city, options=nomiss, out=distcity, shape=square, method=euclid, var=work--salary);

55 Proc Cluster The proc cluster guide can be found at: ter_toc.htm The generic syntax is: proc cluster data=dataset method=single; id idvariable; run;

56 Same Example in SAS *SAS Example #1; %include 'C:\Program Files\SAS Institute\SAS\V8\stat\sample\xmacro.sas'; %include 'C:\Program Files\SAS Institute\SAS\V8\stat\sample\distnew.sas'; title 'small distance matrix'; data dist(type=distance); input object1-object5 name $; cards; 0 9 3 6 11 object1 9 0 7 5 10 object2 3 7 0 9 2 object3 6 5 9 0 8 object4 11 10 2 8 0 object5 ; run; title 'Single Link'; proc cluster data=dist method=single nonorm; id name; run; proc tree horizontal spaces=2; id name; run; Output: The CLUSTER Procedure Single Linkage Cluster Analysis Mean Distance Between Observations = 7 Cluster History NCL 4: object3 + object5 (FREQ 2, Min Dist 2) NCL 3: object1 + CL4 (FREQ 3, Min Dist 3) NCL 2: object2 + object4 (FREQ 2, Min Dist 5) NCL 1: CL3 + CL2 (FREQ 5, Min Dist 6)

57 Dendrogram for Single Link Clustering Method

58 Complete Linkage Example The shaded rows/columns are the portions of the table with the distances from an object to object 3 or object 5. The distance from the (35) cluster to the remaining objects is given below: d(35)1 = max{d31, d51} = max{3, 11} = 11 d(35)2 = max{d32, d52} = max{7, 10} = 10 d(35)4 = max{d34, d54} = max{9, 8} = 9

59 Same Example in SAS title 'Complete Link'; proc cluster data=dist method=complete nonorm; id name; run; proc tree horizontal spaces=2; id name; run; Output: The CLUSTER Procedure Complete Linkage Cluster Analysis Cluster History NCL 4: object3 + object5 (FREQ 2, Max Dist 2) NCL 3: object2 + object4 (FREQ 2, Max Dist 5) NCL 2: object1 + CL3 (FREQ 3, Max Dist 9) NCL 1: CL2 + CL4 (FREQ 5, Max Dist 11)

60 Dendrogram for Complete Link Clustering Method

61 Average Linkage The average linkage method proceeds similarly to the single and complete linkage methods, with the exception that the distance between two clusters is now represented by the average distance between all pairs of objects, one from each cluster In practice, the average linkage method will often produce results very similar to the complete linkage method In the interests of time, I will forgo the gory details and get right into the SAS code

62 Average Link Example in SAS title 'Average Link'; proc cluster data=dist method=average nonorm; id name; run; proc tree horizontal spaces=2; id name; run; Output: The CLUSTER Procedure Average Linkage Cluster Analysis Cluster History (NCL, Clusters Joined, FREQ, RMS Dist) NCL 4: object3 + object5 NCL 3: object2 + object4 NCL 2: object1 + CL4 NCL 1: CL2 + CL3
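As a cross-check of the merge order, here is a plain-average (UPGMA-style) sketch in Python on the inferred distance matrix; note that SAS reports RMS distances for method=average, so its printed distance column will differ from the plain averages below (names are my own):

```python
# Five-object distance matrix inferred from the earlier worked example.
D = {(1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
     (2, 3): 7, (2, 4): 5, (2, 5): 10,
     (3, 4): 9, (3, 5): 2, (4, 5): 8}

def d(a, b):
    """Object-to-object distance, regardless of argument order."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def average_linkage(objects):
    """Merge the pair of clusters with the smallest average of all
    between-cluster object distances (plain-average rule)."""
    clusters = [frozenset([o]) for o in objects]
    merges = []
    while len(clusters) > 1:
        link = lambda a, b: sum(d(x, y) for x in a for y in b) / (len(a) * len(b))
        a, b = min(((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:]),
                   key=lambda pair: link(*pair))
        merges.append((sorted(a | b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

for members, height in average_linkage([1, 2, 3, 4, 5]):
    print(members, round(height, 4))
# merge order: (35), (24), (135), (12345); final average distance 49/6
```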

63 Dendrogram for Average Link Clustering Method

64 Clustering Cutoffs In hierarchical clustering, the final step is to decide just what distance to use to define the final set of clusters Many people choose a spot based on one of two criteria: 1. There is a large gap between agglomeration steps 2. The clusters represent sets of objects that are consistent with preconceived theories To demonstrate, let's pull out the economic data from the cities of the world

65 City Data Example For this data set, we will use the average linkage method of agglomeration. This corresponds to SAS Example #2 in the code %distance(data=cities, id=city, options=nomiss, out=distcity, shape=square, method=euclid, var=work--salary); proc cluster data=distcity method=average; id city; run; proc tree horizontal spaces=2; id city; run;

66 City Data Example The CLUSTER Procedure Average Linkage Cluster Analysis Root-Mean-Square Distance Between Observations = Cluster History Norm T RMS i NCL --Clusters Joined--- FREQ Dist e 45 Caracas Singpore London Paris Milan Vienna Amsterda Brussels Lisbon Rio_de_J Mexico_C Nairobi Copenhag Madrid Geneva Zurich Bogota Kuala_Lu Frankfur Sydney Nicosia Seoul Dublin CL Chicago New_York Johannes CL

67 City Data Dendrogram

68 K-means Clustering Another procedure used to cluster objects is the k-means clustering method K-means initially (and randomly) assigns objects to a total of k clusters The method then goes through each object and reassigns the object to the cluster that has the smallest distance The mean (or centroid) of the cluster is then updated to reflect the new observation This process is repeated until very little change in the clusters occurs
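The steps above can be sketched as a batch (Lloyd's-algorithm) variant of k-means, which recomputes all centroids after each full pass of reassignments; the toy data points and names are my own:

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Batch k-means (Lloyd's algorithm): assign every point to its
    nearest centroid, recompute centroids, repeat until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial assignment of seeds
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            # reassign each point to the cluster with the nearest centroid
            nearest = min(range(k),
                          key=lambda j: sum((pi - ci) ** 2
                                            for pi, ci in zip(p, centroids[j])))
            groups[nearest].append(p)
        # update each centroid to the mean of its current members
        new = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:  # little (here: no) change -> stop
            break
        centroids = new
    return centroids, groups

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = kmeans(pts, 2)
print(sorted(len(g) for g in groups))  # [3, 3]: the two obvious blobs
```

Re-running with different seeds illustrates the sensitivity to starting values mentioned below; with well-separated blobs like these, all starts converge to the same solution, but on harder data they need not.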

69 K-means Clustering With k-means, the user does not have to choose a final distance cut-off value, as in hierarchical clustering K-means is sensitive to local optima: solutions will depend upon starting values There is a need to re-run this method multiple times to find the best solution

70 K-means Example (SAS Example #3) To do K-means clustering in SAS, the fastclus procedure is used. The guide for the fastclus procedure can be found at: astclus_sect016.htm We will attempt to use proc fastclus to cluster our city data set into three clusters

71 K-means Example *SAS Example #3 - K-means with Cities; proc fastclus data=cities maxc=3 maxiter=1000 out=clus; var work price salary; run; proc sort data=clus; by cluster; run; proc print data=clus; var city cluster; run;

72 1991 City Data 22:13 Saturday, November 27 The FASTCLUS Procedure Replace=FULL Radius=0 Maxclusters=3 Maxiter=1000 Converge=0.02 Initial Seeds (work, price, salary by cluster) Minimum Distance Between Initial Seeds Iteration History (criterion and relative change in cluster seeds by iteration) Convergence criterion is satisfied. Criterion Based on Final Seeds Cluster Summary (frequency, RMS std deviation, maximum distance from seed to observation, nearest cluster, and distance between cluster centroids, for each cluster)

73 Statistics for Variables (work, price, salary, OVER-ALL): Total STD, Within STD, R-Square, RSQ/(1-RSQ) Pseudo F Statistic The FASTCLUS Procedure Replace=FULL Radius=0 Maxclusters=3 Maxiter=1000 Converge=0.02 Approximate Expected Over-All R-Squared Cubic Clustering Criterion WARNING: The two values above are invalid for correlated variables. Cluster Means (work, price, salary by cluster) Cluster Standard Deviations (work, price, salary by cluster)

74 Obs city CLUSTER 1 Amsterda 1 2 Athens 1 3 Brussels 1 4 Copenhag 1 5 Dublin 1 6 Dusseldo 1 7 Frankfur 1 8 Helsinki 1 9 Lagos 1 10 Lisbon 1 11 London 1 12 Luxembou 1 13 Madrid 1 14 Milan 1 15 Montreal 1 16 Nicosia 1 17 Oslo 1 18 Paris 1 19 Rio_de_J 1 20 Seoul 1 21 Stockhol 1 22 Sydney 1 23 Vienna 1 24 Bombay 2 25 Buenos_A 2 26 Caracas 2 27 Chicago 2 28 Geneva 2 29 Houston 2 30 Johannes 2 31 Los_Ange 2 32 Mexico_C 2 33 Nairobi 2 34 New_York 2 35 Panama 2 36 Sao_Paul 2 37 Singpore 2 38 Tel_Aviv 2 39 Tokyo 2 40 Toronto 2 41 Zurich 2 42 Bogota 3 43 Hong_Kon 3 44 Kuala_Lu 3 45 Manila 3 46 Taipei 3

75 Multidimensional Scaling Up to this point, we have been working with methods for taking distances and then clustering objects But does using distances remind you of anything? Have you ever been in a really long car ride, and had nothing better to look at than an atlas? Did you ever notice that the back of the atlas provides a nice distance matrix to look at? Those distances come from the proximity of cities on the map

76 Multidimensional Scaling Multidimensional scaling is the process by which a map is drawn from a set of distances Before we get into the specifics of the procedure, let's begin with an example Let's take a state (that is a favorite of mine) that you may or may not be very familiar with Nevada

77 Multidimensional Scaling You may be familiar with several cities in NV Las Vegas Reno Carson City But have you heard of Lovelock? Fernley? Winnemucca?

78 Mapping Nevada From the State of Nevada website, I was able to download a mileage chart for the distance between cities in the Great State of Nevada Using this chart and MDS, we will now draw a map of the Silver State with the cities in their proper locations

79 Nevada from MDS vs. from a Map

80 How MDS Works Multidimensional scaling attempts to recreate the distances in the observed distance matrix by a set of coordinates in a smaller number of dimensions The algorithm works by selecting a set of values for the distances, then computing an index of how badly the distances fit, called Stress: Stress(q) = [ Σ_{i<k} ( d_ik^(q) − d̂_ik^(q) )² / Σ_{i<k} ( d_ik^(q) )² ]^{1/2}

81 How MDS Works (Continued) Based on an optimization procedure, new values for the distances are selected, and the new value for Stress is computed The algorithm exits when values of Stress do not change by large amounts Note that this procedure is also subject to local optima: multiple runs of the procedure may be necessary
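A sketch of the Stress computation under these definitions, normalizing (as one common convention) by the squared configuration distances; the names, coordinates, and observed distances are illustrative inventions of my own:

```python
from math import dist, sqrt

def stress(observed, coords):
    """Stress of a configuration: square root of the sum of squared
    differences between observed and configuration distances,
    divided by the sum of squared configuration distances."""
    n = len(coords)
    pairs = [(i, k) for i in range(n) for k in range(i + 1, n)]
    fitted = {p: dist(coords[p[0]], coords[p[1]]) for p in pairs}
    num = sum((observed[p] - fitted[p]) ** 2 for p in pairs)
    den = sum(f ** 2 for f in fitted.values())
    return sqrt(num / den)

# A 3-4-5 right triangle reproduces these observed distances exactly,
# so the configuration has zero Stress:
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
observed = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
print(stress(observed, coords))  # 0.0
```

An MDS optimizer is essentially a loop that perturbs the coordinates to reduce this quantity until it stops changing by much, which is why different starting configurations can end in different local optima.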

82 Types of MDS Although this is a gross simplification of the methods used for MDS, there are generally two techniques used to scale distances: Metric MDS uses distances (magnitudes) to obtain the solution Nonmetric MDS uses ordinal (rank) information about objects' distances to obtain the solution

83 MDS of Paired Comparison Data In graduate school, I once went through a phase where I was into NASCAR A friend of mine and I would always argue about whether or not NASCAR was a sport We attempted to settle the argument the old-fashioned way, by applying advanced quantitative methods to data sets

84 MDS of Sports Perceptions So, I set out to understand a bit more about how auto racing in general is perceived relative to other sports I created a questionnaire asking for people to judge the similarity of two sports at a time, across 21 different sports I had eight of my closest friends complete the survey (which was quite lengthy because of all of the pairs of sports needing to be compared)

85 MDS in SAS To do MDS in SAS, we will use proc mds The online user guide for proc mds can be found at: _toc.htm This is also SAS Example #5 in the example code file online

86 MDS of Sports Perceptions proc mds data=sports level=ordinal coef=diagonal dimension=2 pfinal out=out outres=res ; run; title1 'Plot of configuration'; %plotit(data=out(where=(_type_='config')), datatype=mds, labelvar=_name_, vtoh=1.75); %plotit(data=out(where=(_type_='config')), datatype=mds, plotvars=dim1 dim2, labelvar=_name_, vtoh=1.75); run;

87 Plot of configuration 00:39 Sunday, November 28 Multidimensional Scaling: Data=WORK.SPORTS.DATA Shape=TRIANGLE Condition=MATRIX Level=ORDINAL Coef=DIAGONAL Dimension=2 Formula=1 Fit=1 Mconverge=0.01 Gconverge=0.01 Maxiter=100 Over=2 Ridge= Iteration history: an initial step followed by alternating Monotone and Lev-Mar (Levenberg-Marquardt) iterations, with the badness-of-fit criterion and convergence measures reported at each step, ending with: Convergence criteria are satisfied.

88 Multidimensional Scaling: Data=WORK.SPORTS.DATA Shape=TRIANGLE Condition=MATRIX Level=ORDINAL Coef=DIAGONAL Dimension=2 Formula=1 Fit=1 Configuration (Dim1, Dim2 coordinates) for: Auto_Racing Baseball Basketball Biking Bowling Boxing Card_Games Chess Football Golf Hockey Martial_Arts Pool Skiing Soccer Swimming Tennis Track Volleyball Weightlifting Dimension Coefficients by _MATRIX_ Fit summary by _MATRIX_: number of nonmissing data, weight, badness-of-fit criterion, distance correlation, and uncorrected distance correlation

89 MDS Plot

90 More MDS As alluded to earlier, there are multiple methods for performing MDS There are ways of examining MDS for each individual (Individual Differences Scaling, or INDSCAL) There are ways of doing MDS on categorical data (Correspondence Analysis)

91 Wrapping Up Today we covered methods for clustering and displaying distances Again, we only scratched the surface of the methods presented today Tomorrow we will talk about more methods for clustering: mixture models Mixture models are model-based clustering methods: distributional assumptions are needed


More information

Cluster Analysis: Agglomerate Hierarchical Clustering

Cluster Analysis: Agglomerate Hierarchical Clustering Cluster Analysis: Agglomerate Hierarchical Clustering Yonghee Lee Department of Statistics, The University of Seoul Oct 29, 2015 Contents 1 Cluster Analysis Introduction Distance matrix Agglomerative Hierarchical

More information

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample

CS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:

More information

DATA CLASSIFICATORY TECHNIQUES

DATA CLASSIFICATORY TECHNIQUES DATA CLASSIFICATORY TECHNIQUES AMRENDER KUMAR AND V.K.BHATIA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 akjha@iasri.res.in 1. Introduction Rudimentary, exploratory

More information

3. Cluster analysis Overview

3. Cluster analysis Overview Université Laval Multivariate analysis - February 2006 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

The EMCLUS Procedure. The EMCLUS Procedure

The EMCLUS Procedure. The EMCLUS Procedure The EMCLUS Procedure Overview Procedure Syntax PROC EMCLUS Statement VAR Statement INITCLUS Statement Output from PROC EMCLUS EXAMPLES-SECTION Example 1: Syntax for PROC FASTCLUS Example 2: Use of the

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning A review of clustering and other exploratory data analysis methods HST.951J: Medical Decision Support Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster

More information

Forestry Applied Multivariate Statistics. Cluster Analysis

Forestry Applied Multivariate Statistics. Cluster Analysis 1 Forestry 531 -- Applied Multivariate Statistics Cluster Analysis Purpose: To group similar entities together based on their attributes. Entities can be variables or observations. [illustration in Class]

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

Week 7 Picturing Network. Vahe and Bethany

Week 7 Picturing Network. Vahe and Bethany Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups

More information

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

Clustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation

More information

Finding Clusters 1 / 60

Finding Clusters 1 / 60 Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering Clustering by Partitioning, e.g. k-means Density Based Clustering, e.g. DBScan Grid Based Clustering 1 / 60

More information

Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis. Ying Shen, SSE, Tongji University Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

More information

Applied Clustering Techniques. Jing Dong

Applied Clustering Techniques. Jing Dong Applied Clustering Techniques Jing Dong Nov 31, 2016 What is cluster analysis? What is Cluster Analysis? Cluster: o Similar to one another within the same cluster o Dissimilar to the objects in other clusters

More information

INF4820, Algorithms for AI and NLP: Hierarchical Clustering

INF4820, Algorithms for AI and NLP: Hierarchical Clustering INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

4. Ad-hoc I: Hierarchical clustering

4. Ad-hoc I: Hierarchical clustering 4. Ad-hoc I: Hierarchical clustering Hierarchical versus Flat Flat methods generate a single partition into k clusters. The number k of clusters has to be determined by the user ahead of time. Hierarchical

More information

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample

CS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups

More information

Chapter 6: Cluster Analysis

Chapter 6: Cluster Analysis Chapter 6: Cluster Analysis The major goal of cluster analysis is to separate individual observations, or items, into groups, or clusters, on the basis of the values for the q variables measured on each

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Unsupervised Learning

Unsupervised Learning Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised

More information

Clustering algorithms 6CCS3WSN-7CCSMWAL

Clustering algorithms 6CCS3WSN-7CCSMWAL Clustering algorithms 6CCS3WSN-7CCSMWAL Contents Introduction: Types of clustering Hierarchical clustering Spatial clustering (k means etc) Community detection (next week) What are we trying to cluster

More information

CHAPTER THREE THE DISTANCE FUNCTION APPROACH

CHAPTER THREE THE DISTANCE FUNCTION APPROACH 50 CHAPTER THREE THE DISTANCE FUNCTION APPROACH 51 3.1 INTRODUCTION Poverty is a multi-dimensional phenomenon with several dimensions. Many dimensions are divided into several attributes. An example of

More information

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering

More information

Workload Characterization Techniques

Workload Characterization Techniques Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

Hierarchical Clustering / Dendrograms

Hierarchical Clustering / Dendrograms Chapter 445 Hierarchical Clustering / Dendrograms Introduction The agglomerative hierarchical clustering algorithms available in this program module build a cluster hierarchy that is commonly displayed

More information

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler

BBS654 Data Mining. Pinar Duygulu. Slides are adapted from Nazli Ikizler BBS654 Data Mining Pinar Duygulu Slides are adapted from Nazli Ikizler 1 Classification Classification systems: Supervised learning Make a rational prediction given evidence There are several methods for

More information

Clustering. Lecture 6, 1/24/03 ECS289A

Clustering. Lecture 6, 1/24/03 ECS289A Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology 9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

11/2/2017 MIST.6060 Business Intelligence and Data Mining 1. Clustering. Two widely used distance metrics to measure the distance between two records

11/2/2017 MIST.6060 Business Intelligence and Data Mining 1. Clustering. Two widely used distance metrics to measure the distance between two records 11/2/2017 MIST.6060 Business Intelligence and Data Mining 1 An Example Clustering X 2 X 1 Objective of Clustering The objective of clustering is to group the data into clusters such that the records within

More information

Multivariate Analysis

Multivariate Analysis Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data

More information

CHAPTER 4: CLUSTER ANALYSIS

CHAPTER 4: CLUSTER ANALYSIS CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis

More information

Chapter 1. Using the Cluster Analysis. Background Information

Chapter 1. Using the Cluster Analysis. Background Information Chapter 1 Using the Cluster Analysis Background Information Cluster analysis is the name of a multivariate technique used to identify similar characteristics in a group of observations. In cluster analysis,

More information

ECS 234: Data Analysis: Clustering ECS 234

ECS 234: Data Analysis: Clustering ECS 234 : Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

More information

Hierarchical Clustering

Hierarchical Clustering What is clustering Partitioning of a data set into subsets. A cluster is a group of relatively homogeneous cases or observations Hierarchical Clustering Mikhail Dozmorov Fall 2016 2/61 What is clustering

More information

Hierarchical Clustering Lecture 9

Hierarchical Clustering Lecture 9 Hierarchical Clustering Lecture 9 Marina Santini Acknowledgements Slides borrowed and adapted from: Data Mining by I. H. Witten, E. Frank and M. A. Hall 1 Lecture 9: Required Reading Witten et al. (2011:

More information

Getting to Know Your Data

Getting to Know Your Data Chapter 2 Getting to Know Your Data 2.1 Exercises 1. Give three additional commonly used statistical measures (i.e., not illustrated in this chapter) for the characterization of data dispersion, and discuss

More information

Clustering. Chapter 10 in Introduction to statistical learning

Clustering. Chapter 10 in Introduction to statistical learning Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Hierarchical clustering

Hierarchical clustering Hierarchical clustering Rebecca C. Steorts, Duke University STA 325, Chapter 10 ISL 1 / 63 Agenda K-means versus Hierarchical clustering Agglomerative vs divisive clustering Dendogram (tree) Hierarchical

More information

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

2. Find the smallest element of the dissimilarity matrix. If this is D lm then fuse groups l and m.

2. Find the smallest element of the dissimilarity matrix. If this is D lm then fuse groups l and m. Cluster Analysis The main aim of cluster analysis is to find a group structure for all the cases in a sample of data such that all those which are in a particular group (cluster) are relatively similar

More information

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010

STATS306B STATS306B. Clustering. Jonathan Taylor Department of Statistics Stanford University. June 3, 2010 STATS306B Jonathan Taylor Department of Statistics Stanford University June 3, 2010 Spring 2010 Outline K-means, K-medoids, EM algorithm choosing number of clusters: Gap test hierarchical clustering spectral

More information

Market basket analysis

Market basket analysis Market basket analysis Find joint values of the variables X = (X 1,..., X p ) that appear most frequently in the data base. It is most often applied to binary-valued data X j. In this context the observations

More information

Based on Raymond J. Mooney s slides

Based on Raymond J. Mooney s slides Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit

More information

Hierarchical Clustering

Hierarchical Clustering Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits 0 0 0 00

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

SAS/STAT 13.2 User s Guide. The MDS Procedure

SAS/STAT 13.2 User s Guide. The MDS Procedure SAS/STAT 13.2 User s Guide The MDS Procedure This document is an individual chapter from SAS/STAT 13.2 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Lecture 4 Hierarchical clustering

Lecture 4 Hierarchical clustering CSE : Unsupervised learning Spring 00 Lecture Hierarchical clustering. Multiple levels of granularity So far we ve talked about the k-center, k-means, and k-medoid problems, all of which involve pre-specifying

More information

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC

Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC Data Exploration with PCA and Unsupervised Learning with Clustering Paul Rodriguez, PhD PACE SDSC Clustering Idea Given a set of data can we find a natural grouping? Essential R commands: D =rnorm(12,0,1)

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Stat 321: Transposable Data Clustering

Stat 321: Transposable Data Clustering Stat 321: Transposable Data Clustering Art B. Owen Stanford Statistics Art B. Owen (Stanford Statistics) Clustering 1 / 27 Clustering Given n objects with d attributes, place them (the objects) into groups.

More information

Image Analysis - Lecture 5

Image Analysis - Lecture 5 Texture Segmentation Clustering Review Image Analysis - Lecture 5 Texture and Segmentation Magnus Oskarsson Lecture 5 Texture Segmentation Clustering Review Contents Texture Textons Filter Banks Gabor

More information

CS573 Data Privacy and Security. Li Xiong

CS573 Data Privacy and Security. Li Xiong CS573 Data Privacy and Security Anonymizationmethods Li Xiong Today Clustering based anonymization(cont) Permutation based anonymization Other privacy principles Microaggregation/Clustering Two steps:

More information

Homework # 4. Example: Age in years. Answer: Discrete, quantitative, ratio. a) Year that an event happened, e.g., 1917, 1950, 2000.

Homework # 4. Example: Age in years. Answer: Discrete, quantitative, ratio. a) Year that an event happened, e.g., 1917, 1950, 2000. Homework # 4 1. Attribute Types Classify the following attributes as binary, discrete, or continuous. Further classify the attributes as qualitative (nominal or ordinal) or quantitative (interval or ratio).

More information

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008

Cluster Analysis. Jia Li Department of Statistics Penn State University. Summer School in Statistics for Astronomers IV June 9-14, 2008 Cluster Analysis Jia Li Department of Statistics Penn State University Summer School in Statistics for Astronomers IV June 9-1, 8 1 Clustering A basic tool in data mining/pattern recognition: Divide a

More information

SAS/STAT 12.3 User s Guide. The MDS Procedure (Chapter)

SAS/STAT 12.3 User s Guide. The MDS Procedure (Chapter) SAS/STAT 12.3 User s Guide The MDS Procedure (Chapter) This document is an individual chapter from SAS/STAT 12.3 User s Guide. The correct bibliographic citation for the complete manual is as follows:

More information

10701 Machine Learning. Clustering

10701 Machine Learning. Clustering 171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 8 Based in part on slides from textbook, slides of Susan Holmes. December 2, / 1 Week 8 Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Part I Clustering 2 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group

More information

Clustering Techniques

Clustering Techniques Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

MSA220 - Statistical Learning for Big Data

MSA220 - Statistical Learning for Big Data MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups

More information

Chapter DM:II. II. Cluster Analysis

Chapter DM:II. II. Cluster Analysis Chapter DM:II II. Cluster Analysis Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster Analysis DM:II-1

More information

What is Unsupervised Learning?

What is Unsupervised Learning? Clustering What is Unsupervised Learning? Unlike in supervised learning, in unsupervised learning, there are no labels We simply a search for patterns in the data Examples Clustering Density Estimation

More information

Clustering and Visualisation of Data

Clustering and Visualisation of Data Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

More information

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan

Working with Unlabeled Data Clustering Analysis. Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan Working with Unlabeled Data Clustering Analysis Hsiao-Lung Chan Dept Electrical Engineering Chang Gung University, Taiwan chanhl@mail.cgu.edu.tw Unsupervised learning Finding centers of similarity using

More information

CS7267 MACHINE LEARNING

CS7267 MACHINE LEARNING S7267 MAHINE LEARNING HIERARHIAL LUSTERING Ref: hengkai Li, Department of omputer Science and Engineering, University of Texas at Arlington (Slides courtesy of Vipin Kumar) Mingon Kang, Ph.D. omputer Science,

More information

Clustering Part 3. Hierarchical Clustering

Clustering Part 3. Hierarchical Clustering Clustering Part Dr Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Hierarchical Clustering Two main types: Agglomerative Start with the points

More information

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975.

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975. Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975. SPSS Statistics were designed INTRODUCTION TO SPSS Objective About the

More information

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team

Unsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Unsupervised Learning Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Table of Contents 1)Clustering: Introduction and Basic Concepts 2)An Overview of Popular Clustering Methods 3)Other Unsupervised

More information

Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis

Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis 7 Supervised learning vs unsupervised learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These patterns are then

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.

More information

Summer School in Statistics for Astronomers & Physicists June 15-17, Cluster Analysis

Summer School in Statistics for Astronomers & Physicists June 15-17, Cluster Analysis Summer School in Statistics for Astronomers & Physicists June 15-17, 2005 Session on Computational Algorithms for Astrostatistics Cluster Analysis Max Buot Department of Statistics Carnegie-Mellon University

More information

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other

More information

Cluster Analysis for Microarray Data

Cluster Analysis for Microarray Data Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering

Sequence clustering. Introduction. Clustering basics. Hierarchical clustering Sequence clustering Introduction Data clustering is one of the key tools used in various incarnations of data-mining - trying to make sense of large datasets. It is, thus, natural to ask whether clustering

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

What is Cluster Analysis?

What is Cluster Analysis? Cluster Analysis What is Cluster Analysis? Finding groups of objects (data points) such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the

More information

University of Florida CISE department Gator Engineering. Clustering Part 5

University of Florida CISE department Gator Engineering. Clustering Part 5 Clustering Part 5 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville SNN Approach to Clustering Ordinary distance measures have problems Euclidean

More information

Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi

Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi Project and Production Management Prof. Arun Kanda Department of Mechanical Engineering Indian Institute of Technology, Delhi Lecture - 8 Consistency and Redundancy in Project networks In today s lecture

More information

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science Data Mining Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Computer Science 06 07 Department of CS - DM - UHD Road map Cluster Analysis: Basic

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

K-Means Clustering 3/3/17

K-Means Clustering 3/3/17 K-Means Clustering 3/3/17 Unsupervised Learning We have a collection of unlabeled data points. We want to find underlying structure in the data. Examples: Identify groups of similar data points. Clustering

More information

Data Mining Algorithms

Data Mining Algorithms for the original version: -JörgSander and Martin Ester - Jiawei Han and Micheline Kamber Data Management and Exploration Prof. Dr. Thomas Seidl Data Mining Algorithms Lecture Course with Tutorials Wintersemester

More information