You live-trap, anesthetize, and measure 143 black bears. The measurements are total length and head length (Length, Head.L), total weight and head weight (Weight, Head.W), and neck girth and chest girth (Neck.G, Chest.G). You wish to classify these 143 bears as small, medium-sized, or large. You know that the second, seventy-eighth, and fifteenth bears in the sample are typical of the three respective categories. First, you create an initial partition column that indicates initial cluster membership: the three seed bears are coded 1 = small, 2 = medium-sized, and 3 = large, and the remaining bears are coded 0 (unknown). Then you perform K-means clustering and store the cluster membership in a column named BearSize.
1 Open the worksheet BEARS.MTW.
2 To create the initial partition column, choose Calc > Make Patterned Data > Simple Set of Numbers.
3 In Store patterned data in, enter Initial for the storage column name.
4 In both From first value and To last value, enter 0.
5 In List each value, enter 143. Click OK.
6 Go to the Data window and enter 1, 2, and 3 in the second, seventy-eighth, and fifteenth rows, respectively, of the column named Initial.
7 Choose Stat > Multivariate > Cluster K-Means.
8 In Variables, enter 'Head.L'-Weight.
9 Under Specify Partition by, choose Initial partition column and enter Initial.
10 Check Standardize variables.
11 Click Storage. In Cluster membership column, enter BearSize.
12 Click OK in each dialog box.
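For readers who want to see the logic behind steps 2 through 11 outside Minitab, the following is a minimal Python sketch under stated assumptions. It standardizes the variables, builds an initial partition list (zeros with seed rows set to 1, 2, and 3), and runs a textbook batch K-means. The function names and the small two-variable data set are hypothetical and are not taken from BEARS.MTW, and the batch update rule shown here is not necessarily the exact rule Minitab uses internally.

```python
import math
import statistics


def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def centroid(points):
    """Vector of column means for a non-empty list of points."""
    return [statistics.mean(c) for c in zip(*points)]


def kmeans_from_partition(rows, initial, max_iter=100):
    """Batch K-means seeded by an initial partition column.

    initial[i] is a cluster number 1..k for seed rows and 0 for unknown rows.
    """
    k = max(initial)
    labels = list(initial)
    # Initial centroids come from the rows already assigned to each cluster.
    centroids = [centroid([r for r, c in zip(rows, labels) if c == j])
                 for j in range(1, k + 1)]
    for _ in range(max_iter):
        # Assign every row to the cluster with the nearest centroid.
        new_labels = [1 + min(range(k), key=lambda j: math.dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(1, k + 1):
            members = [r for r, c in zip(rows, labels) if c == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j - 1] = centroid(members)
    return labels, centroids


# Hypothetical data: nine animals, two measurements each (not the bear data).
data = [[10, 1.0], [11, 1.2], [12, 0.9],
        [30, 3.0], [31, 3.2], [29, 2.9],
        [50, 5.0], [52, 5.1], [51, 4.8]]

# Initial partition: all zeros, then mark one seed row per cluster.
initial = [0] * len(data)
initial[0], initial[3], initial[6] = 1, 2, 3

labels, centroids = kmeans_from_partition(standardize(data), initial)
print(labels)
```

The returned `labels` list plays the role of the BearSize storage column: one cluster number per row, in the original row order.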
Session window output
K-means Cluster Analysis: Head.L, Head.W, Neck.G, Length, Chest.G, Weight
Standardized Variables
Final Partition
Number of clusters: 3
                         Within   Average   Maximum
                        cluster  distance  distance
             Number of   sum of      from      from
          observations  squares  centroid  centroid
Cluster1            41   63.075     1.125     2.488
Cluster2            67   78.947     0.997     2.048
Cluster3            35   65.149     1.311     2.449
Cluster Centroids
                                           Grand
Variable  Cluster1  Cluster2  Cluster3  centroid
Head.L     -1.0673    0.0126    1.2261   -0.0000
Head.W     -0.9943   -0.0155    1.1943    0.0000
Neck.G     -1.0244   -0.1293    1.4476   -0.0000
Length     -1.1399    0.0614    1.2177    0.0000
Chest.G    -1.0570   -0.0810    1.3932   -0.0000
Weight     -0.9460   -0.2033    1.4974   -0.0000
Distances Between Cluster Centroids
          Cluster1  Cluster2  Cluster3
Cluster1    0.0000    2.4233    5.8045
Cluster2    2.4233    0.0000    3.4388
Cluster3    5.8045    3.4388    0.0000
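The distances between cluster centroids can be checked by hand as ordinary Euclidean distances between the rows of the Cluster Centroids table. The sketch below is an illustrative check, not Minitab's own code. It copies the printed centroids (rounded to four decimals), so the results should agree with the table to about three decimal places.

```python
import math

# Centroids copied from the Cluster Centroids table
# (order: Head.L, Head.W, Neck.G, Length, Chest.G, Weight).
centroids = {
    "Cluster1": [-1.0673, -0.9943, -1.0244, -1.1399, -1.0570, -0.9460],
    "Cluster2": [0.0126, -0.0155, -0.1293, 0.0614, -0.0810, -0.2033],
    "Cluster3": [1.2261, 1.1943, 1.4476, 1.2177, 1.3932, 1.4974],
}

# Euclidean distance between every pair of centroids.
distances = {
    (a, b): math.dist(centroids[a], centroids[b])
    for a in centroids
    for b in centroids
}

for a in centroids:
    row = "  ".join(f"{distances[(a, b)]:7.4f}" for b in centroids)
    print(f"{a}  {row}")
```

The largest distance is between Cluster1 and Cluster3, which fits the interpretation of these clusters as the small and large bears.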
K-means clustering classified the 143 bears as 41 small bears, 67 medium-sized bears, and 35 large bears. In the first table, Minitab displays the number of observations in each cluster, the within-cluster sum of squares, the average distance from the observations to the cluster centroid, and the maximum distance from an observation to the cluster centroid. In general, a cluster with a small sum of squares is more compact than one with a large sum of squares. The centroid is the vector of variable means for the observations in a cluster and serves as the cluster midpoint.
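Because the within-cluster sum of squares also grows with the number of observations in a cluster, it can help to divide it by the cluster size before comparing compactness. The short sketch below does this with the values from the Final Partition table. The per-observation figure is an illustrative summary added here, not a quantity that appears in the Minitab output.

```python
# (number of observations, within-cluster sum of squares) from the
# Final Partition table.
final_partition = {
    "Cluster1": (41, 63.075),
    "Cluster2": (67, 78.947),
    "Cluster3": (35, 65.149),
}

# Sum of squares per observation: a size-adjusted measure of spread.
per_obs = {name: ss / n for name, (n, ss) in final_partition.items()}

for name, value in per_obs.items():
    print(f"{name}: {value:.3f}")

most_compact = min(per_obs, key=per_obs.get)
print("Most compact per observation:", most_compact)
```

On this basis Cluster2 is the tightest cluster even though its raw sum of squares is the largest, which is consistent with it having the smallest average distance from the centroid (0.997).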
The second table displays the centroids for the individual clusters, and the third table gives the distances between cluster centroids.
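The grand centroid column is zero for every variable because the variables were standardized. As a consistency check, the grand centroid should equal the average of the cluster centroids weighted by cluster size. The sketch below reproduces that calculation from the printed values. It is an illustration only, and small deviations from zero come from the four-decimal rounding in the output.

```python
variables = ["Head.L", "Head.W", "Neck.G", "Length", "Chest.G", "Weight"]

# Cluster sizes and centroids copied from the Session window output.
sizes = [41, 67, 35]
centroids = [
    [-1.0673, -0.9943, -1.0244, -1.1399, -1.0570, -0.9460],  # Cluster1
    [0.0126, -0.0155, -0.1293, 0.0614, -0.0810, -0.2033],    # Cluster2
    [1.2261, 1.1943, 1.4476, 1.2177, 1.3932, 1.4974],        # Cluster3
]

total = sum(sizes)

# Size-weighted mean of the cluster centroids, one value per variable.
grand = [
    sum(n * c[i] for n, c in zip(sizes, centroids)) / total
    for i in range(len(variables))
]

for name, value in zip(variables, grand):
    print(f"{name:8s} {value:8.4f}")
```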
The column BearSize contains the cluster designations.