Example of Cluster K-Means

You live-trap, anesthetize, and measure 143 black bears. The measurements are total length and head length (Length, Head.L), total weight and head weight (Weight, Head.W), and neck girth and chest girth (Neck.G, Chest.G). You wish to classify these 143 bears as small, medium-sized, or large. You know that the second, seventy-eighth, and fifteenth bears in the sample are typical of the small, medium-sized, and large categories, respectively. First, you create an initial partition column that indicates initial cluster membership: the three seed bears are designated 1 = small, 2 = medium-sized, and 3 = large, and the remaining bears are designated 0 (unknown). Then you perform K-means clustering and store the cluster membership in a column named BearSize.

1    Open the worksheet BEARS.MTW.

2    To create the initial partition column, choose Calc > Make Patterned Data > Simple Set of Numbers.

3    In Store patterned data in, enter Initial for the storage column name.

4    In both From first value and From last value, enter 0.

5    In List each value, enter 143. Click OK.

6    Go to the Data window and enter 1, 2, and 3 in the second, seventy-eighth, and fifteenth rows, respectively, of the column named Initial.

7    Choose Stat > Multivariate > Cluster K-Means.

8    In Variables, enter 'Head.L'-Weight.

9    Under Specify Partition by, choose Initial partition column and enter Initial.

10  Check Standardize variables.

11  Click Storage. In Cluster membership column, enter BearSize.

12  Click OK in each dialog box.
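For readers who want to see the logic behind steps 2 through 12 outside Minitab, here is a minimal Python sketch. It rests on several assumptions: standardizing is taken to mean subtracting each column mean and dividing by the sample standard deviation, the iteration is a plain nearest-centroid reassignment that may differ in detail from Minitab's algorithm, and the helper names (make_initial_partition, standardize, kmeans_from_partition) and the nine-row toy data set are hypothetical stand-ins, not part of BEARS.MTW.

```python
import math
import statistics


def make_initial_partition(n_rows, seeds):
    """Column of zeros (unknown) with seed rows set to their cluster number.

    `seeds` maps a 1-based row number to a cluster number, as in the Data window.
    """
    column = [0] * n_rows
    for row_number, cluster in seeds.items():
        column[row_number - 1] = cluster
    return column


def standardize(rows):
    """Assumed standardization: (value - column mean) / sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def centroid(points):
    """Vector of variable means for a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans_from_partition(rows, initial, max_iter=100):
    """Seed centroids from the nonzero entries of `initial`, then iterate."""
    k = max(initial)
    centroids = [
        centroid([r for r, g in zip(rows, initial) if g == c])
        for c in range(1, k + 1)
    ]
    labels = None
    for _ in range(max_iter):
        # Assign every row to the nearest centroid (Euclidean distance).
        new_labels = [
            1 + min(range(k), key=lambda c: math.dist(r, centroids[c]))
            for r in rows
        ]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        for c in range(1, k + 1):
            members = [r for r, g in zip(rows, labels) if g == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c - 1] = centroid(members)
    return labels, centroids


# Initial partition column for the bears: rows 2, 78, and 15 are the seeds.
bears_initial = make_initial_partition(143, {2: 1, 78: 2, 15: 3})

# Hypothetical toy data (two measurements per animal), not the bear data.
toy_rows = [(10, 1.0), (11, 1.2), (12, 0.9),
            (20, 2.0), (21, 2.1), (19, 1.9),
            (30, 3.0), (31, 3.2), (29, 2.9)]
toy_initial = make_initial_partition(len(toy_rows), {2: 1, 5: 2, 8: 3})
toy_size, toy_centroids = kmeans_from_partition(standardize(toy_rows), toy_initial)
print(toy_size)
```

The list toy_size plays the role of the stored BearSize column: one cluster number per row, in worksheet order.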

Session window output

K-means Cluster Analysis: Head.L, Head.W, Neck.G, Length, Chest.G, Weight

Standardized Variables

Final Partition

Number of clusters: 3

                         Within   Average   Maximum
                        cluster  distance  distance
             Number of   sum of      from      from
          observations  squares  centroid  centroid
Cluster1            41   63.075     1.125     2.488
Cluster2            67   78.947     0.997     2.048
Cluster3            35   65.149     1.311     2.449

Cluster Centroids

                                           Grand
Variable  Cluster1  Cluster2  Cluster3  centroid
Head.L     -1.0673    0.0126    1.2261   -0.0000
Head.W     -0.9943   -0.0155    1.1943    0.0000
Neck.G     -1.0244   -0.1293    1.4476   -0.0000
Length     -1.1399    0.0614    1.2177    0.0000
Chest.G    -1.0570   -0.0810    1.3932   -0.0000
Weight     -0.9460   -0.2033    1.4974   -0.0000

Distances Between Cluster Centroids

          Cluster1  Cluster2  Cluster3
Cluster1    0.0000    2.4233    5.8045
Cluster2    2.4233    0.0000    3.4388
Cluster3    5.8045    3.4388    0.0000

Interpreting the results

K-means clustering classified the 143 bears as 41 small bears, 67 medium-sized bears, and 35 large bears. The first table displays, for each cluster, the number of observations, the within-cluster sum of squares, the average distance from the observations to the cluster centroid, and the maximum distance from an observation to the cluster centroid. In general, a cluster with a small sum of squares is more compact than one with a large sum of squares. Keep in mind that the sum of squares also grows with the number of observations: Cluster2 has the largest sum of squares (78.947), but it also has the most observations (67) and the smallest average distance from the centroid (0.997). The centroid is the vector of variable means for the observations in a cluster and serves as the cluster midpoint.
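The quantities in the first table can be written out directly. The following Python sketch computes them for a single cluster, assuming Euclidean distance; the function name cluster_summary and the four-point toy cluster are hypothetical and are not taken from the bear data.

```python
import math


def cluster_summary(points):
    """Summary statistics for one cluster, using Euclidean distance."""
    n = len(points)
    # Centroid: the mean of each variable over the cluster's observations.
    centroid = [sum(column) / n for column in zip(*points)]
    distances = [math.dist(p, centroid) for p in points]
    return {
        "n": n,
        "centroid": centroid,
        # Within-cluster sum of squares: sum of squared distances to the centroid.
        "within_ss": sum(d * d for d in distances),
        "avg_dist": sum(distances) / n,
        "max_dist": max(distances),
    }


# Hypothetical toy cluster: the four corners of a square with side 2.
toy_cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
summary = cluster_summary(toy_cluster)
print(summary)
```

Each corner lies the same distance from the centroid at (1, 1), so the average and maximum distances coincide in this toy case. In the bear output they differ because the observations are spread unevenly around each centroid.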

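The second table displays the centroids for the individual clusters, and the third table gives the distances between cluster centroids. Because the variables were standardized, the centroids are expressed in standard deviation units and the grand centroid is zero for every variable. Cluster1 (small) and Cluster3 (large) are the farthest apart (5.8045), and Cluster2 (medium-sized) lies between them.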
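As a check on how the two tables relate, the centroid distances can be recomputed from the printed centroids. The Python sketch below assumes the distances are Euclidean distances between the centroid vectors; the function name centroid_distances is hypothetical. Because the printed centroids are rounded to four decimal places, the results agree with the Minitab table only to about three decimal places.

```python
import math

# Centroids copied from the Cluster Centroids table, in the order
# Head.L, Head.W, Neck.G, Length, Chest.G, Weight.
centroids = {
    "Cluster1": [-1.0673, -0.9943, -1.0244, -1.1399, -1.0570, -0.9460],
    "Cluster2": [0.0126, -0.0155, -0.1293, 0.0614, -0.0810, -0.2033],
    "Cluster3": [1.2261, 1.1943, 1.4476, 1.2177, 1.3932, 1.4974],
}


def centroid_distances(centroids):
    """Euclidean distance between every pair of cluster centroids."""
    names = list(centroids)
    return {
        (a, b): math.dist(centroids[a], centroids[b])
        for a in names
        for b in names
    }


distances = centroid_distances(centroids)
for a in centroids:
    print(a, "  ".join(f"{distances[(a, b)]:.4f}" for b in centroids))
```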

The column BearSize contains the cluster designations.