Example of Cluster Observations

You measure five nutritional characteristics (protein, carbohydrate, and fat content; calories; and percent of the daily allowance of Vitamin A) for 12 breakfast cereal brands. The example and data are from p. 623 of [6]. The goal is to group cereal brands with similar characteristics. You cluster the observations using the complete linkage method and squared Euclidean distance, and you standardize the variables because they are measured in different units. You also request a dendrogram and assign different line types and colors to each cluster.

1    Open the worksheet CEREAL.MTW.

2    Choose Stat > Multivariate > Cluster Observations.

3    In Variables or distance matrix, enter Protein-VitaminA.

4    From Linkage Method, choose Complete and from Distance Measure choose Squared Euclidean.

5    Check Standardize variables.

6    Under Specify Final Partition by, choose Number of clusters and enter 4.

7    Check Show dendrogram.

8    Click Customize. In Title, enter Dendrogram for Cereal Data.

9    Click OK in each dialog box.
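For readers who want to see what these dialog choices amount to, the following is a minimal sketch in Python (standard library only), not Minitab's implementation. It standardizes each variable, measures squared Euclidean distance, and merges clusters by complete linkage until the requested number remains. The function names and the small demo data set are hypothetical, and dividing by the sample standard deviation is an assumption about how the variables are standardized. CEREAL.MTW is not reproduced here.

```python
from itertools import combinations
from statistics import mean, stdev


def standardize(rows):
    """Convert each column to z-scores (assumption: sample standard deviation)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def sq_euclidean(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_linkage(rows, n_clusters):
    """Merge clusters until n_clusters remain; returns lists of 1-based row numbers."""
    z = standardize(rows)
    clusters = {i + 1: [i + 1] for i in range(len(rows))}
    while len(clusters) > n_clusters:
        def dist(p, q):
            # Complete linkage: distance between two clusters is the
            # largest distance between any pair of their members.
            return max(sq_euclidean(z[i - 1], z[j - 1])
                       for i in clusters[p] for j in clusters[q])

        # Join the closest pair; the merged cluster keeps the smaller number.
        p, q = min(combinations(sorted(clusters), 2), key=lambda pq: dist(*pq))
        clusters[p].extend(clusters.pop(q))
    return sorted(clusters.values())


# Hypothetical demo data: six observations on two variables with different scales.
demo = [(1.0, 10.0), (1.2, 11.0), (5.0, 50.0), (5.1, 52.0), (9.0, 90.0), (9.3, 88.0)]
print(complete_linkage(demo, 3))
```

This brute-force version recomputes every cluster distance at each step, which is adequate for a dozen observations but not for large worksheets.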

Session window output

Cluster Analysis of Observations: Protein, Carbo, Fat, Calories, VitaminA

 

 

Standardized Variables, Squared Euclidean Distance, Complete Linkage

Amalgamation Steps

 

 

                                                           Number
                                                          of obs.
      Number of  Similarity  Distance  Clusters      New   in new
Step   clusters       level     level   joined   cluster  cluster
   1         11     100.000    0.0000  5     12        5        2
   2         10      99.822    0.0640  3      5        3        3
   3          9      98.792    0.4347  3     11        3        4
   4          8      94.684    1.9131  6      8        6        2
   5          7      93.406    2.3730  2      3        2        5
   6          6      87.329    4.5597  7      9        7        2
   7          5      86.189    4.9701  1      4        1        2
   8          4      80.601    6.9810  2      6        2        7
   9          3      68.079   11.4873  2      7        2        9
  10          2      41.409   21.0850  1      2        1       11
  11          1       0.000   35.9870  1     10        1       12

 

 

Final Partition

Number of clusters: 4

 

 

                                      Average   Maximum
                             Within  distance  distance
             Number of  cluster sum      from      from
          observations   of squares  centroid  centroid
Cluster1             2      2.48505   1.11469   1.11469
Cluster2             7      8.99868   1.04259   1.76922
Cluster3             2      2.27987   1.06768   1.06768
Cluster4             1      0.00000   0.00000   0.00000

 

 

Cluster Centroids

 

Variable  Cluster1   Cluster2  Cluster3  Cluster4  Grand centroid
Protein    1.92825  -0.333458  -0.20297  -1.11636       0.0000000
Carbo     -0.75867   0.541908   0.12645  -2.52890       0.0000000
Fat        0.33850  -0.096715   0.33850  -0.67700       0.0000000
Calories   0.28031   0.280306   0.28031  -3.08337      -0.0000000
VitaminA  -0.63971  -0.255883   2.04707  -1.02353      -0.0000000

 

 

Distances Between Cluster Centroids

 

          Cluster1  Cluster2  Cluster3  Cluster4
Cluster1   0.00000   2.67275   3.54180   4.98961
Cluster2   2.67275   0.00000   2.38382   4.72050
Cluster3   3.54180   2.38382   0.00000   5.44603
Cluster4   4.98961   4.72050   5.44603   0.00000

Graph window output

[Figure: dendrogram titled "Dendrogram for Cereal Data"]

Interpreting the results

Minitab displays the amalgamation steps in the Session window. At each step, two clusters are joined. For each step, the table shows the number of clusters remaining, the similarity level, the distance between the two clusters joined, which clusters were joined, the identification number of the new cluster (always the smaller of the two numbers of the clusters joined), and the number of observations in the new cluster. Amalgamation continues until there is just one cluster.

The amalgamation steps show that the similarity level decreases by about 6 or less at each step until the step from four clusters to three, where it drops by about 12.5 (from 80.601 to 68.079). This abrupt change indicates that four clusters are reasonably sufficient for the final partition. If this grouping makes intuitive sense for the data, then it is probably a good choice.
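The reasoning about similarity levels can be checked with a short sketch. It assumes that similarity is computed as 100 * (1 - d / d_max), where d_max is the largest distance in the table; this assumption reproduces the Similarity level column to within rounding. The cutoff of 10 used to flag an "abrupt" drop is a hypothetical value chosen for illustration, not a Minitab setting.

```python
# Distance level at each amalgamation step, copied from the table above.
distances = [0.0000, 0.0640, 0.4347, 1.9131, 2.3730, 4.5597,
             4.9701, 6.9810, 11.4873, 21.0850, 35.9870]
n_obs = 12

# Assumed formula: similarity = 100 * (1 - d / d_max).
d_max = max(distances)
similarity = [100 * (1 - d / d_max) for d in distances]

# drops[i] is the decrease in similarity caused by step i + 2.
drops = [similarity[i] - similarity[i + 1] for i in range(len(similarity) - 1)]

THRESHOLD = 10.0  # hypothetical cutoff for an "abrupt" decrease
first_big = next(i for i, drop in enumerate(drops) if drop > THRESHOLD)
step = first_big + 2
clusters_before_step = n_obs - (step - 1)

for s, (sim, d) in enumerate(zip(similarity, distances), start=1):
    print(f"step {s:2d}  similarity {sim:8.3f}  distance {d:8.4f}")
print(f"first drop above {THRESHOLD} occurs at step {step}; "
      f"clusters before that step: {clusters_before_step}")
```

Stopping just before the first large drop is a rule of thumb; the choice should still be checked against what makes sense for the data.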

When you specify the final partition, Minitab displays three additional tables. The first table summarizes each cluster by the number of observations, the within-cluster sum of squares, the average distance from the observations to the cluster centroid, and the maximum distance from any observation to the cluster centroid. In general, a cluster with a small sum of squares is more compact than one with a large sum of squares. The centroid is the vector of variable means for the observations in that cluster and is used as a cluster midpoint. The second table displays the centroids for the individual clusters, and the third table gives the distances between cluster centroids.
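As a check on how the third table relates to the second, the sketch below recomputes the distances between cluster centroids from the centroid values printed in the Session window output. It assumes these are ordinary (unsquared) Euclidean distances, which matches the printed values to within rounding. The variable names are illustrative.

```python
from math import dist

# Centroids copied from the Cluster Centroids table
# (order: Protein, Carbo, Fat, Calories, VitaminA).
centroids = {
    "Cluster1": [1.92825, -0.75867, 0.33850, 0.28031, -0.63971],
    "Cluster2": [-0.333458, 0.541908, -0.096715, 0.280306, -0.255883],
    "Cluster3": [-0.20297, 0.12645, 0.33850, 0.28031, 2.04707],
    "Cluster4": [-1.11636, -2.52890, -0.67700, -3.08337, -1.02353],
}

names = list(centroids)
# Euclidean distance between every pair of centroids.
table = {a: {b: dist(centroids[a], centroids[b]) for b in names} for a in names}

for a in names:
    print(a, "  ".join(f"{table[a][b]:8.5f}" for b in names))
```

Cluster4 is far from the other three centroids, mainly because of its very low standardized Calories and Carbo values.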

The dendrogram displays the information in the amalgamation table in the form of a tree diagram. In our example, cereals 1 and 4 make up the first cluster; cereals 2, 3, 5, 12, 11, 6, and 8 make up the second; cereals 7 and 9 make up the third; cereal 10 makes up the fourth.
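The cluster membership can also be read directly from the amalgamation table. The sketch below replays the "Clusters joined" column until four clusters remain, using the rule stated above that the new cluster takes the smaller of the two numbers. The function name partition is hypothetical.

```python
# "Clusters joined" column of the amalgamation table, in step order.
joins = [(5, 12), (3, 5), (3, 11), (6, 8), (2, 3), (7, 9),
         (1, 4), (2, 6), (2, 7), (1, 2), (1, 10)]


def partition(joins, n_obs, n_clusters):
    """Replay the joins until n_clusters remain; keys are cluster numbers."""
    clusters = {i: [i] for i in range(1, n_obs + 1)}
    # Each step reduces the cluster count by one.
    for a, b in joins[: n_obs - n_clusters]:
        keep, drop = min(a, b), max(a, b)
        clusters[keep].extend(clusters.pop(drop))
    return clusters


final = partition(joins, n_obs=12, n_clusters=4)
for number, (label, members) in enumerate(sorted(final.items()), start=1):
    print(f"Cluster{number}: cereals {members}")
```

Replaying the first eight steps gives the same four groups of cereals listed above.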