To regulate catches of salmon stocks, it is desirable to identify fish as being of Alaskan or Canadian origin. Fifty fish from each place of origin were caught, and the growth ring diameters of their scales were measured for the time when they lived in freshwater and for the subsequent time when they lived in saltwater. The goal is to identify newly caught fish as being from Alaskan or Canadian stocks. The example and data are from [6], pages 519-520.
1 Open the worksheet EXH_MVAR.MTW.
2 Choose Stat > Multivariate > Discriminant Analysis.
3 In Groups, enter SalmonOrigin.
4 In Predictors, enter Freshwater Marine. Click OK.
Session window output
Discriminant Analysis: SalmonOrigin versus Freshwater, Marine
Linear Method for Response: SalmonOrigin
Predictors: Freshwater, Marine
Group   Alaska  Canada
Count       50      50
Summary of classification
                   True Group
Put into Group   Alaska  Canada
Alaska               44       1
Canada                6      49
Total N              50      50
N correct            44      49
Proportion        0.880   0.980
N = 100 N Correct = 93 Proportion Correct = 0.930
Squared Distance Between Groups
          Alaska   Canada
Alaska   0.00000  8.29187
Canada   8.29187  0.00000
Linear Discriminant Function for Groups
              Alaska   Canada
Constant     -100.68   -95.14
Freshwater      0.37     0.50
Marine          0.38     0.33
Summary of Misclassified Observations
                                              Squared
Observation  True Group  Pred Group  Group   Distance  Probability
 1**         Alaska      Canada      Alaska     3.544        0.428
                                     Canada     2.960        0.572
 2**         Alaska      Canada      Alaska    8.1131        0.019
                                     Canada    0.2729        0.981
12**         Alaska      Canada      Alaska    4.7470        0.118
                                     Canada    0.7270        0.882
13**         Alaska      Canada      Alaska    4.7470        0.118
                                     Canada    0.7270        0.882
30**         Alaska      Canada      Alaska     3.230        0.289
                                     Canada     1.429        0.711
32**         Alaska      Canada      Alaska     2.271        0.464
                                     Canada     1.985        0.536
71**         Canada      Alaska      Alaska     2.045        0.948
                                     Canada     7.849        0.052
As shown in the Summary of classification table, the discriminant analysis correctly identified 93 of 100 fish. The proportion of Alaskan fish classified correctly (44/50, or 88%) was lower than the proportion of Canadian fish classified correctly (49/50, or 98%). To identify newly caught fish, you could compute the linear discriminant function values for the Alaskan and Canadian groups and assign each new fish to the group whose function value is higher. You can do this either with Calc > Calculator, using stored or output values, or by performing the discriminant analysis again and predicting group membership for new observations.
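The scoring rule described above can be sketched outside Minitab as follows. This is a minimal illustration, not Minitab's own procedure: the coefficients are copied from the Linear Discriminant Function for Groups table, where they are rounded to two decimals, so scores for fish near the group boundary may be unreliable. The function names and the two sets of measurements are hypothetical and are not taken from the worksheet.

```python
# Coefficients copied from the "Linear Discriminant Function for Groups" table.
# They are rounded to two decimals in the output, so scores are approximate.
COEFFICIENTS = {
    "Alaska": {"constant": -100.68, "freshwater": 0.37, "marine": 0.38},
    "Canada": {"constant": -95.14, "freshwater": 0.50, "marine": 0.33},
}


def discriminant_scores(freshwater, marine):
    """Return the linear discriminant function value for each group."""
    return {
        group: c["constant"] + c["freshwater"] * freshwater + c["marine"] * marine
        for group, c in COEFFICIENTS.items()
    }


def classify(freshwater, marine):
    """Assign a fish to the group with the higher discriminant function value."""
    scores = discriminant_scores(freshwater, marine)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical measurements, chosen only to illustrate the rule.
    for freshwater, marine in [(100, 430), (140, 370)]:
        scores = discriminant_scores(freshwater, marine)
        print(freshwater, marine, scores, classify(freshwater, marine))
```

For routine use, rerunning the analysis with prediction for new observations avoids the rounding issue, because Minitab then works with the unrounded coefficients.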
The Summary of Misclassified Observations table shows, for each misclassified observation, the squared distance from the observation to each group centroid (mean vector) and the posterior probability of membership in each group. Each observation is assigned to the group with the highest posterior probability.
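The link between the two columns can be checked with a short calculation. The sketch below assumes equal prior probabilities for the two groups and the linear method, in which case the posterior probability of each group is proportional to exp(-d²/2), where d² is the squared distance to that group's centroid. This relationship is an assumption stated here, not something the output spells out, but it reproduces the tabulated probabilities to three decimals. The function name is hypothetical.

```python
import math


def posterior_probabilities(squared_distances):
    """Convert squared distances to posterior probabilities.

    Assumes equal priors and a common covariance matrix (linear method),
    so each group's weight is exp(-d^2 / 2), normalized to sum to 1.
    """
    weights = {g: math.exp(-0.5 * d2) for g, d2 in squared_distances.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


if __name__ == "__main__":
    # Squared distances for observation 1 in the misclassification table.
    probs = posterior_probabilities({"Alaska": 3.544, "Canada": 2.960})
    for group, p in probs.items():
        print(f"{group}: {p:.3f}")  # Alaska: 0.428, Canada: 0.572
```

Under these assumptions, the group with the smallest squared distance is also the group with the highest posterior probability, which is why observation 1, a true Alaskan fish, is assigned to Canada.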