Partial Least Squares
Model Selection and Validation Tables - R-Sq (Pred)
The predicted R-squared (R²) value indicates how well each calculated model predicts the response. It is calculated only when you use cross-validation. Minitab selects the PLS model with the highest predicted R².
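As a rough sketch of the calculation, predicted R² is commonly defined as 1 - PRESS / SS(total), where PRESS is the prediction sum of squares from cross-validation. This page does not state Minitab's exact formula, so the code below is an assumption rather than the documented method, and `total_ss` and `predicted_r_squared` are illustrative names. It also assumes that the Error column in the example output is the residual sum of squares, so that the total sum of squares can be backed out from Error and R-Sq. The result is floored at zero because the Fat table reports 0.000000 for the 1-component model, where the raw value would be negative.

```python
def total_ss(error_ss: float, r_sq: float) -> float:
    """Back out the total sum of squares from R-Sq = 1 - Error / SS_total."""
    return error_ss / (1.0 - r_sq)


def predicted_r_squared(press: float, ss_total: float) -> float:
    """Predicted R-squared as 1 - PRESS / SS_total, floored at zero (assumption)."""
    return max(0.0, 1.0 - press / ss_total)


# Moisture, 1 component: Error = 96.9288, R-Sq = 0.806643, PRESS = 103.549
ss_total = total_ss(96.9288, 0.806643)
moisture_pred = predicted_r_squared(103.549, ss_total)
print(round(moisture_pred, 6))  # should land close to the tabulated 0.793436
```

Because PRESS is computed from observations that were left out of the fit, it is typically larger than the Error value for the same model, which is why predicted R² is typically lower than R².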
Examine the R² and predicted R² values to determine whether the model selected by cross-validation is the most appropriate. In some cases, you may decide to use a different model than the one selected by cross-validation. Consider an example where adding two components to the model that Minitab selects substantially increases R² and only slightly decreases the predicted R². Because the predicted R² decreased only slightly, the larger model is not overfit, and you may decide that it better suits your data.
Example Output
Model Selection and Validation for Moisture
Components  X Variance    Error      R-Sq    PRESS  R-Sq (pred)
1             0.984976  96.9288  0.806643  103.549     0.793436
2             0.996400  88.9900  0.822479  105.650     0.789245
3             0.997757  71.9304  0.856510   91.172     0.818127
4             0.999427  58.3174  0.883666   75.778     0.848836
5             0.999722  58.1261  0.884048   78.385     0.843634
6             0.999853  48.5236  0.903203   69.024     0.862308
7             0.999963  45.9824  0.908272   71.146     0.858076
8             0.999976  33.1545  0.933862   51.386     0.897493
9             0.999982  32.8074  0.934554   51.055     0.898154
10            0.999986  32.7773  0.934615   53.299     0.893677
Model Selection and Validation for Fat
Components  X Variance    Error      R-Sq    PRESS  R-Sq (pred)
1             0.984976  282.519  0.050127  308.628     0.000000
2             0.996400  229.964  0.226824  267.199     0.101637
3             0.997757  115.951  0.610155  143.986     0.515895
4             0.999427   98.285  0.669550  127.389     0.571698
5             0.999722   57.994  0.805015   76.435     0.743012
6             0.999853   53.097  0.821480   72.109     0.757560
7             0.999963   52.010  0.825133   72.412     0.756540
8             0.999976   48.842  0.835784   76.432     0.743024
9             0.999982   34.344  0.884529   67.884     0.771764
10            0.999986   31.050  0.895604   65.116     0.781068
Interpretation
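In this example, cross-validation selected 10 components for the PLS model because that model produced the highest predicted R² averaged across the two responses. With 10 components, the predicted R² is 89.4% for moisture and 78.1% for fat. For moisture alone, the 9-component model has a marginally higher predicted R² (89.8%), but its average across both responses is lower. The scientists determine that the 10-component model is the best model for their data.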
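The selection described above can be checked with a short sketch. The predicted R² values are copied from the two tables in the example output. The page says only "highest average", so the simple unweighted mean across responses is an assumption, and `best_component_count` is a hypothetical helper, not a Minitab function.

```python
# Predicted R-squared for 1 to 10 components, copied from the example output.
moisture = [0.793436, 0.789245, 0.818127, 0.848836, 0.843634,
            0.862308, 0.858076, 0.897493, 0.898154, 0.893677]
fat = [0.000000, 0.101637, 0.515895, 0.571698, 0.743012,
       0.757560, 0.756540, 0.743024, 0.771764, 0.781068]


def best_component_count(*responses):
    """Return (components, average) for the highest mean predicted R-squared.

    Each argument is a list of predicted R-squared values for one response,
    ordered by number of components starting at 1. The unweighted mean
    across responses is an assumption.
    """
    averages = [sum(values) / len(values) for values in zip(*responses)]
    best_index = max(range(len(averages)), key=averages.__getitem__)
    return best_index + 1, averages[best_index]


components, average = best_component_count(moisture, fat)
print(components, round(average, 4))

# Passing a single response shows which model that response alone would favor.
print(best_component_count(moisture)[0])
```

Run on both responses, the sketch picks the 10-component model. Run on moisture alone, it picks 9 components, which illustrates why the per-response and averaged choices can differ.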