Using Cross-Validation
Cross-validation calculates the predictive ability of potential models
to help you determine the appropriate number of components to retain in
your model. Cross-validation is recommended if you do not know the optimal
number of components. When the data contain multiple response variables,
Minitab validates the components for all responses simultaneously. For
more information, see [18].
Listed below are the methods for cross-validation:
· Leave-one-out:
Calculates potential models leaving out one observation at a time. For
large data sets, this method can be time-consuming, because it recalculates
the models as many times as there are observations.
· Leave-group-out of size:
Calculates the models leaving multiple observations out at a
time, reducing the number of times it has to recalculate a model. This
method is most appropriate when you have a large data set.
· Leave-out-as-specified-in-column:
Calculates the models, simultaneously leaving out the observations that
have matching numbers in the group identifier column, which you create
in the worksheet. This method allows you to specify which observations
are omitted together. For example, if the group identifier column includes
numbers 1, 2, and 3, all observations with 1 are omitted together and
the model is recalculated. Next, all observations with 2 are omitted and
the model is recalculated, and so on. In this case, the model is recalculated
a total of 3 times. The group identifier column must be the same length
as your response and predictor columns and cannot contain missing values.
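
The three methods differ only in which observations are omitted together before each recalculation. The following Python sketch illustrates that idea and is not Minitab's implementation. The function names are hypothetical, and two details are assumptions that the text above does not state: that leave-group-out forms its groups from consecutive rows, and that identifier groups are processed in sorted order.

```python
def leave_one_out(n):
    """Omit one observation at a time: the model is recalculated n times."""
    return [[i] for i in range(n)]


def leave_group_out(n, size):
    """Omit `size` observations at a time.

    Assumption: groups are consecutive blocks of rows; the last group
    may be smaller when n is not a multiple of size.
    """
    return [list(range(start, min(start + size, n)))
            for start in range(0, n, size)]


def leave_out_by_column(group_ids):
    """Omit together all observations that share a group identifier."""
    if any(g is None for g in group_ids):
        raise ValueError("group identifier column cannot contain missing values")
    groups = {}
    for row, g in enumerate(group_ids):
        groups.setdefault(g, []).append(row)
    # Assumption: groups are processed in sorted identifier order.
    return [groups[g] for g in sorted(groups)]


n = 6                            # number of observations (rows)
group_ids = [1, 2, 3, 1, 2, 3]   # one identifier per row, same length as the data

loo = leave_one_out(n)
lgo = leave_group_out(n, 4)
col = leave_out_by_column(group_ids)

# Number of times the model is recalculated under each method.
print(len(loo), len(lgo), len(col))  # prints: 6 2 3
```

With identifiers 1, 2, and 3, the last function returns three groups of row indices, so the model is recalculated 3 times, matching the example above.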