Using Automatic Variable Selection Procedures
Variable selection procedures can be a valuable tool in data analysis,
particularly in the early stages of building a model. At the same time,
these procedures present certain dangers. Here are some considerations:
· Because the procedures automatically "snoop" through many models, the
model selected may fit the data "too well." That is, the procedure
can look at many variables and select ones which, by pure chance, happen
to fit well.
· The three automatic procedures are heuristic algorithms, which often work
very well but which may not select the model with the highest R²
value (for a given number of predictors).
· Automatic procedures cannot take into account special knowledge the
analyst may have about the data. Therefore, the model selected may not
be the best from a practical point of view.
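
The first danger above, fitting "too well" by chance, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the response and all candidate predictors are independent random noise, and `forward_select` and `r_squared` are hypothetical helper names for a bare-bones greedy forward selection written for this example, not the procedures implemented by any particular software. Because the selection looks at the response, the selected predictors will usually show a much higher R² than the same number of predictors picked without looking at the response, even though none of them is truly related to it.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, k):
    """Greedy forward selection: k times, add the column that raises R^2 the most."""
    chosen = []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen

# Assumed setup for illustration: 30 observations, 50 candidate predictors,
# all pure noise, and a response unrelated to every one of them.
rng = np.random.default_rng(0)
n, p, k = 30, 50, 5
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

selected = forward_select(X, y, k)
r2_selected = r_squared(X[:, selected], y)   # predictors chosen by "snooping"
r2_first_k = r_squared(X[:, :k], y)          # predictors chosen without looking at y

print("selected columns:", selected)
print("R^2 of selected model:", round(r2_selected, 3))
print("R^2 of first k columns:", round(r2_first_k, 3))
```

Comparing the two printed R² values shows how much of the apparent fit comes from the search itself. Checking the selected model against data that were not used in the selection is one common way to guard against this.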