Using Automatic Variable Selection Procedures

Variable selection procedures can be a valuable tool in data analysis, particularly in the early stages of building a model. At the same time, these procedures present certain dangers. Here are some considerations:

·    Because the procedures automatically "snoop" through many models, the model selected may fit the data "too well." That is, the procedure can look at many variables and select those that, by pure chance, happen to fit well.

·    The three automatic procedures are heuristic algorithms, which often work very well but which may not select the model with the highest R² value (for a given number of predictors).

·    Automatic procedures cannot take into account special knowledge the analyst may have about the data. Therefore, the model selected may not be the best from a practical point of view.
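
The first consideration can be illustrated with a small simulation. The sketch below is not the procedure used by any particular statistics package: it is a simplified greedy forward selection that adds whichever predictor raises R² the most, with no entry or removal thresholds. The function names (r_squared, forward_select) and the sample sizes are hypothetical choices made for illustration. Every candidate predictor is pure random noise, so any fit the selected model achieves is due to chance alone.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, k):
    """Greedy forward selection: add, one at a time, the column that raises R-squared most."""
    chosen = []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen

rng = np.random.default_rng(0)
n, p, k = 30, 50, 5               # illustrative sizes, not taken from the text
X = rng.normal(size=(n, p))       # 50 candidate predictors, all pure noise
y = rng.normal(size=n)            # response unrelated to every predictor

selected = forward_select(X, y, k)
r2_selected = r_squared(X[:, selected], y)
r2_arbitrary = r_squared(X[:, :k], y)   # k predictors picked without looking at y

print(f"R-squared, {k} predictors chosen by forward selection: {r2_selected:.2f}")
print(f"R-squared, {k} predictors chosen arbitrarily:          {r2_arbitrary:.2f}")
```

Comparing the two printed values shows how much apparent fit comes from the search itself rather than from any real relationship. Checking the selected model against data that were not used in the selection is one way to guard against this.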