Total heat flux is measured as part of a solar thermal energy test. You wish to see how total heat flux is predicted by other variables: insolation, the position of the focal points in the east, south, and north directions, and the time of day. Data are from Montgomery and Peck [31], page 486.
1 Open the worksheet EXH_REGR.MTW.
2 Choose Stat > Regression > Regression > Best Subsets.
3 In Response, enter HeatFlux.
4 In Free Predictors, enter Insolation-Time. Click OK.
Session window output
Best Subsets Regression: HeatFlux versus Insolation, East, ...
Response is HeatFlux
Predictor columns: Insolation, East, South, North, Time

Vars  R-Sq  R-Sq(adj)  R-Sq(pred)  Mallows Cp       S  Predictors in model (one X each)
   1  72.1       71.0        66.9        38.5  12.328  X
   1  39.4       37.1        26.3       112.7  18.154  X
   2  85.9       84.8        81.4         9.1  8.9321  X X
   2  82.0       80.6        74.2        17.8  10.076  X X
   3  87.4       85.9        79.0         7.6  8.5978  X X X
   3  86.5       84.9        81.4         9.7  8.9110  X X X
   4  89.1       87.3        80.6         5.8  8.1698  X X X X
   4  88.0       86.0        79.3         8.2  8.5550  X X X X
   5  89.9       87.7        78.8         6.0  8.0390  X X X X X
Each line of the output represents a different model. Vars is the number of variables or predictors in the model. R² and adjusted R² are converted to percentages. Predictors that are present in the model are indicated by an X.
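For reference, the statistics in these columns are commonly defined as shown in the following sketch. The notation is an assumption introduced here and does not appear in the output: n is the number of observations, p is the number of estimated coefficients in the candidate model (predictors plus the constant), SSE_p is its error sum of squares, SST is the total sum of squares, MSE_full is the mean square error of the model with all predictors, and ŷ_(i) is the prediction for observation i from a fit that leaves observation i out. These are the textbook forms; Minitab's exact computation is assumed, not confirmed, to match them.

```latex
\begin{aligned}
R^2 &= 1 - \frac{\mathrm{SSE}_p}{\mathrm{SST}}, \\
R^2_{\text{adj}} &= 1 - \frac{\mathrm{SSE}_p/(n-p)}{\mathrm{SST}/(n-1)}, \\
R^2_{\text{pred}} &= 1 - \frac{\mathrm{PRESS}_p}{\mathrm{SST}},
  \qquad \mathrm{PRESS}_p = \sum_{i=1}^{n}\bigl(y_i - \hat{y}_{(i)}\bigr)^2, \\
C_p &= \frac{\mathrm{SSE}_p}{\mathrm{MSE}_{\text{full}}} - n + 2p, \\
S &= \sqrt{\frac{\mathrm{SSE}_p}{n-p}}.
\end{aligned}
```

Under these definitions the model with all predictors always has C_p = (n - p) - n + 2p = p, which agrees with the Cp of 6.0 reported for the five-predictor model (five predictors plus the constant).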
In this example, it isn't clear which model fits the data best. The model with all five predictors has the highest adjusted R² (87.7%), a low Mallows' Cp value (6.0), and the lowest S value (8.0390). The four-predictor model with all variables except Time has a lower Cp value (5.8), although S is slightly higher (8.1698) and adjusted R² is slightly lower (87.3%). The best three-predictor model includes North, South, and East, with a slightly higher Cp value (7.6) and a lower adjusted R² (85.9%).
The best two-predictor model includes North and South and is tied for the highest predicted R² (81.4%). This suggests that the models that include additional predictors may be overfitting the data. An overfit model appears to explain the relationship between the predictors and the response for the data set used to fit it, but fails to provide valid predictions for new observations. If you are mainly interested in predictions for new observations, this two-predictor model may be the best choice, and you would need to measure data for only two predictors. Further, the multiple regression example indicates that adding the variable East does not improve the fit of the model.
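The comparison above can be imitated outside Minitab. The following is a minimal sketch, not Minitab's implementation: it enumerates every subset of predictors, fits each by ordinary least squares, and reports R², adjusted R², predicted R² (from a leave-one-out refit), Mallows' Cp, and S using the usual textbook definitions. The data are synthetic and generated in the script (they are not the EXH_REGR.MTW worksheet), and all names (`solve`, `ols`, `press`, `best_subsets`, `x1` to `x3`) are hypothetical.

```python
import math
import random
from itertools import combinations

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients (constant first) via the normal equations."""
    A = [[1.0] + list(r) for r in rows]
    p = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return solve(AtA, Aty)

def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def sse(rows, y):
    """Error sum of squares of the model fitted to all observations."""
    beta = ols(rows, y)
    return sum((yi - predict(beta, r)) ** 2 for r, yi in zip(rows, y))

def press(rows, y):
    """Leave-one-out prediction error: refit without row i, then predict row i."""
    total = 0.0
    for i in range(len(y)):
        beta = ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(beta, rows[i])) ** 2
    return total

def best_subsets(data, y, names):
    n, k = len(y), len(names)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    mse_full = sse(data, y) / (n - k - 1)  # MSE of the model with all predictors
    results = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            rows = [[r[c] for c in cols] for r in data]
            e, p = sse(rows, y), size + 1  # p counts the constant
            results.append({
                "vars": [names[c] for c in cols],
                "r2": 1 - e / sst,
                "r2_adj": 1 - (e / (n - p)) / (sst / (n - 1)),
                "r2_pred": 1 - press(rows, y) / sst,
                "cp": e / mse_full - n + 2 * p,
                "s": math.sqrt(e / (n - p)),
            })
    return results

# Synthetic example: y depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
names = ["x1", "x2", "x3"]
data = [[rng.uniform(0, 10) for _ in names] for _ in range(30)]
y = [2 + 3 * r[0] - 2 * r[1] + rng.gauss(0, 1) for r in data]

results = best_subsets(data, y, names)
for m in results:
    print(f"{len(m['vars'])}  {100 * m['r2']:5.1f}  {100 * m['r2_adj']:5.1f}  "
          f"{100 * m['r2_pred']:5.1f}  {m['cp']:8.1f}  {m['s']:8.4f}  {', '.join(m['vars'])}")
```

Unlike the Minitab output, which lists only the best two models of each size, this sketch prints every subset. The leave-one-out refit is used for predicted R² because it makes the link to "predictions for new observations" explicit; a faster leverage-based formula gives the same PRESS value.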
Before choosing a model, always use residual plots and other diagnostic tests to check whether the candidate models violate any regression assumptions. See Checking your model.