Example of multiple regression

As part of a test of solar energy, you measure the total heat flux from homes. You wish to examine whether total heat flux (HeatFlux) can be predicted by the position of the focal points in the east, south, and north directions. Data are from [33]. You found, using best subsets regression, that the best two-predictor model included the variables North and South and that the best three-predictor model added the variable East. You evaluate the three-predictor model using multiple regression.

1    Open the worksheet EXH_REGR.MTW.

2    Choose Stat > Regression > Regression > Fit Regression Model.

3    In Responses, enter HeatFlux.

4    In Continuous predictors, enter East South North.

5    Click Graphs.

6    Under Residuals for Plots, choose Standardized.

7    Under Residual Plots, choose Individual Plots. Check Histogram of residuals, Normal probability plot of the residuals, and Residuals versus fits.

8    Click OK in each dialog box.
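
Outside Minitab, the same kind of fit can be sketched with ordinary least squares. The following Python example is a minimal sketch, assuming NumPy is available and that the worksheet columns have been exported to arrays. The function name fit_ols and the synthetic demonstration data are hypothetical; they are not the EXH_REGR.MTW values. The standardized residual is computed here as the residual divided by its estimated standard deviation, s * sqrt(1 - leverage).

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns the coefficients (intercept first), the fitted values,
    and the standardized residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fits = A @ coef
    resid = y - fits
    p = A.shape[1]                              # number of estimated coefficients
    mse = resid @ resid / (n - p)               # error mean square (S squared)
    # Leverage values: diagonal of the hat matrix A (A'A)^-1 A'
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    std_resid = resid / np.sqrt(mse * (1.0 - leverage))
    return coef, fits, std_resid

# Hypothetical synthetic data, used only to show the call.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -3.0, 0.5]) + rng.normal(scale=0.1, size=30)

coef, fits, std_resid = fit_ols(X_demo, y_demo)
print(coef)                                     # intercept, then one slope per column
print(np.flatnonzero(np.abs(std_resid) > 2))    # rows with large standardized residuals
```

With the real worksheet, X would hold the East, South, and North columns in that order and y would hold HeatFlux.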

Session window output

Regression Analysis: HeatFlux versus East, South, North

Analysis of Variance

Source      DF   Adj SS   Adj MS  F-Value  P-Value
Regression   3  12833.9   4278.0    57.87    0.000
  East       1    226.3    226.3     3.06    0.092
  South      1   2255.1   2255.1    30.51    0.000
  North      1  12330.6  12330.6   166.80    0.000
Error       25   1848.1     73.9
Total       28  14681.9

Model Summary

      S    R-sq  R-sq(adj)  R-sq(pred)
8.59782  87.41%     85.90%      78.96%

Coefficients

Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant   389.2     66.1     5.89    0.000
East        2.12     1.21     1.75    0.092  1.12
South      5.318    0.963     5.52    0.000  1.21
North     -24.13     1.87   -12.92    0.000  1.09

Regression Equation

HeatFlux = 389.2 + 2.12 East + 5.318 South - 24.13 North

Fits and Diagnostics for Unusual Observations

                                Std
Obs  HeatFlux     Fit  Resid  Resid
  4    230.70  210.20  20.50   2.94  R
 22    254.50  237.16  17.34   2.32  R

R  Large residual
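
The regression equation and the unusual-observations table can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function name predict_heat_flux is hypothetical, the coefficients are the rounded values printed above, and the rule for flagging a large residual (absolute standardized residual greater than 2) is the one this example uses.

```python
def predict_heat_flux(east, south, north):
    """Evaluate the fitted equation using the rounded coefficients from the output."""
    return 389.2 + 2.12 * east + 5.318 * south - 24.13 * north

# (Obs, HeatFlux, Fit, Std Resid) copied from the unusual-observations table
unusual = [(4, 230.70, 210.20, 2.94), (22, 254.50, 237.16, 2.32)]

for obs, observed, fit, std_resid in unusual:
    resid = observed - fit               # residual = observed response - fitted value
    flagged = abs(std_resid) > 2         # marked "R" in the output
    print(f"Obs {obs}: resid = {resid:.2f}, large residual: {flagged}")
```

The computed residuals match the Resid column, and both observations exceed the cutoff of 2.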

Interpreting the results

Session window output

·    The p-value for the regression model in the Analysis of Variance table (0.000) shows that the estimated model is significant at an α-level of 0.05. This indicates that at least one coefficient is different from zero.

·    The p-values for the estimated coefficients of North and South are both 0.000, indicating that they are significantly related to HeatFlux. The p-value for East is 0.092, indicating that it is not significantly related to HeatFlux at an α-level of 0.05. Additionally, the adjusted sum of squares for East (226.3, compared with a total of 14681.9) indicates that East does not explain a substantial amount of unique variance. This suggests that a model with only North and South may be more appropriate.

·    The VIFs are all close to 1, which indicates that the predictors are at most weakly correlated with one another. VIF values greater than about 5 to 10 suggest that the regression coefficients are poorly estimated due to severe multicollinearity.

·    The R² value indicates that the predictors explain 87.41% of the variance in HeatFlux. The adjusted R² is 85.90%, which accounts for the number of predictors in the model. Both values indicate that the model fits the data well.

·    The predicted R² value is 78.96%. Because the predicted R² value is close to the R² and adjusted R² values, the model does not appear to be overfit and has adequate predictive ability.

·    Observations 4 and 22 are identified as unusual because the absolute values of their standardized residuals are greater than 2. This may indicate that they are outliers. See Checking your model, Identifying outliers, and Choosing a residual type.
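
Several of the statistics interpreted above can be reproduced from the printed tables. The Python sketch below uses only the rounded values shown in the session output, so the results agree with the reported figures only to within rounding (for example, the t-value for North comes out near -12.90 rather than -12.92). It also uses the standard relationship VIF = 1 / (1 - R²), where R² here comes from regressing one predictor on the others; the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table
ss_regression, df_regression = 12833.9, 3
ss_error, df_error = 1848.1, 25
ss_total, df_total = 14681.9, 28

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

f_value = ms_regression / ms_error                # overall F statistic
s = math.sqrt(ms_error)                           # S in the Model Summary
r_sq = 1 - ss_error / ss_total                    # R-sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)   # R-sq(adj)

print(f"F = {f_value:.2f}, S = {s:.3f}")
print(f"R-sq = {r_sq:.2%}, R-sq(adj) = {r_sq_adj:.2%}")

# (Coef, SE Coef) copied from the Coefficients table; t = Coef / SE Coef
coefficients = {
    "Constant": (389.2, 66.1),
    "East": (2.12, 1.21),
    "South": (5.318, 0.963),
    "North": (-24.13, 1.87),
}
t_values = {term: coef / se for term, (coef, se) in coefficients.items()}
for term, t in t_values.items():
    print(f"{term}: t = {t:.2f}")

# VIF = 1 / (1 - R_j^2), so R_j^2 = 1 - 1 / VIF is the share of each
# predictor's variance explained by the other predictors.
vifs = {"East": 1.12, "South": 1.21, "North": 1.09}
shared = {term: 1 - 1 / vif for term, vif in vifs.items()}
for term, r2_j in shared.items():
    print(f"{term}: R_j^2 = {r2_j:.3f}")
```

The small R_j² values implied by the VIFs show that each predictor shares little variance with the other two.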

Graph window output

·    The histogram indicates that outliers may exist in the data, shown by the two bars on the far right side of the plot.

·    The normal probability plot shows an approximately linear pattern consistent with a normal distribution. The two points in the upper-right corner of the plot may be outliers. Brushing the graph identifies these points as observations 4 and 22, the same points that are labeled as unusual observations in the output. See Checking your model and Identifying outliers.

·    The plot of residuals versus the fitted values shows that the residuals get smaller (closer to the reference line) as the fitted values increase, which may indicate the residuals have non-constant variance. See [9] for information on non-constant variance.
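
The pattern described for the residuals versus fits plot can also be examined numerically. The Python sketch below is an informal check, not a procedure taken from Minitab or from [9]: it sorts the observations by fitted value and compares the residual standard deviation in the lower half with that in the upper half. The function name spread_ratio and the demonstration values are hypothetical.

```python
import numpy as np

def spread_ratio(fits, residuals):
    """Residual standard deviation at low fits divided by that at high fits.

    A ratio well above 1 means the residuals are more spread out at small
    fitted values than at large ones.
    """
    fits = np.asarray(fits, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    order = np.argsort(fits)                  # observation indices from low to high fit
    half = len(fits) // 2
    low = residuals[order[:half]]             # residuals for the smallest fits
    high = residuals[order[-half:]]           # residuals for the largest fits
    return low.std(ddof=1) / high.std(ddof=1)

# Hypothetical residuals whose spread shrinks as the fitted value grows
demo_fits = np.linspace(180.0, 280.0, 40)
demo_resid = np.tile([1.0, -1.0], 20) * np.linspace(3.0, 0.5, 40)
print(spread_ratio(demo_fits, demo_resid))
```

A ratio near 1 is consistent with constant variance. A ratio far from 1 only suggests a closer look at the plot, because the split into halves is arbitrary.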