Example of Binary Logistic Regression

You are a researcher who is interested in understanding the effect of smoking and weight upon resting pulse rate. Because you have categorized the response (pulse rate) into low and high, a binary logistic regression analysis is appropriate for investigating the effects of smoking and weight on pulse rate.

1    Open the worksheet EXH_REGR.MTW.

2    Choose Stat > Regression > Binary Logistic Regression > Fit Binary Logistic Model.

3    For the type of data, choose Response in binary response/frequency format.

4    In Response, enter RestingPulse.

5    In Response event, choose Low.

6    In Continuous predictors, enter Weight.

7    In Categorical predictors, enter Smokes.

8    Click Coding.

9    In Increment, enter 10.

10  Click OK.

11  Click Graphs.

12  Choose Three in one.

13  Click OK in each dialog box.

Session window output

 

Binary Logistic Regression: RestingPulse versus Weight, Smokes

 

 

Method

 

Link function                 Logit

Categorical predictor coding  (1, 0)

Rows used                     92

 

 

Response Information

 

Variable      Value  Count

RestingPulse  Low       70  (Event)

              High      22

              Total     92

 

 

Deviance Table

 

Source      DF  Adj Dev  Adj Mean  Chi-Square  P-Value

Regression   2    7.574     3.787        7.57    0.023

  Weight     1    4.629     4.629        4.63    0.031

  Smokes     1    4.737     4.737        4.74    0.030

Error       89   93.640     1.052

Total       91  101.214

 

 

Model Summary

 

Deviance   Deviance

    R-Sq  R-Sq(adj)    AIC

   7.48%      5.51%  99.64

 

 

Coefficients

 

Term        Coef  SE Coef   VIF

Constant   -1.99     1.68

Weight    0.0250   0.0123  1.12

Smokes

  Yes     -1.193    0.553  1.12

 

 

Odds Ratios for Continuous Predictors

 

        Unit of

         Change  Odds Ratio       95% CI

Weight       10      1.2843  (1.0101, 1.6330)

 

 

Odds Ratios for Categorical Predictors

 

Level A  Level B  Odds Ratio       95% CI

Smokes

  Yes    No           0.3033  (0.1026, 0.8966)

 

Odds ratio for level A relative to level B

 

 

Regression Equation

 

P(Low)  =  exp(Y')/(1 + exp(Y'))

 

 

Smokes

No      Y' = -1.987 + 0.02502 Weight

 

Yes     Y' = -3.180 + 0.02502 Weight

 

 

Goodness-of-Fit Tests

 

Test             DF  Chi-Square  P-Value

Deviance         89       93.64    0.348

Pearson          89       88.63    0.491

Hosmer-Lemeshow   8        4.75    0.784

 

 

Fits and Diagnostics for Unusual Observations

 

        Observed

Obs  Probability     Fit    Resid  Std Resid

 56       0.0000  0.8689  -2.0159      -2.04  R

 86       1.0000  0.3828   1.3858       1.46     X

 

R  Large residual

X  Unusual X

Graph window output

 

Interpreting the results

The Session window output contains ten parts. The graph window contains three plots in one graph.

Method: Displays the link function and other information about the analysis. This model uses the logit link function.

Response Information: Displays the number of missing observations and the number of observations that fall into each of the two response categories. The response value that has been designated as the reference event is the first entry under Value and is labeled as the event. In this case, the reference event is low pulse rate.
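The counts in the Response Information table are enough to reproduce the Total deviance that appears in the Deviance Table, because a model with only a constant predicts the same event probability for every row. The following is a minimal sketch in Python (not a Minitab command); the function name is hypothetical and is used only for illustration.

```python
import math

def null_deviance(events: int, non_events: int) -> float:
    """Deviance of a constant-only binary logistic model (illustrative helper)."""
    n = events + non_events
    p = events / n  # the same fitted event probability for every row
    log_lik = events * math.log(p) + non_events * math.log(1 - p)
    return -2.0 * log_lik

# Counts from the Response Information table: 70 Low (event), 22 High.
total_dev = null_deviance(events=70, non_events=22)
print(round(total_dev, 3))
```

The result can be compared with the Total row of the Deviance Table.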

Deviance Table: Displays the likelihood ratio test p-values for the coefficients. In the output, you can see that the estimated coefficients for both Smokes (p = 0.030) and Weight (p = 0.031) have p-values that are less than 0.05. These results indicate that there is sufficient evidence that the coefficients are not zero at an α-level of 0.05. The p-value for the overall regression tests the null hypothesis that all the coefficients for predictors are equal to zero. The alternative hypothesis is that at least one of the coefficients for a predictor is not equal to zero. In this example, the p-value is 0.023. This p-value indicates that there is sufficient evidence that at least one of the coefficients is different from zero, provided that your accepted α-level is greater than 0.023.
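The p-values in the Deviance Table are upper-tail chi-square probabilities of the Adj Dev values. For 1 and 2 degrees of freedom, these probabilities have simple closed forms, so you can check them with the Python standard library. This is a sketch only: the helper name is hypothetical, and it deliberately handles just the two DF values that appear for Regression, Weight, and Smokes.

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """Upper-tail chi-square probability, for df = 1 or df = 2 only."""
    if df == 1:
        # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # For df = 2, the chi-square distribution is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")

# (Adj Dev, DF) pairs copied from the Deviance Table.
p_values = {
    "Regression": chi2_upper_tail(7.574, 2),
    "Weight": chi2_upper_tail(4.629, 1),
    "Smokes": chi2_upper_tail(4.737, 1),
}
for source, p in p_values.items():
    print(f"{source}: {p:.3f}")
```

Each printed value can be compared with the P-Value column and with your chosen α-level.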

Model Summary: Displays the statistics you can use to compare how well different models fit the data. Higher values of deviance R-Sq and adjusted deviance R-Sq indicate a better fit. Smaller values of Akaike Information Criterion (AIC) indicate a better fit. The current model has a deviance R-Sq value of 7.48%, an adjusted deviance R-Sq value of 5.51%, and an AIC of 99.64. Another model might have better fit statistics.
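The Model Summary statistics can be related to the Error and Total rows of the Deviance Table. The Python sketch below uses formulas that reproduce the reported values for this example. Treat the adjusted formula in particular as an assumption rather than as a statement of how the software computes it. The function name and argument names are hypothetical.

```python
def model_summary(error_dev: float, total_dev: float, n_predictor_terms: int):
    """Return (deviance R-Sq %, adjusted deviance R-Sq %, AIC).

    Assumed formulas (consistent with this example's output):
      R-Sq      = 1 - D_error / D_total
      R-Sq(adj) = 1 - (D_error + p) / D_total, p = predictor terms
      AIC       = D_error + 2 * (p + 1), counting the constant; this uses
                  the fact that, for individual binary responses, the error
                  deviance equals -2 * log-likelihood.
    """
    r_sq = 1.0 - error_dev / total_dev
    r_sq_adj = 1.0 - (error_dev + n_predictor_terms) / total_dev
    aic = error_dev + 2.0 * (n_predictor_terms + 1)
    return 100.0 * r_sq, 100.0 * r_sq_adj, aic

# Error and Total deviance from the Deviance Table; two terms (Weight, Smokes).
r_sq, r_sq_adj, aic = model_summary(93.640, 101.214, n_predictor_terms=2)
print(f"R-Sq = {r_sq:.2f}%, R-Sq(adj) = {r_sq_adj:.2f}%, AIC = {aic:.2f}")
```

The adjusted value is lower than the unadjusted value because each additional term adds a penalty to the error deviance.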

Coefficients: Shows the estimated coefficients, standard error of the coefficients, and variance inflation factors (VIF).  When you use the logit link function, you also see the odds ratio and a 95% confidence interval for the odds ratio.

The estimated coefficient of -1.193 for Smokes represents the estimated change in the log odds of a low pulse, log[P(low pulse)/P(high pulse)], when the subject smokes compared to when the subject does not smoke, with the covariate Weight held constant. The estimated coefficient for Weight is 0.0250. This coefficient represents the change in the log of P(low pulse)/P(high pulse) with a 1 unit (1 pound) increase in Weight, with the factor Smokes held constant.

Odds Ratios for Continuous Predictors: Although there is evidence that the estimated coefficient for Weight is not zero, the estimated coefficient is very close to zero (0.0250). A coefficient this small means that a 1 pound increase in weight changes the odds of a low resting pulse only minimally. A more meaningful comparison uses a larger weight difference as the unit of change. For example, if the weight unit is 10 pounds, the odds ratio becomes 1.2843. This odds ratio indicates that the odds of a subject having a low pulse are multiplied by 1.2843 with each 10 pound increase in weight.
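The effect of the unit of change on the odds ratio follows from exponentiating the coefficient multiplied by the unit. The Python sketch below illustrates this with the Weight coefficient as printed in the Regression Equation (0.02502). The function name is hypothetical.

```python
import math

def odds_ratio_for_change(coef: float, unit_of_change: float) -> float:
    """Odds ratio for an increase of `unit_of_change` in a continuous predictor."""
    return math.exp(coef * unit_of_change)

weight_coef = 0.02502  # Weight coefficient from the Regression Equation

or_1_lb = odds_ratio_for_change(weight_coef, 1)    # per 1 pound
or_10_lb = odds_ratio_for_change(weight_coef, 10)  # per 10 pounds
print(f"1 lb: {or_1_lb:.4f}   10 lb: {or_10_lb:.4f}")
```

Because the odds ratio compounds multiplicatively, the 10 pound odds ratio equals the 1 pound odds ratio raised to the 10th power.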

Odds Ratios for Categorical Predictors: For Smokes, the negative coefficient of -1.193 and the odds ratio of 0.3033 indicate that subjects who smoke tend to have a higher resting pulse rate than subjects who do not smoke. Given that subjects have the same weight, the odds ratio can be interpreted as the odds of smokers in the sample having a low pulse being 30% of the odds of non-smokers having a low pulse.
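The odds ratio for Smokes and its 95% confidence interval can be approximated from the Coefficients table by exponentiating the coefficient and a Wald-type interval (coefficient ± 1.96 × SE Coef). This Python sketch assumes that interval type because it closely matches the reported values. The function name is hypothetical.

```python
import math

def odds_ratio_with_ci(coef: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower, upper) using an assumed Wald-type interval."""
    point = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return point, lower, upper

# Coef and SE Coef for Smokes = Yes from the Coefficients table.
smokes_or, smokes_lo, smokes_hi = odds_ratio_with_ci(-1.193, 0.553)
print(f"OR = {smokes_or:.4f}, 95% CI = ({smokes_lo:.4f}, {smokes_hi:.4f})")
```

Because the whole interval lies below 1, the odds of a low pulse are estimated to be lower for smokers than for non-smokers of the same weight.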

Regression Equation: Displays the transformation that changes the linear equation into a predicted probability and a linear equation for each combination of categorical predictors. In this case, there are two equations, one for each level of the Smokes variable. The constant term is more negative for the people who smoke, so these people have a lower probability of having a low resting pulse. Because the model contains no interaction between Smokes and Weight, the Weight coefficient is the same in both equations.
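To see how the two equations turn into predicted probabilities, you can evaluate them for a specific weight. The Python sketch below uses the constants and the Weight coefficient from the Regression Equation output. The weight of 150 pounds is a hypothetical value chosen only for illustration, and the function name is also hypothetical.

```python
import math

def prob_low_pulse(weight: float, smokes: bool) -> float:
    """P(Low) = exp(Y') / (1 + exp(Y')) using the fitted equations."""
    constant = -3.180 if smokes else -1.987  # Smokes = Yes / Smokes = No
    y = constant + 0.02502 * weight
    return math.exp(y) / (1.0 + math.exp(y))

weight = 150.0  # hypothetical weight in pounds
p_no = prob_low_pulse(weight, smokes=False)
p_yes = prob_low_pulse(weight, smokes=True)

# The ratio of the two odds does not depend on weight (no interaction term).
odds_ratio = (p_yes / (1.0 - p_yes)) / (p_no / (1.0 - p_no))
print(f"non-smoker: {p_no:.3f}  smoker: {p_yes:.3f}  odds ratio: {odds_ratio:.4f}")
```

At any weight, the predicted probability of a low pulse is lower for a smoker, and the ratio of the odds agrees with the odds ratio reported for Smokes.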

Goodness-of-Fit Tests: Displays Pearson, deviance, and Hosmer-Lemeshow goodness-of-fit tests. The goodness-of-fit tests, with p-values ranging from 0.348 to 0.784, indicate that there is insufficient evidence to claim that the model does not fit the data adequately. If the p-value is less than your accepted α-level, the test would reject the null hypothesis of an adequate fit.
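The Hosmer-Lemeshow p-value can be checked by hand because the chi-square upper-tail probability has a finite-sum form when the degrees of freedom are even. The Python sketch below covers only that even-DF case, so it does not apply to the deviance and Pearson tests, which have 89 DF. The function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive, even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

# Chi-Square and DF for the Hosmer-Lemeshow test from the output.
hl_p = chi2_upper_tail_even_df(4.75, 8)
alpha = 0.05
print(f"p = {hl_p:.3f}, reject adequate fit: {hl_p < alpha}")
```

A p-value well above the α-level, as here, gives no evidence of lack of fit.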

Fits and Diagnostics for Unusual Observations: Displays observations that have large standardized residuals or large leverage values. In this case, observation 56 (marked R) is not fit well by the model: the fitted probability of a low pulse is 0.8689, but the observed response is not the event. You might further investigate this case to see why the model did not fit it well. Observation 86 (marked X) has unusual predictor values, so it can have a large influence on the model. You might fit the model without this case to see how much influence the observation has on the results.
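The Resid values for the two flagged observations are consistent with deviance residuals for a binary response, computed from the observed probability (0 or 1) and the fit. The Python sketch below assumes that residual type. It does not reproduce the Std Resid column, which also depends on leverage values that are not shown in the output. The function name is hypothetical.

```python
import math

def deviance_residual(observed: int, fit: float) -> float:
    """Deviance residual for one binary observation (assumed residual type)."""
    if observed == 1:
        # Event occurred: positive residual, larger when the fit is small.
        return math.sqrt(-2.0 * math.log(fit))
    # Event did not occur: negative residual, larger when the fit is near 1.
    return -math.sqrt(-2.0 * math.log(1.0 - fit))

# Observed probability and Fit copied from the unusual observations table.
resid_56 = deviance_residual(observed=0, fit=0.8689)
resid_86 = deviance_residual(observed=1, fit=0.3828)
print(f"obs 56: {resid_56:.4f}   obs 86: {resid_86:.4f}")
```

Small differences in the last decimal place can arise because the fits in the table are rounded.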

Plots: In the example, you chose three residual plots. The normal probability plot of the residuals is not a straight line, and the histogram of the residuals is bi-modal. You should be cautious about how you interpret output that relies on normal theory, like confidence intervals for the predicted probabilities.