Example of Ordinal Logistic Regression

Suppose you are a field biologist and you believe that the adult population of salamanders in the Northeast has gotten smaller over the past few years. You would like to determine whether any association exists between the length of time a hatched salamander survives and the level of water toxicity, as well as whether there is a regional effect. Survival time is coded as 1 if less than 10 days, 2 if 10 to 30 days, and 3 if 31 to 60 days.

1    Open the worksheet EXH_REGR.MTW.

2    Choose Stat > Regression > Ordinal Logistic Regression.

3    In Response, enter Survival. In Model, enter Region ToxicLevel. In Factors (optional), enter Region.

4    Click Results. Choose In addition, list of factor level values, and tests for terms with more than 1 degree of freedom. Click OK in each dialog box.

Session window output

Ordinal Logistic Regression: Survival versus Region, ToxicLevel

Link Function: Logit

Response Information

Variable  Value  Count
Survival  1         15
          2         46
          3         12
          Total     73

Factor Information

Factor  Levels  Values
Region       2  1, 2

Logistic Regression Table

                                                Odds     95% CI
Predictor       Coef    SE Coef      Z      P  Ratio  Lower  Upper
Const(1)    -7.04343    1.68017  -4.19  0.000
Const(2)    -3.52273    1.47108  -2.39  0.017
Region
 2          0.201456   0.496153   0.41  0.685   1.22   0.46   3.23
ToxicLevel  0.121289  0.0340510   3.56  0.000   1.13   1.06   1.21

Log-Likelihood = -59.290
Test that all slopes are zero: G = 14.713, DF = 2, P-Value = 0.001

Goodness-of-Fit Tests

Method    Chi-Square   DF      P
Pearson      122.799  122  0.463
Deviance     100.898  122  0.918

Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs       Number  Percent  Summary Measures
Concordant    1126     79.2  Somers’ D              0.59
Discordant     288     20.3  Goodman-Kruskal Gamma  0.59
Ties             8      0.6  Kendall’s Tau-a        0.32
Total         1422    100.0

Interpreting the results

The Session window contains the following five parts:

Response Information displays the number of observations that fall into each of the response categories, and the number of missing observations. The ordered response values, from lowest to highest, are shown. Here, we use the default coding scheme, which orders the values from lowest to highest: 1 is less than 10 days, 2 is 10 to 30 days, and 3 is 31 to 60 days (see Reference event for the response variable).

Factor Information displays all the factors in the model, the number of levels for each factor, and the factor level values. The factor level that has been designated as the reference level is the first entry under Values, here region 1 (see Reference event for the response variable).

Logistic Regression Table shows the estimated coefficients, standard error of the coefficients, z-values, and p-values. When you use the logit link function, you see the calculated odds ratio, and a 95% confidence interval for the odds ratio.

·    The values labeled Const(1) and Const(2) are estimated intercepts for the logits of the cumulative probabilities of survival up to the < 10 days category and up to the 10-30 days category, respectively. Because the cumulative probability for the last response value is 1, there is no need to estimate an intercept for 31-60 days.

·    The coefficient of 0.2015 for Region is the estimated change in the logit of the cumulative survival time probability when the region is 2 compared to region 1, with the covariate ToxicLevel held constant. Because the p-value for the estimated coefficient is 0.685, there is insufficient evidence to conclude that region has an effect on survival time.

·    There is one estimated coefficient for each covariate, which gives parallel lines for the factor levels. Here, the estimated coefficient for the single covariate, ToxicLevel, is 0.121, with a p-value of less than 0.0005. The p-value indicates that for most α-levels, there is sufficient evidence to conclude that the toxic level affects survival. The positive coefficient and an odds ratio greater than one indicate that higher toxic levels tend to be associated with lower values of survival. Specifically, a one-unit increase in water toxicity results in a 13% increase in the odds that a salamander lives less than 10 days versus 10 days or more, and in the odds that it lives 30 days or less versus more than 30 days.

·    Next displayed is the last Log-Likelihood from the maximum likelihood iterations, along with the statistic G. This statistic tests the null hypothesis that all the coefficients associated with predictors equal zero against the alternative that at least one coefficient is not zero. In this example, G = 14.713 with a p-value of 0.001, indicating that there is sufficient evidence to conclude that at least one of the estimated coefficients is different from zero.
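The z-values, odds ratios, and confidence limits in the Logistic Regression Table can be recomputed from the coefficients and standard errors. The sketch below is illustrative only. It assumes the odds ratio is exp(Coef), the 95% limits are exp(Coef ± 1.96 × SE Coef), and the model has the form logit(P(Y ≤ k)) = Const(k) + coefficients × predictors, which is consistent with the interpretation above but is not stated explicitly in the output. The function names and the toxicity value of 40 are hypothetical.

```python
import math

# Estimates copied from the Logistic Regression Table.
CONST = {1: -7.04343, 2: -3.52273}            # Const(1), Const(2)
COEF = {"Region2": (0.201456, 0.496153),      # (coefficient, standard error)
        "ToxicLevel": (0.121289, 0.0340510)}

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_summary(coef, se):
    """Return z-value, odds ratio, and 95% CI limits for one predictor."""
    z = coef / se
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - Z_975 * se)
    upper = math.exp(coef + Z_975 * se)
    return z, odds_ratio, lower, upper


def category_probabilities(toxic_level, region2):
    """Probabilities of survival codes 1, 2, and 3.

    Assumes logit(P(Y <= k)) = Const(k) + b_region * region2 + b_toxic * toxic_level,
    where region2 is 1 for region 2 and 0 for the reference region 1.
    """
    eta = COEF["Region2"][0] * region2 + COEF["ToxicLevel"][0] * toxic_level
    cum1 = 1.0 / (1.0 + math.exp(-(CONST[1] + eta)))  # P(Y <= 1)
    cum2 = 1.0 / (1.0 + math.exp(-(CONST[2] + eta)))  # P(Y <= 2)
    return cum1, cum2 - cum1, 1.0 - cum2


for name, (coef, se) in COEF.items():
    z, oratio, lo, hi = wald_summary(coef, se)
    print(f"{name:<10} z={z:5.2f}  OR={oratio:4.2f}  CI=({lo:4.2f}, {hi:4.2f})")

# 40 is a hypothetical toxicity value chosen only for illustration.
print(category_probabilities(toxic_level=40, region2=0))
```

Under these assumptions, raising the toxicity value increases the probability of the lowest survival category, which matches the reading of the positive ToxicLevel coefficient.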

Goodness-of-Fit Tests displays both Pearson and deviance goodness-of-fit tests. In our example, the p-value for the Pearson test is 0.463, and the p-value for the deviance test is 0.918, indicating that there is insufficient evidence to claim that the model does not fit the data adequately. If the p-value is less than your selected α-level, the test rejects the null hypothesis that the model fits the data adequately.
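The p-values for the Pearson and deviance tests, and for the G statistic reported with the Logistic Regression Table, can be checked by hand. The sketch below assumes each statistic is referred to a chi-square distribution with the DF shown in the output, and that G is the usual likelihood-ratio statistic, twice the difference between the reported log-likelihood and the log-likelihood of an intercepts-only model. It uses a closed form for the chi-square upper tail that holds only for even DF, which covers both 2 and 122. The function names are hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2m the upper tail equals exp(-x/2) * sum_{i<m} (x/2)^i / i!,
    which is the Poisson(x/2) probability of at most m - 1 events.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def intercept_only_loglik(counts):
    """Log-likelihood of an intercepts-only model, whose fitted category
    probabilities equal the observed proportions."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts)


# Values copied from the session output.
loglik_full = -59.290
loglik_null = intercept_only_loglik([15, 46, 12])
g_stat = 2.0 * (loglik_full - loglik_null)
print(f"G recomputed from log-likelihoods = {g_stat:.3f}")

for label, stat, df in [("Pearson", 122.799, 122),
                        ("Deviance", 100.898, 122),
                        ("G", 14.713, 2)]:
    print(f"{label:<8} p = {chi2_upper_tail_even_df(stat, df):.3f}")
```

If the assumptions hold, the printed values should agree with the session output up to rounding of the inputs.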

Measures of Association displays a table of the number and percentage of concordant, discordant, and tied pairs, and common rank correlation statistics. These values measure the association between the observed responses and the predicted probabilities.

·    The table of concordant, discordant, and tied pairs is calculated by pairing the observations with different response values. Here, we have 15 1's, 46 2's, and 12 3's, resulting in 15 × 46 + 15 × 12 + 46 × 12 = 1422 pairs of different response values. For pairs involving the lowest coded response value (the 1-2 and 1-3 value pairs in the example), a pair is concordant if the cumulative probability up to the lowest response value (here 1) is greater for the observation with the lowest value. This works similarly for other value pairs. For pairs involving responses coded as 2 and 3 in our example, a pair is concordant if the cumulative probability up to 2 is greater for the observation coded as 2. The pair is discordant if the opposite is true. The pair is tied if the cumulative probabilities are equal. In our example, 79.2% of pairs are concordant, 20.3% are discordant, and 0.6% are ties. You can use these values as a comparative measure of prediction. For example, you can use them in evaluating predictors and different link functions.

·    Somers' D, Goodman-Kruskal Gamma, and Kendall's Tau-a are summaries of the table of concordant and discordant pairs. All three measures have the same numerator: the number of concordant pairs minus the number of discordant pairs. The denominators are the total number of pairs for Somers' D, the total number of pairs excluding ties for Goodman-Kruskal Gamma, and the number of all possible observation pairs for Kendall's Tau-a. These measures most likely lie between 0 and 1, where larger values indicate a better predictive ability of the model.
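The pair counts and the three summary measures follow directly from the definitions above. A minimal sketch, using only the counts shown in the session output (the variable names are hypothetical):

```python
from itertools import combinations

# Counts copied from the Response Information and Measures of Association tables.
counts = {1: 15, 2: 46, 3: 12}
concordant, discordant, ties = 1126, 288, 8

n_obs = sum(counts.values())

# Pairs of observations with different response values.
pairs_diff = sum(a * b for a, b in combinations(counts.values(), 2))

# All possible pairs of observations, including pairs with equal responses.
all_pairs = n_obs * (n_obs - 1) // 2

numerator = concordant - discordant
somers_d = numerator / pairs_diff            # denominator: all pairs with different responses
gamma = numerator / (pairs_diff - ties)      # denominator: the same pairs, excluding ties
tau_a = numerator / all_pairs                # denominator: all possible observation pairs

print(f"pairs with different responses = {pairs_diff}")
print(f"Somers' D = {somers_d:.2f}")
print(f"Goodman-Kruskal Gamma = {gamma:.2f}")
print(f"Kendall's Tau-a = {tau_a:.2f}")
```

Because only 8 of the 1422 pairs are tied, Somers' D and Goodman-Kruskal Gamma are nearly identical here. Kendall's Tau-a is smaller because its denominator also counts pairs of observations that share the same response value.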