Example of Nominal Logistic Regression

Suppose you are a grade school curriculum director interested in what children identify as their favorite subject and how this is associated with their age or the teaching method employed. Thirty children, 10 to 13 years old, had classroom instruction in science, math, and language arts that employed either lecture or discussion techniques. At the end of the school year, they were asked to identify their favorite subject. We use nominal logistic regression because the response is categorical and its categories have no natural order.

1    Open the worksheet EXH_REGR.MTW.

2    Choose Stat > Regression > Nominal Logistic Regression.

3    In Response, enter Subject. In Model, enter TeachingMethod Age. In Factors (optional), enter TeachingMethod.

4    Click Results. Choose In addition, list of factor level values, and tests for terms with more than 1 degree of freedom. Click OK in each dialog box.

Session window output

Nominal Logistic Regression: Subject versus TeachingMethod, Age

 

 

Response Information

Variable  Value    Count
Subject   science     10  (Reference Event)
          math        11
          arts         9
          Total       30

 

 

Factor Information

Factor          Levels  Values
TeachingMethod       2  discuss, lecture

 

 

Logistic Regression Table

                                                             Odds      95% CI
Predictor                     Coef   SE Coef      Z      P  Ratio  Lower   Upper
Logit 1: (math/science)
Constant                  -1.12266   4.56425  -0.25  0.806
TeachingMethod
 lecture                 -0.563115  0.937591  -0.60  0.548   0.57   0.09    3.58
Age                       0.124674  0.401079   0.31  0.756   1.13   0.52    2.49
Logit 2: (arts/science)
Constant                  -13.8485   7.24256  -1.91  0.056
TeachingMethod
 lecture                   2.76992   1.37209   2.02  0.044  15.96   1.08  234.90
Age                        1.01354  0.584494   1.73  0.083   2.76   0.88    8.66

 

 

Log-Likelihood = -26.446
Test that all slopes are zero: G = 12.825, DF = 4, P-Value = 0.012

 

 

Goodness-of-Fit Tests

Method    Chi-Square  DF      P
Pearson      6.95295  10  0.730
Deviance     7.88622  10  0.640

Interpreting the results

The Session window output contains the following five parts:

Response Information displays the number of observations that fall into each of the response categories (science, math, and language arts), and the number of missing observations. The response value that has been designated as the reference event is the first entry under Value. Here, the default coding scheme defines the reference event as science using reverse alphabetical order.

Factor Information displays all the factors in the model, the number of levels for each factor, and the factor level values. The factor level that has been designated as the reference level is the first entry under Values. Here, the default coding scheme defines the reference level as discussion using alphabetical order.

Logistic Regression Table shows the estimated coefficients (parameter estimates), the standard errors of the coefficients, z-values, and p-values. You also see the odds ratio and a 95% confidence interval for the odds ratio. The coefficient associated with a predictor is the estimated change in the logit for a one-unit change in the predictor, assuming that all other factors and covariates are held the same.
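As a rough illustration of how the columns in one row of the table relate to each other, the following Python sketch recomputes Z, P, the odds ratio, and the 95% confidence interval from Coef and SE Coef. It assumes the usual Wald calculations (Z = Coef / SE Coef, a two-sided normal p-value, and a confidence interval formed on the logit scale and then exponentiated). The function name wald_summary is illustrative and is not a Minitab command.

```python
import math
from statistics import NormalDist


def wald_summary(coef, se, level=0.95):
    """Recompute the derived columns of one table row (assumed Wald formulas)."""
    z = coef / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    # Critical value for the requested confidence level (about 1.96 for 95%).
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "z": z,
        "p": p,
        "odds_ratio": math.exp(coef),
        "ci_lower": math.exp(coef - z_crit * se),
        "ci_upper": math.exp(coef + z_crit * se),
    }


# Coef and SE Coef for "lecture" under Logit 2 (arts/science), copied from the output.
row = wald_summary(coef=2.76992, se=1.37209)
for name, value in row.items():
    print(f"{name:>10}: {value:.3f}")
```

Run on the lecture row of Logit 2, the results should agree with the printed row (Z 2.02, P 0.044, odds ratio 15.96, interval 1.08 to 234.90) up to rounding.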

·    If there are k distinct response values, Minitab estimates k-1 sets of parameter estimates, here labeled Logit 1 and Logit 2. These are the estimated differences in log odds, or logits, of math and language arts, respectively, compared to science as the reference event. Each set contains a constant and coefficients for the factor(s), here teaching method, and the covariate(s), here age. The TeachingMethod coefficient is the estimated change in the logit when the teaching method is lecture rather than discussion, with Age held constant. The Age coefficient is the estimated change in the logit for a one-year increase in age, with teaching method held constant. These sets of parameter estimates give nonparallel lines for the response values.

·    The first set of parameter estimates, labeled Logit 1, describes the change in the logit of math relative to the reference event, science. The p-values of 0.548 and 0.756 for TeachingMethod and Age, respectively, indicate that there is insufficient evidence to conclude that a change in teaching method from discussion to lecture, or a change in age, affected the choice of math as favorite subject compared to science.

·    The second set of parameter estimates, labeled Logit 2, describes the change in the logit of language arts relative to the reference event, science. The p-values for TeachingMethod and Age are 0.044 and 0.083, respectively. A p-value less than your acceptable α-level is sufficient evidence to conclude that the predictor affected the choice of language arts as favorite subject compared to science; at α = 0.05 this holds for teaching method but not for age, while at α = 0.10 it holds for both. The positive coefficient for teaching method indicates that students taught by lecture tend to prefer language arts over science more than students taught by discussion do. The estimated odds ratio of 15.96 implies that the odds of choosing language arts over science are about 16 times higher when the teaching method changes from discussion to lecture; the wide confidence interval (1.08 to 234.90) shows that this estimate is imprecise. The positive coefficient for age indicates that students tend to favor language arts over science as they become older.
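To see how the two sets of estimates work together, the following Python sketch turns the coefficients from the Logistic Regression Table into estimated probabilities for each favorite subject. It assumes the standard baseline-category form of the model, in which the reference event (science) has a logit of 0 and the factor is coded 1 for lecture and 0 for discussion. The names COEFS and predicted_probabilities are illustrative only, and the 12-year-old child is a hypothetical case, not a row from the worksheet.

```python
import math

# Coefficients copied from the Logistic Regression Table.
# Reference event: science. Reference factor level: discuss.
COEFS = {
    "math": {"const": -1.12266, "lecture": -0.563115, "age": 0.124674},
    "arts": {"const": -13.8485, "lecture": 2.76992, "age": 1.01354},
}


def predicted_probabilities(age, lecture):
    """Estimated probability of each favorite subject for one child.

    `lecture` is 1 for the lecture method and 0 for discussion (assumed coding).
    """
    logits = {
        subject: c["const"] + c["lecture"] * lecture + c["age"] * age
        for subject, c in COEFS.items()
    }
    # The reference event has logit 0, so it contributes exp(0) = 1.
    denom = 1.0 + sum(math.exp(v) for v in logits.values())
    probs = {subject: math.exp(v) / denom for subject, v in logits.items()}
    probs["science"] = 1.0 / denom
    return probs


# Hypothetical example: a 12-year-old under each teaching method.
for method, flag in (("discuss", 0), ("lecture", 1)):
    probs = predicted_probabilities(age=12, lecture=flag)
    print(method, {k: round(v, 3) for k, v in probs.items()})
```

At any fixed age, the ratio of the arts-to-science odds under lecture to the same odds under discussion equals exp(2.76992), which is the odds ratio of about 16 reported in the table.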

Next displayed is the last log-likelihood from the maximum likelihood iterations, along with the statistic G. G is the difference in -2 log-likelihood between a model that has only the constant terms and the fitted model shown in the Logistic Regression Table. G is the test statistic for the null hypothesis that all coefficients associated with predictors equal 0 versus the alternative that not all are 0. Here G = 12.825 with a p-value of 0.012, indicating that at α = 0.05 there is sufficient evidence that at least one coefficient is different from 0.
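The G statistic can be checked by hand from the output. The Python sketch below relies on two standard facts rather than on anything specific to Minitab: the log-likelihood of a constant-only model depends only on the response counts, and the upper tail of a chi-square distribution with 4 degrees of freedom has the closed form exp(-x/2)(1 + x/2). The variable names are illustrative.

```python
import math

# Response counts from the Response Information table.
counts = {"science": 10, "math": 11, "arts": 9}
n = sum(counts.values())

# A constant-only model estimates each probability by its observed proportion,
# so its maximized log-likelihood is the sum of count * log(proportion).
ll_null = sum(c * math.log(c / n) for c in counts.values())

# Log-likelihood of the fitted model, copied from the session output.
ll_fitted = -26.446

# G is the drop in -2 log-likelihood from the constant-only model to the fitted model.
g = 2 * (ll_fitted - ll_null)

# Upper-tail chi-square probability, valid for DF = 4 only
# (4 slopes: TeachingMethod and Age in each of the two logits).
p_value = math.exp(-g / 2) * (1 + g / 2)

print(f"null log-likelihood = {ll_null:.3f}")
print(f"G = {g:.3f}, P-Value = {p_value:.3f}")
```

The result should match the printed G and P-Value up to the rounding of the log-likelihood in the output.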

Goodness-of-Fit Tests displays Pearson and deviance goodness-of-fit tests. In our example, the p-value for the Pearson test is 0.730 and the p-value for the deviance test is 0.640, indicating that there is insufficient evidence to conclude that the model fits the data poorly. If the p-value were less than your selected α-level, the test would indicate that the model does not fit the data adequately.
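The p-values in the Goodness-of-Fit Tests table are upper-tail chi-square probabilities for the listed statistic and degrees of freedom. The Python sketch below recomputes them using the closed-form series that applies when the degrees of freedom are even, which is the case here (DF = 10). The helper name chi2_upper_tail_even_df is illustrative; for general degrees of freedom you would need a statistics library instead.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Chi-square statistics and DF copied from the Goodness-of-Fit Tests table.
tests = {"Pearson": (6.95295, 10), "Deviance": (7.88622, 10)}
p_values = {
    name: chi2_upper_tail_even_df(stat, df) for name, (stat, df) in tests.items()
}
for name, p in p_values.items():
    print(f"{name:<9} P = {p:.3f}")
```

Both values are well above common α-levels such as 0.05, which is why the tests give no evidence of lack of fit.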