Ill-conditioned data relates to problems in the predictor variables, which can cause both statistical and computational difficulties. There are two types of problems: multicollinearity and a small coefficient of variation. The checks for ill-conditioned data in Minitab have been heavily influenced by Velleman et al. [39], [40].
Multicollinearity means that some predictors are correlated with other predictors. If this correlation is high, Minitab displays a warning message and continues computation. The predicted values and residuals are still computed with high statistical and numerical accuracy, but the standard errors of the coefficients will be large and their numerical accuracy may be affected. If the correlation of a predictor with other predictors is very high, Minitab eliminates the predictor from the model and displays a message.
To identify predictors that are highly collinear, you can examine the correlation structure of the predictor variables and regress each suspicious predictor on the other predictors. You can also review the variance inflation factors (VIF), which measure how much the variance of an estimated regression coefficient increases if your predictors are correlated. If the VIF = 1, there is no multicollinearity, but if the VIF is > 1, predictors may be correlated. Montgomery and Peck suggest that if the VIF is 5 - 10, the regression coefficients are poorly estimated.
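The VIF for a predictor can be computed by regressing it on the other predictors and taking 1 / (1 - R²). The following is a minimal Python sketch of that idea, not Minitab's own routine. The function names (`solve`, `r_squared`, `vif`) and the sample columns `x1`, `x2`, `x3` are hypothetical and chosen only for illustration. For brevity it solves the normal equations directly, which is itself numerically fragile on badly conditioned data; a QR-based solver is usually a safer choice in practice.

```python
from statistics import mean


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, xs):
    """R^2 from regressing y on the columns in xs, with an intercept."""
    n, k = len(y), len(xs)
    # Centering every column absorbs the intercept term.
    yc = [v - mean(y) for v in y]
    xc = [[v - mean(col) for v in col] for col in xs]
    xtx = [[sum(xc[i][t] * xc[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(xc[i][t] * yc[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * xc[i][t] for i in range(k)) for t in range(n)]
    ss_res = sum((yc[t] - fitted[t]) ** 2 for t in range(n))
    ss_tot = sum(v * v for v in yc)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF for each predictor: 1 / (1 - R^2) against all the others."""
    return [
        1.0 / (1.0 - r_squared(columns[j], columns[:j] + columns[j + 1:]))
        for j in range(len(columns))
    ]


# Hypothetical data: x2 is almost exactly 2 * x1, x3 is only loosely related.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 4.0, 5.9, 8.1, 9.9, 12.0]
x3 = [5.0, 3.0, 6.0, 2.0, 7.0, 1.0]

for name, value in zip(("x1", "x2", "x3"), vif([x1, x2, x3])):
    print(f"VIF({name}) = {value:.2f}")
```

In this made-up data set, `x1` and `x2` should show very large VIFs because each is almost a multiple of the other, while `x3` should stay well below them. A perfectly collinear column would make 1 - R² zero and the division fail, which mirrors the case where the predictor has to be dropped from the model.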
Some possible solutions to the problem of multicollinearity are:
Predictors that are nearly constant, and so have small coefficients of variation, can cause numerical problems. For example, the variable YEAR with values from 1970 to 1975 has a small coefficient of variation, and the numerical differences among its values are contained in the fourth digit. The problem is compounded if YEAR is squared. You could subtract a constant from the data, replacing YEAR with YEARS SINCE 1970, which has values 0 to 5.
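The YEAR example can be checked numerically. The sketch below, in Python, computes the coefficient of variation before and after subtracting 1970, and the correlation between the variable and its square, which shows why squaring compounds the problem. The helper names are hypothetical, and the use of the sample standard deviation is an assumption; the text does not say which form Minitab uses internally.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    return stdev(values) / abs(mean(values))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


year = [1970, 1971, 1972, 1973, 1974, 1975]
years_since_1970 = [y - 1970 for y in year]  # values 0 to 5

for label, col in (("YEAR", year), ("YEARS SINCE 1970", years_since_1970)):
    squared = [v * v for v in col]
    print(label)
    print("  coefficient of variation:", coefficient_of_variation(col))
    print("  correlation with its square:", pearson(col, squared))
```

Subtracting the constant leaves the spread of the values unchanged but moves the mean close to zero, so the coefficient of variation grows by several orders of magnitude. The shifted variable is also much less nearly collinear with its own square than the raw YEAR values are.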
If the coefficient of variation is moderately small, some loss of statistical accuracy will occur. In this case, Minitab tells you that the predictor is nearly constant. If the coefficient of variation is very small, Minitab eliminates the predictor from the model, and displays a message.
More
If your data are extremely ill-conditioned, Minitab removes one of the problematic columns from the model. You can use the TOLERANCE subcommand with REGRESS to force Minitab to keep that column in the model. Lowering the tolerance can be dangerous, possibly producing numerically inaccurate results. See Session command help for more information.