Individual Distribution Identification

Graphs - Distribution ID Plots: Normal, Box-Cox Transformation, Lognormal, and 3-Parameter Lognormal

  

Use the probability plots to compare the fit of all distributions so you can choose the best-fitting distribution.

The probability plots include:

·    Points, which are the estimated percentiles for corresponding probabilities of an ordered data set.

·    Middle lines, which are the expected percentiles from the distribution based on maximum likelihood parameter estimates. If the distribution is a good fit for the data, the points form a straight line.

·    Left lines, which are formed by connecting the lower bounds of the confidence intervals for each percentile. Similarly, the right lines are formed by connecting the upper bounds of the confidence intervals for each percentile. If a distribution is a good fit, the points fall within these bounds.

·    Anderson-Darling test statistics with corresponding p-values to assess whether your data follow a distribution.
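To make the plot elements above concrete, the following is a minimal sketch for the normal distribution only, using the Python standard library. It is not Minitab's implementation. The median-rank plotting positions, the unadjusted Anderson-Darling formula, the function names, and the sample values are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, fmean


def normal_mle(data):
    """Maximum likelihood estimates (mean, sigma) for a normal distribution."""
    n = len(data)
    mu = fmean(data)
    # The ML estimate of sigma divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma


def probability_plot_points(data):
    """Pair each ordered observation with an estimated cumulative probability.

    Assumption: median-rank plotting positions (i - 0.3) / (n + 0.4).
    """
    xs = sorted(data)
    n = len(xs)
    return [(x, (i - 0.3) / (n + 0.4)) for i, x in enumerate(xs, start=1)]


def fitted_percentile(p, mu, sigma):
    """Percentile on the middle (fitted) line for cumulative probability p."""
    return NormalDist(mu, sigma).inv_cdf(p)


def anderson_darling(data, mu, sigma):
    """Unadjusted Anderson-Darling statistic against Normal(mu, sigma)."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mu, sigma)
    cdf = [dist.cdf(x) for x in xs]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


# Hypothetical sample, not the calcium data from the example.
sample = [9.2, 10.1, 10.4, 10.9, 11.3, 11.8, 12.5, 13.1]
mu, sigma = normal_mle(sample)
for x, p in probability_plot_points(sample):
    print(f"observed={x:5.1f}  prob={p:.3f}  fitted={fitted_percentile(p, mu, sigma):6.2f}")
print(f"AD = {anderson_darling(sample, mu, sigma):.3f}")
```

When the observed and fitted columns track each other closely, the points lie near the middle line. A smaller AD value indicates a closer fit.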

For several distributions, Minitab offers the standard version as well as a version with an extra parameter. In these cases, use the LRT P (the p-value of the likelihood-ratio test) to determine whether adding the extra parameter significantly improves the fit over the distribution without the extra parameter. An LRT P value less than 0.05 suggests that the improvement is significant.

The LRT P value is also useful for 3-parameter distributions for which there is no established method for calculating the p-value. In these cases, it is advisable to first examine the p-value for the corresponding 2-parameter distribution. Then look at the LRT P for the 3-parameter distribution to determine whether the 3-parameter distribution is significantly better than the 2-parameter distribution. However, it may be advisable to choose a distribution that has a calculated p-value and a similar AD value.
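As a rough illustration of how a likelihood-ratio test p-value can be obtained, the sketch below compares the maximized log-likelihoods of a distribution with and without one extra parameter. It assumes the test statistic is referred to a chi-square distribution with 1 degree of freedom. Whether Minitab computes LRT P exactly this way is not stated here, and the function names and log-likelihood values are hypothetical.

```python
import math


def lrt_p_value(loglik_reduced, loglik_full):
    """P-value of a likelihood-ratio test for one extra parameter.

    Assumption: 2 * (loglik_full - loglik_reduced) is compared with a
    chi-square distribution with 1 degree of freedom.
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def extra_parameter_helps(loglik_reduced, loglik_full, alpha=0.05):
    """True when the extra parameter significantly improves the fit."""
    return lrt_p_value(loglik_reduced, loglik_full) < alpha


# Hypothetical maximized log-likelihoods for a 2-parameter and a
# 3-parameter version of the same distribution.
print(extra_parameter_helps(-120.4, -117.5))
```

A larger gain in log-likelihood from the extra parameter gives a smaller p-value. With no gain at all, the p-value is 1.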

Example Output

image\idid_1n.gif

Interpretation

For the calcium data, the probability plots for the lognormal distribution and the Box-Cox transformation show that the data points fall close to the middle lines and within the confidence intervals. Also, the Anderson-Darling (AD) statistics (lognormal: 0.650, Box-Cox: 0.398) and p-values (lognormal: 0.085, Box-Cox: 0.353) suggest that both fit the data well.

For the 3-parameter lognormal distribution, there is no established method for calculating the p-value. In this case, it is advisable to first examine the p-value for the corresponding 2-parameter distribution (0.085), which indicates a good fit. Then look at the LRT P for the 3-parameter lognormal distribution (from the goodness-of-fit tests, 0.017), which indicates that the 3-parameter distribution is significantly better than the 2-parameter distribution. Additionally, a visual inspection of the probability plot combined with the AD value (0.341) suggests that this distribution is a good fit. However, it may be advisable to choose a different distribution that has a calculated p-value and a similar AD value.
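The reasoning above can be summarized as a small decision sketch. The AD, p-value, and LRT P figures are the ones quoted for the calcium data. The 0.05 cutoff for the goodness-of-fit p-value and the function name are assumptions made for illustration.

```python
ALPHA = 0.05  # assumed significance level for both p-values

# Values quoted in the interpretation; None marks a p-value that is not calculated.
fits = {
    "Lognormal": {"ad": 0.650, "p": 0.085, "lrt_p": None},
    "Box-Cox transformation": {"ad": 0.398, "p": 0.353, "lrt_p": None},
    "3-parameter lognormal": {"ad": 0.341, "p": None, "lrt_p": 0.017},
}


def assess(name, fit):
    """Return a one-line reading of the goodness-of-fit figures."""
    if fit["p"] is not None:
        verdict = "no evidence of poor fit" if fit["p"] > ALPHA else "poor fit"
        return f"{name}: AD={fit['ad']:.3f}, p={fit['p']:.3f} -> {verdict}"
    if fit["lrt_p"] is not None and fit["lrt_p"] < ALPHA:
        verdict = "extra parameter significantly improves the fit"
    else:
        verdict = "extra parameter gives no significant improvement"
    return f"{name}: AD={fit['ad']:.3f}, LRT P={fit['lrt_p']} -> {verdict}"


for name, fit in fits.items():
    print(assess(name, fit))
```

Among candidates with similar AD values, the guidance above favors one that has a calculated p-value, which here points toward the Box-Cox transformation (AD 0.398, p-value 0.353).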

The probability plots, Anderson-Darling statistics, and p-values for the normal and exponential distributions suggest that these distributions do not fit the calcium data well.