Example of partial least squares regression

You are a wine producer who wants to know how the chemical composition of your wine relates to sensory evaluations. You have 37 Pinot Noir wine samples, each described by 17 elemental concentrations (Cd, Mo, Mn, Ni, Cu, Al, Ba, Cr, Sr, Pb, B, Mg, Si, Na, Ca, P, K) and a score on the wine's aroma from a panel of judges. You want to predict the aroma score from the 17 elements, and you determine that PLS is an appropriate technique because the ratio of samples to predictors is low (37 samples for 17 elements, before any interaction terms are added). Data are from [12]. You want the model to include all elements (Cd-K) and all two-way interactions that include Cd.

1 Open the worksheet WINEAROMA.MTW.

2 Choose Stat > Regression > Partial Least Squares.

3 In Responses, enter Aroma.

4 In Model, enter Cd-K Cd*Mo Cd*Mn Cd*Ni Cd*Cu Cd*Al Cd*Ba Cd*Cr Cd*Sr Cd*Pb Cd*B Cd*Mg Cd*Si Cd*Na Cd*Ca Cd*P Cd*K.

5 Click Options.

6 Under Cross-Validation, choose Leave-one-out. Click OK.

7 Click Graphs, then check Model selection plot, Response plot, Std Coefficient plot, Distance plot, Residual versus leverage, and Loading plot. Uncheck Coefficient plot.

8 Click OK in each dialog box.
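The Model entry in step 4 is long enough that typing it by hand invites mistakes. As a minimal sketch (plain Python, outside Minitab, with variable names chosen only for illustration), the term list can be generated from the 17 element names and then pasted into the Model box:

```python
# The 17 elements, in the order given in the example text.
elements = ["Cd", "Mo", "Mn", "Ni", "Cu", "Al", "Ba", "Cr", "Sr",
            "Pb", "B", "Mg", "Si", "Na", "Ca", "P", "K"]

# Main effects: the column range Cd-K covers all 17 elements.
main_effects = "Cd-K"

# Two-way interactions that include Cd: Cd crossed with every other element.
interactions = [f"Cd*{e}" for e in elements if e != "Cd"]

# Text to paste into the Model box, terms separated by spaces.
model_text = " ".join([main_effects] + interactions)

# Total number of model terms: 17 main effects plus 16 interactions.
n_terms = len(elements) + len(interactions)

print(model_text)
print(n_terms)
```

Counting the terms this way also makes the motivation for PLS concrete: the model has nearly as many terms as there are wine samples.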

Session window output

PLS Regression: Aroma versus Cd, Mo, Mn, Ni, Cu, Al, Ba, Cr, Sr, Pb, B, Mg, Si, Na, Ca, P, K

Method

Cross-validation                 Leave-one-out
Components to evaluate           Set
Number of components evaluated   10
Number of components selected    4

Analysis of Variance for Aroma

Source          DF       SS       MS      F      P
Regression       4  34.5514  8.63784  41.55  0.000
Residual Error  32   6.6519  0.20787
Total           36  41.2032
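The entries in the Analysis of Variance table are related by simple arithmetic. The following sketch recomputes the mean squares, the F statistic, and R² from the sums of squares and degrees of freedom printed above. The variable names are illustrative, and small differences in the last digit come from the rounding of the printed values.

```python
# Values copied from the Analysis of Variance table.
ss_regression, df_regression = 34.5514, 4
ss_error, df_error = 6.6519, 32
ss_total = 41.2032

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F statistic = regression mean square / error mean square.
f_statistic = ms_regression / ms_error

# R-squared = 1 - (error sum of squares / total sum of squares).
r_squared = 1 - ss_error / ss_total

print(f"MS regression = {ms_regression:.5f}")
print(f"MS error      = {ms_error:.5f}")
print(f"F             = {f_statistic:.2f}")
print(f"R-Sq          = {r_squared:.6f}")
```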

Model Selection and Validation for Aroma

Components  X Variance    Error      R-Sq    PRESS  R-Sq (pred)
         1    0.158849  14.9389  0.637435  23.3439     0.433444
         2    0.442267  12.2966  0.701564  21.0936     0.488060
         3    0.522977   7.9761  0.806420  19.6136     0.523978
         4    0.594546   6.6519  0.838559  18.1683     0.559056
         5               5.8530  0.857948  19.2675     0.532379
         6               5.0123  0.878352  22.3739     0.456988
         7               4.3109  0.895374  24.0041     0.417421
         8               4.0866  0.900818  24.7736     0.398747
         9               3.5886  0.912904  24.9090     0.395460
        10               3.2750  0.920516  24.8293     0.397395
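The R-Sq (pred) column can be reproduced from the PRESS column and the total sum of squares (41.2032 in the Analysis of Variance table) as 1 - PRESS / SS Total. The sketch below does that calculation and picks the number of components with the highest predicted R², which is the selection rule this example describes. The variable names are illustrative.

```python
# Total sum of squares from the Analysis of Variance table.
ss_total = 41.2032

# PRESS for each number of components, copied from the table above.
press = {1: 23.3439, 2: 21.0936, 3: 19.6136, 4: 18.1683, 5: 19.2675,
         6: 22.3739, 7: 24.0041, 8: 24.7736, 9: 24.9090, 10: 24.8293}

# Predicted R-squared = 1 - PRESS / SS Total.
pred_r2 = {m: 1 - p / ss_total for m, p in press.items()}

# The optimal model is the one with the highest predicted R-squared.
best = max(pred_r2, key=pred_r2.get)

for m in sorted(pred_r2):
    marker = "  <- selected" if m == best else ""
    print(f"{m:>2}  {pred_r2[m]:.6f}{marker}")
```

Because PRESS is smallest at four components, predicted R² is largest there, even though the ordinary R-Sq keeps rising as components are added.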

Graph window output

Interpreting the results

Session window output

· The Method table indicates the number of components Minitab evaluated and the number of components selected as the optimal model. The optimal model is defined as the model with the highest predicted R². Minitab selected the four-component model as the optimal model, with a predicted R² of 0.56.

· Minitab displays one Analysis of Variance table per response based on the optimal model. The p-value for aroma is 0.000, which is less than an alpha of 0.05, providing sufficient evidence that the four-component model is significant.

· Use the Model Selection and Validation table to select the optimal number of components for your model. Depending on your data or field of study, you may determine that a model other than the one selected by cross-validation is more appropriate. The model with four components, which was selected by cross-validation, has an R² of 83.9% and a predicted R² of 55.9%.

· The X-variance indicates the amount of variance in the predictors that is explained by the model. In this example, the four-component model explains 59.5% of the variance in the predictors.
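To make the leave-one-out idea behind PRESS concrete, here is a minimal sketch in plain Python. It is not Minitab's PLS algorithm: for brevity it swaps in ordinary least squares with a single predictor, and the data are small made-up values used only for illustration. The mechanics are the same in spirit: each observation is left out in turn, the model is refit on the rest, and the squared prediction error for the omitted observation is accumulated.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x (stand-in for PLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_press(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without observation i.
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        # Predict the omitted observation and accumulate its squared error.
        press += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return press


def predicted_r2(xs, ys):
    """Predicted R-squared = 1 - PRESS / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - loo_press(xs, ys) / ss_total


# Made-up illustrative data, not taken from the wine worksheet.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]

print(loo_press(xs, ys))
print(predicted_r2(xs, ys))
```

Because every prediction is made for an observation the model has not seen, a large gap between R² and predicted R² suggests that the model fits the sample better than it predicts new data.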

Graph window output

· The model selection plot is a graphical display of the Model Selection and Validation table. The vertical line indicates that the optimal model has four components. You can see that the predictive ability of every model with more than four components is lower: predicted R² falls from 0.559 with four components to 0.532 with five, and to about 0.40 with eight to ten components.

· The response plot indicates that the model fits the data adequately because the points are in a linear pattern, from the bottom left-hand corner to the top right-hand corner. Although there are differences between the fitted and cross-validated fitted responses, none are severe enough to indicate an extreme leverage point.

· The standardized coefficient plot displays the standardized coefficients for the predictors. You can use this plot to interpret the magnitude and sign of the coefficients. The terms Mo, Cu, Sr, Pb, B, Ca, Cd*Sr, and Cd*B have the largest standardized coefficients and the biggest impact on aroma. The terms Mo, Pb, B, and Cd*B are positively related to aroma, while Cu, Sr, Ca, and Cd*Sr are negatively related.

· The loading plot compares the relative influence of the predictors on the response. In this example, Cu and Ni have very short lines, indicating that they have low x-loadings and are not related to aroma. The elements Sr, Mg, and Ba have long lines, indicating that they have higher loadings and are more related to aroma.

· The distance plot and the residual versus leverage plot display outliers and leverages. By brushing the distance plot, you can see that compared to the rest of the data:

- observations 14 and 32 have a greater distance value on the y-axis

- observations 1 and 37 have a greater distance value on the x-axis

The residual versus leverage plot shows that:

- observation 3 is an outlier because it is outside the horizontal reference lines

- observations 5, 12, 14, 23, and 37 have extreme leverage values because they are to the right of the vertical reference line
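The reading of the residual versus leverage plot can be expressed as two simple checks. The sketch below assumes the reference lines sit at standardized residuals of -2 and 2 and at a leverage of 2m/n, where m is the number of components and n is the number of observations. These cutoffs are assumptions here, since the example does not state where the lines are drawn, and the input values are made-up placeholders rather than values from the wine data.

```python
def flag_observations(std_residuals, leverages, n_components,
                      residual_limit=2.0):
    """Return (outliers, high_leverage, leverage_limit) using 1-based rows.

    Assumed cutoffs: |standardized residual| > residual_limit marks an
    outlier; leverage > 2 * n_components / n marks extreme leverage.
    """
    n = len(leverages)
    leverage_limit = 2 * n_components / n
    outliers = [i + 1 for i, r in enumerate(std_residuals)
                if abs(r) > residual_limit]
    high_leverage = [i + 1 for i, h in enumerate(leverages)
                     if h > leverage_limit]
    return outliers, high_leverage, leverage_limit


# Hypothetical values for five observations and a one-component model.
std_residuals = [0.3, -2.4, 1.1, 0.2, -0.5]
leverages = [0.10, 0.15, 0.55, 0.12, 0.08]

outliers, high_leverage, limit = flag_observations(
    std_residuals, leverages, n_components=1)
print(outliers, high_leverage, limit)

# Under the same assumed rule, the cutoff for this example's
# four-component model with 37 observations would be 2 * 4 / 37.
wine_limit = 2 * 4 / 37
print(round(wine_limit, 3))
```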
