Partial Least Squares

Summary

  

Partial least squares (PLS) is a biased regression procedure that relates a set of predictors to one or more response variables. PLS was developed for use with ill-conditioned data, that is, data in which the predictors are highly correlated or outnumber the observations.

PLS reduces the predictors to a set of uncorrelated components chosen according to the covariance between the predictors (X) and the responses (Y), then performs least squares regression on these components. Two important features of PLS are cross-validation and prediction:

·    Use cross-validation to select the number of components that produces the most accurate predictive model.

·    Use prediction to evaluate the model's predictive ability or to calculate responses for new data.
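The procedure summarized above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the algorithm used by any particular statistical package: the function names (pls_fit, pls_predict, cv_press), the 5-fold split, and the synthetic data are all assumptions made for the example. Predictors and responses are mean-centered but not scaled. Each component is the direction in X with the largest covariance with Y; X is deflated after each step so that the component scores are uncorrelated, and Y is regressed on the scores by least squares. Cross-validation compares the prediction sum of squares (PRESS) across candidate numbers of components.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """Fit a PLS regression; return (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean            # centered working copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with Y.
        w = np.linalg.svd(Xk.T @ Yk, full_matrices=False)[0][:, 0]
        t = Xk @ w                             # component scores
        p = Xk.T @ t / (t @ t)                 # X loadings
        q = Yk.T @ t / (t @ t)                 # least squares fit of Y on t
        Xk = Xk - np.outer(t, p)               # deflate: later scores are uncorrelated with t
        Yk = Yk - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q.T)   # coefficients on the original predictors
    return x_mean, y_mean, coef

def pls_predict(model, X_new):
    """Calculate responses for new rows of predictors."""
    x_mean, y_mean, coef = model
    return (X_new - x_mean) @ coef + y_mean

def cv_press(X, Y, n_components, n_folds=5, seed=0):
    """Prediction sum of squares from k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(X))
    press = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)        # rows not in the held-out fold
        model = pls_fit(X[train], Y[train], n_components)
        press += np.sum((Y[fold] - pls_predict(model, X[fold])) ** 2)
    return press

# Synthetic, highly correlated predictors driven by 3 latent factors (illustrative only).
rng = np.random.default_rng(1)
latent = rng.normal(size=(46, 3))
X = latent @ rng.normal(size=(3, 30)) + 0.1 * rng.normal(size=(46, 30))
Y = latent @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(46, 2))
X_train, Y_train, X_new = X[:40], Y[:40], X[40:]

# Cross-validation: pick the number of components with the smallest PRESS.
press = {a: cv_press(X_train, Y_train, a) for a in range(1, 9)}
best = min(press, key=press.get)

# Prediction: calculate responses for the new rows.
Y_pred = pls_predict(pls_fit(X_train, Y_train, best), X_new)
print("components selected:", best, "| predictions shape:", Y_pred.shape)
```

Choosing the component count by minimum PRESS is one common rule; other selection criteria exist and may give a different answer on the same data.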

Data Description

Scientists at a food chemistry laboratory analyzed 60 soybean flour samples. For each sample, they determined the moisture and fat content, and recorded near-infrared (NIR) spectral data at 88 wavelengths. Using 54 of the 60 samples, the scientists estimated the relationship between the responses (moisture and fat) and the predictors (the 88 NIR wavelengths) using partial least squares (PLS). They used the remaining six samples as a test set to evaluate the model's predictive ability.
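The hold-out design described above (54 training samples, 6 test samples) can be sketched as follows. This is only an illustration: the source does not say how the six test samples were chosen, so the random selection here is an assumption, as are the helper names holdout_split and holdout_r2 and the stand-in response values. The R-squared shown (1 - SSE/SST, computed per response on the test rows) is one common way to summarize predictive ability and may differ from the statistic a given package reports.

```python
import numpy as np

def holdout_split(n_samples, n_test, seed=0):
    """Return (train_idx, test_idx) drawn from a random permutation of row indices."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.sort(idx[n_test:]), np.sort(idx[:n_test])

def holdout_r2(y_true, y_pred):
    """Per-response R-squared on held-out rows: 1 - SSE / SST."""
    sse = np.sum((y_true - y_pred) ** 2, axis=0)
    sst = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - sse / sst

train_idx, test_idx = holdout_split(n_samples=60, n_test=6)
print(len(train_idx), len(test_idx))  # 54 6

# Stand-in values for two responses (not the soybean measurements).
Y = np.random.default_rng(2).normal(size=(60, 2))

# Baseline: predict every test row with the training means.
# Predictions from a fitted PLS model would take the place of y_baseline.
y_baseline = np.tile(Y[train_idx].mean(axis=0), (len(test_idx), 1))
r2_baseline = holdout_r2(Y[test_idx], y_baseline)
print(r2_baseline)
```

A model is useful for prediction only if its test-set R-squared clearly exceeds that of a baseline such as the training mean.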

Data: Soybean.MTW (available in the Sample Data folder).