Coding categorical predictors in partial least squares

To include categorical predictors in a partial least squares model, Minitab automatically codes the categories so that they can enter the model. You can choose between two coding schemes: 1, 0 coding or -1, 0, 1 coding. Whichever scheme you choose, the test of the overall effect of the categorical variable remains the same.

When you have categorical predictors, the regression coefficients are interpreted relative to a reference level. See Setting reference levels in Partial Least Squares for more information.

1, 0 coding

1, 0 coding (also known as binary or dummy coding) is commonly used in regression analyses.

For example, you want to include the categorical predictor Location in your model. Location has three levels: Hong Kong, London, and New York. If you choose 1, 0 coding, Minitab codes the three levels of the predictor as shown below. In 1, 0 coding, the reference level is the level that is first in alphabetical order for text categorical predictors. Therefore, Hong Kong is the reference level.

If location is...    London is coded as...    New York is coded as...
Hong Kong            0                        0
London               1                        0
New York             0                        1
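
The 1, 0 coding described above can be reproduced outside Minitab with a few lines of Python. This is only an illustrative sketch, not Minitab's implementation: the function name `dummy_code` and its `reference` parameter are hypothetical, and the default reference level is assumed to be the first level in alphabetical order, as stated above.

```python
def dummy_code(values, reference=None):
    """Return (column_names, rows) for 1, 0 (dummy) coding.

    Hypothetical helper for illustration; not part of Minitab.
    """
    levels = sorted(set(values))
    if reference is None:
        # Assumption taken from the text: the reference level is the
        # level that comes first in alphabetical order.
        reference = levels[0]
    # One coded column for every level except the reference level.
    columns = [level for level in levels if level != reference]
    # A row gets 1 in the column that matches its level, 0 elsewhere.
    # Rows at the reference level therefore contain only zeros.
    rows = [[1 if value == column else 0 for column in columns]
            for value in values]
    return columns, rows


locations = ["Hong Kong", "London", "New York"]
columns, rows = dummy_code(locations)
print(columns)
for location, row in zip(locations, rows):
    print(location, row)
```

With k levels, this coding produces k - 1 columns, because the reference level is represented by zeros in all of them.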

-1, 0, 1 coding

You can also choose to code categorical predictors with a -1, 0, 1 scheme (also known as effect coding). -1, 0, 1 coding is used in General Linear Models and Design of Experiments (DOE).

In -1, 0, 1 coding, the reference level is the level that is last in alphabetical order. Therefore, New York is the reference level. In the example below, any row in which Location is New York is assigned -1 in every coded column.

If location is...    Hong Kong is coded as...    London is coded as...
Hong Kong            1                           0
London               0                           1
New York             -1                          -1
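
The -1, 0, 1 coding described above can be sketched in Python in the same way. Again, this is an illustration under stated assumptions rather than Minitab's implementation: `effect_code` and its `reference` parameter are hypothetical names, and the default reference level is assumed to be the last level in alphabetical order, as stated above.

```python
def effect_code(values, reference=None):
    """Return (column_names, rows) for -1, 0, 1 coding.

    Hypothetical helper for illustration; not part of Minitab.
    """
    levels = sorted(set(values))
    if reference is None:
        # Assumption taken from the text: the reference level is the
        # level that comes last in alphabetical order.
        reference = levels[-1]
    # One coded column for every level except the reference level.
    columns = [level for level in levels if level != reference]
    rows = []
    for value in values:
        if value == reference:
            # The reference level is coded -1 in every column.
            rows.append([-1] * len(columns))
        else:
            # Other levels get 1 in their own column, 0 elsewhere.
            rows.append([1 if value == column else 0 for column in columns])
    return columns, rows


locations = ["Hong Kong", "London", "New York"]
columns, rows = effect_code(locations)
print(columns)
for location, row in zip(locations, rows):
    print(location, row)
```

The only difference from 1, 0 coding is how the reference level is represented: as a row of -1 values instead of a row of zeros.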