Sometimes items or subjects from different groups are encountered according to different probabilities. If you know or can estimate these probabilities a priori, discriminant analysis can use these so-called prior probabilities in calculating the posterior probabilities, or probabilities of assigning observations to groups given the data. With the assumption that the data have a normal distribution, the linear discriminant function is increased by $\ln(p_i)$, where $p_i$ is the prior probability of group $i$. Because observations are assigned to groups according to the smallest generalized distance, or equivalently the largest linear discriminant function, the effect is to increase the posterior probabilities for a group with a high prior probability.
Now suppose we have prior probabilities $p_i$, and suppose $f_i(x)$ is the joint density for the data in group $i$ (with the population parameters replaced by the sample estimates).
The posterior probability is the probability of group $i$ given the data and is calculated by
$$P(i \mid x) = \frac{p_i f_i(x)}{\sum_j p_j f_j(x)}$$
Because the denominator is the same for every group, the largest posterior probability is equivalent to the largest value of $\ln[p_i f_i(x)]$.
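The posterior calculation can be sketched numerically as follows. This is a minimal illustration, assuming univariate normal densities with a common standard deviation; the function names `normal_pdf` and `posteriors` and all numbers are hypothetical and chosen only for the example.

```python
import math

def normal_pdf(x, mean, sd):
    """Univariate normal density, standing in for f_i(x)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(x, priors, means, sd):
    """Posterior of each group: p_i f_i(x) divided by the sum over all groups."""
    weighted = [p * normal_pdf(x, m, sd) for p, m in zip(priors, means)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical two-group example: common spread, group means at 0 and 2.
means = [0.0, 2.0]
sd = 1.0
x = 1.0  # halfway between the two group means, so both densities are equal

equal = posteriors(x, [0.5, 0.5], means, sd)   # equal priors
skewed = posteriors(x, [0.8, 0.2], means, sd)  # first group favored a priori

print(equal)
print(skewed)
```

Because the observation sits midway between the two means, the densities cancel and each posterior reduces to the corresponding prior (up to floating-point rounding), which isolates the effect of the priors on the assignment.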
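If $f_i(x)$ is the normal distribution, then
$$\ln[p_i f_i(x)] = -0.5\,[(x - m_i)' S_p^{-1} (x - m_i) - 2 \ln p_i] - (\text{a constant})$$
where $m_i$ is the sample mean vector of group $i$ and $S_p$ is the pooled sample covariance matrix.
The term in square brackets is called the generalized squared distance of $x$ to group $i$ and is denoted by $d_i^2(x)$. Notice,
$$-0.5\, d_i^2(x) = [\ln p_i - 0.5\, m_i' S_p^{-1} m_i + x' S_p^{-1} m_i] - 0.5\, x' S_p^{-1} x$$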
The term in square brackets is the linear discriminant function. The only difference from the case without prior probabilities is a change in the constant term. Note that the largest posterior probability is equivalent to the smallest generalized squared distance, which is equivalent to the largest linear discriminant function.
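The equivalence of the three assignment rules can be checked numerically. The sketch below assumes NumPy is available, a pooled covariance matrix shared by all groups, and a generalized squared distance equal to the squared Mahalanobis distance minus twice the log prior. All means, covariances, priors, and variable names are hypothetical values made up for illustration.

```python
import numpy as np

# Hypothetical sample estimates for three groups and two variables.
means = np.array([[0.0, 0.0], [2.0, 1.0], [1.0, 3.0]])  # group mean vectors
pooled_cov = np.array([[1.0, 0.3], [0.3, 2.0]])           # pooled covariance matrix
priors = np.array([0.6, 0.3, 0.1])                        # prior probabilities
x = np.array([1.2, 1.1])                                  # observation to classify

cov_inv = np.linalg.inv(pooled_cov)

# Generalized squared distance: squared Mahalanobis distance minus 2 ln(prior).
diffs = x - means
plain_dist = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
gen_dist = plain_dist - 2.0 * np.log(priors)

# Linear discriminant function: ln(prior) - 0.5 m' S^-1 m + x' S^-1 m.
ldf = (np.log(priors)
       - 0.5 * np.einsum("ij,jk,ik->i", means, cov_inv, means)
       + means @ cov_inv @ x)

# Posterior probabilities; constants shared by all groups cancel on normalizing.
log_weighted = -0.5 * gen_dist
posterior = np.exp(log_weighted - log_weighted.max())
posterior /= posterior.sum()

# All three rules should pick the same group.
print(gen_dist.argmin(), ldf.argmax(), posterior.argmax())
```

With these made-up numbers the observation is closest to the second group in plain squared distance, but the larger prior of the first group tips the assignment to the first group under all three rules.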