Basic Statistics Overview

Use Minitab's basic statistics capabilities to calculate descriptive statistics and to perform simple estimation and hypothesis testing with one or two samples. These capabilities include procedures for:

· Calculating or storing descriptive statistics

· Hypothesis tests and confidence intervals of the mean or difference in means

· Hypothesis tests and confidence intervals for a proportion or the difference in proportions

· Hypothesis tests and confidence intervals for the occurrence rate, the mean number of occurrences, and the differences between them for Poisson processes

· Hypothesis tests and confidence intervals for one variance, and for the difference between two variances

· Measuring association

· Testing for normality of a distribution

· Testing whether data follow a Poisson distribution

Calculating and storing descriptive statistics

· Display Descriptive Statistics produces descriptive statistics for each column or subset within a column. You can display the statistics in the Session window and/or display them in a graph.

· Store Descriptive Statistics stores descriptive statistics for each column or subset within a column.

· Graphical Summary produces four graphs and an output table in one graph window.

For a list of descriptive statistics available for display or storage, see Descriptive Statistics Available for Display or Storage. To calculate descriptive statistics individually and store them as constants, see Column Statistics.
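
As a rough illustration of the kind of summary these procedures report, the following Python sketch computes a handful of common descriptive statistics for one column of data. It is not Minitab's implementation; the function name `describe`, the dictionary keys, and the sample data are all hypothetical.

```python
import statistics

def describe(values):
    """Return a few common descriptive statistics for one sample (illustrative only)."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "n": n,
        "mean": mean,
        "stdev": stdev,
        "se_mean": stdev / n ** 0.5,  # standard error of the mean
        "minimum": min(values),
        "median": statistics.median(values),
        "maximum": max(values),
    }

# Hypothetical data standing in for one worksheet column
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
print(summary)
```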

Confidence intervals and hypothesis tests of means

The four procedures for hypothesis tests and confidence intervals for population means or the difference between means assume that the sample mean follows a normal distribution. According to the Central Limit Theorem, the normal distribution becomes an increasingly good approximation to the distribution of the sample mean as the sample size increases, for samples drawn from any distribution with finite variance.

· 1-Sample Z computes a confidence interval or performs a hypothesis test of the mean when the population standard deviation, σ, is known. This procedure is based upon the normal distribution, so for small samples it works best if your data were drawn from a normal distribution or one that is close to normal. By the Central Limit Theorem, you may also use this procedure if you have a large sample, substituting the sample standard deviation for σ. A common rule of thumb is to consider samples of size 30 or higher to be large samples. Many analysts choose the t-procedure over the Z-procedure whenever σ is unknown.

· 1-Sample t computes a confidence interval or performs a hypothesis test of the mean when σ is unknown. This procedure is based upon the t-distribution, which is derived from a normal distribution with unknown σ. For small samples, this procedure works best if your data were drawn from a distribution that is normal or close to normal. This procedure is more conservative than the Z-procedure and should always be chosen over the Z-procedure with small sample sizes and an unknown σ. Many analysts choose the t-procedure over the Z-procedure whenever σ is unknown. According to the Central Limit Theorem, you can have increasing confidence in the results of this procedure as the sample size increases, because the distribution of the sample mean becomes more like a normal distribution.

· 2-Sample t computes a confidence interval and performs a hypothesis test of the difference between two population means when the population standard deviations, σ, are unknown and the samples are drawn independently of each other. This procedure is based upon the t-distribution, and for small samples it works best if the data were drawn from distributions that are normal or close to normal. You can have increasing confidence in the results as the sample sizes increase.

· Paired t computes a confidence interval and performs a hypothesis test of the difference between two population means when observations are paired (matched). When data are paired, as with before-and-after measurements, the paired t-procedure usually results in a smaller variance and greater power to detect differences than the 2-sample t-procedure above, which assumes that the samples were drawn independently.
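
The arithmetic behind these procedures can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not Minitab's implementation: the function names and the before-and-after data are hypothetical, σ = 1.2 is an assumed known value, and the 2-sample statistic uses the unpooled (Welch) standard error as an assumption. Only the t statistics are computed, because a p-value would require the t-distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_interval(values, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(len(values))
    center = mean(values)
    return center - half_width, center + half_width

def one_sample_t(values, mu0):
    """t statistic for H0: mean == mu0, using the sample standard deviation."""
    return (mean(values) - mu0) / (stdev(values) / sqrt(len(values)))

def two_sample_t(a, b):
    """t statistic for independent samples (unpooled standard error)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

def paired_t(a, b):
    """Paired t statistic: a one-sample t on the within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs, 0.0)

# Hypothetical before-and-after measurements on the same six units
before = [12.1, 14.3, 11.8, 15.2, 13.4, 12.9]
after = [11.6, 13.9, 11.1, 14.8, 12.7, 12.5]

low, high = z_interval(before, sigma=1.2)     # sigma assumed known
t_independent = two_sample_t(before, after)   # ignores the pairing
t_paired = paired_t(before, after)            # uses the pairing
print(low, high, t_independent, t_paired)
```

With these data the paired statistic is much larger than the independent-samples statistic, because the unit-to-unit variation cancels out of the within-pair differences.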

Confidence intervals and hypothesis tests of proportions

· 1 Proportion computes a confidence interval and performs a hypothesis test of a population proportion.

· 2 Proportions computes a confidence interval and performs a hypothesis test of the difference between two population proportions.
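
For intuition, the sketch below works through a confidence interval for one proportion and a test of the difference between two proportions, both using the normal approximation. This is an assumption made for simplicity: Minitab may use a different (for example, exact) method, and the function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_interval(events, trials, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = events / trials
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width

def two_proportions_z(events1, trials1, events2, trials2):
    """Z statistic and two-sided p-value for H0: p1 == p2 (pooled estimate)."""
    p1 = events1 / trials1
    p2 = events2 / trials2
    pooled = (events1 + events2) / (trials1 + trials2)
    se = sqrt(pooled * (1 - pooled) * (1 / trials1 + 1 / trials2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 45 events in 100 trials versus 30 events in 100 trials
low, high = one_proportion_interval(45, 100)
z_stat, p_value = two_proportions_z(45, 100, 30, 100)
print(low, high, z_stat, p_value)
```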

Confidence intervals and hypothesis tests of Poisson rates

· 1-Sample Poisson Rate computes a confidence interval and performs a hypothesis test on the occurrence rate and mean number of occurrences in a Poisson process.

· 2-Sample Poisson Rate computes a confidence interval and performs a hypothesis test on the difference in occurrence rates and the difference in the mean number of occurrences of two Poisson processes.
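
The relationship between the occurrence rate and the mean number of occurrences can be sketched as follows. The sketch assumes each observation covers the same length of observation (time, area, and so on) and uses a normal approximation for the interval; Minitab may use a different method, and the function name, the `length` parameter, and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def poisson_rate_interval(counts, length=1.0, confidence=0.95):
    """Estimated occurrence rate per unit length with a normal-approximation interval.

    counts: number of occurrences in each observation
    length: length of observation covered by each count (assumed equal for all)
    """
    exposure = len(counts) * length          # total length of observation
    rate = sum(counts) / exposure            # occurrences per unit length
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(rate / exposure)   # Var(rate estimate) = rate / exposure
    return rate, rate - half_width, rate + half_width

# Hypothetical counts, one per unit-length observation
counts = [3, 5, 4, 6, 2, 4, 5, 3]
rate, low, high = poisson_rate_interval(counts)
mean_occurrences = rate * 1.0  # mean number of occurrences = rate x length
print(rate, low, high, mean_occurrences)
```

For small counts the normal approximation can produce a negative lower bound, which is one reason an exact method is often preferred.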

Confidence intervals and hypothesis tests of variance

· 1 Variance computes a confidence interval and performs a hypothesis test for the variance of one population.

· 2 Variances computes a confidence interval and performs a hypothesis test for the equality, or homogeneity, of the variances of two populations.

Measures of association

· Correlation calculates the Pearson product moment correlation coefficient or the Spearman rank-order correlation coefficient for pairs of variables. The Pearson coefficient measures the degree of linear relationship between two variables; the Spearman coefficient measures the degree of monotonic relationship. You can obtain a p-value to test whether there is sufficient evidence that the correlation coefficient is not zero.

By using a combination of Minitab commands, you can also compute a partial correlation coefficient. A partial correlation coefficient is the correlation coefficient between two variables while adjusting for the effects of other variables.

· Covariance calculates the covariance for pairs of variables. The covariance is a measure of the relationship between two variables, but it has not been standardized, as the correlation coefficient is, by dividing by the product of the standard deviations of the two variables.
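
The link between covariance, the Pearson correlation coefficient, and a partial correlation coefficient can be illustrated with a short Python sketch. It is not Minitab's implementation: the function names and data are hypothetical, and the partial correlation shown is the standard first-order formula for adjusting for a single third variable.

```python
from math import sqrt
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def pearson(x, y):
    """Pearson correlation: covariance standardized by both standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))

def partial_correlation(x, y, z):
    """Correlation between x and y after adjusting for one other variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
z = [1, 1, 2, 3, 3]

cov_xy = covariance(x, y)
r_xy = pearson(x, y)
r_xy_given_z = partial_correlation(x, y, z)
print(cov_xy, r_xy, r_xy_given_z)
```

Rescaling either variable changes the covariance but leaves the correlation coefficient unchanged, which is the practical effect of the standardization.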

Tests for normality and outliers

· Normality Test generates a normal probability plot and performs a hypothesis test to examine whether or not the observations follow a normal distribution. Some statistical procedures, such as a Z- or t-test, assume that the samples were drawn from a normal distribution. Use this procedure to test the normality assumption.

· Outlier Test identifies a single outlier in a sample.

Goodness-of-fit test

· Goodness-of-Fit Test for Poisson evaluates whether your data follow a Poisson distribution. Some statistical procedures, such as a U chart, assume that the data follow a Poisson distribution. Use this procedure to test this assumption.
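
One common way to carry out such a test is a chi-square comparison of observed and expected category counts, sketched below. This is an assumption about the general technique, not a description of Minitab's exact computation; the function name, the choice of pooled tail category, and the data are hypothetical. Only the test statistic is computed, because a p-value requires the chi-square distribution (with the number of categories minus 2 degrees of freedom when the mean is estimated from the data), which the standard library does not provide.

```python
from collections import Counter
from math import exp, factorial

def poisson_gof_statistic(counts, max_category):
    """Chi-square goodness-of-fit statistic for a Poisson model.

    Values of max_category or more are pooled into one tail category.
    Returns the estimated Poisson mean and the chi-square statistic.
    """
    n = len(counts)
    lam = sum(counts) / n  # Poisson mean estimated from the data
    observed = Counter(min(c, max_category) for c in counts)
    # Probabilities for 0, 1, ..., max_category - 1, then the pooled tail
    probs = [exp(-lam) * lam ** k / factorial(k) for k in range(max_category)]
    probs.append(1 - sum(probs))
    statistic = sum(
        (observed.get(k, 0) - n * p) ** 2 / (n * p) for k, p in enumerate(probs)
    )
    return lam, statistic

# Hypothetical counts of occurrences per unit
data = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1, 4, 1, 0, 2, 1, 3, 0, 1]
lam, chi_square = poisson_gof_statistic(data, max_category=3)
print(lam, chi_square)
```

A large statistic relative to the chi-square reference distribution suggests that the data do not follow a Poisson distribution.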