B. Hypothesis testing

   1. Basics
Define and distinguish between statistical and practical significance and apply tests for significance level, power, type I and type II errors. Determine appropriate sample size for various tests. (Apply)

   2. Tests for means, variances, and proportions
Define, compare, and contrast statistical and practical significance. (Apply)

   3. Paired-comparison tests
Define and describe paired-comparison parametric hypothesis tests. (Understand)

   4. Single-factor analysis of variance (ANOVA)
Define terms related to one-way ANOVAs and interpret their results and data plots. (Apply)

   5. Chi square
Define and interpret chi square and use it to determine statistical significance. (Analyze)

Hypothesis testing refers to the process of using statistical analysis to determine whether the observed differences between two or more samples are due to random chance (as stated in the null hypothesis) or to true differences in the samples (as stated in the alternate hypothesis). A null hypothesis (Ho) is a stated assumption that there is no difference in parameters (mean, variance, DPMO) for two or more populations. The alternate hypothesis (Ha) is a statement that the observed difference or relationship between two populations is real and not the result of chance or an error in sampling. Hypothesis testing uses a variety of statistical tools to analyze data and, ultimately, to either reject or fail to reject the null hypothesis. From a practical point of view, finding statistical evidence that the null hypothesis is false allows you to reject the null hypothesis and accept the alternate hypothesis.

·          Null hypothesis – can only be rejected or not rejected ("fail to reject"). It is never accepted, because a lack of evidence against it does not prove it true.

o         Type I error – this error occurs when the null hypothesis is rejected when it is in fact true. The probability of making a type I error is called alpha and is commonly referred to as the producer’s risk. Example: incoming products are good but called bad; a process change is thought to be different when, in fact, there is no difference.

·          Level of significance (alpha risk) is the probability of rejecting a null hypothesis when it is true.

o         Type II error – this error occurs when the null hypothesis is not rejected when it should be rejected. The probability of making a type II error is called beta and is commonly referred to as the consumer’s risk. Example: incoming products are bad, but called good; an adverse process change has occurred but is thought to be no different.

o         A small value for alpha is usually desirable, but for a fixed sample size alpha and beta are inversely related: the smaller the alpha risk, the larger the beta risk, and as alpha increases, beta decreases.

o         A type I (alpha) risk is the risk of rejecting a true null hypothesis. When moving from an alpha of .01 to .05 (99% assurance to 95% assurance), one is more willing to accept a type I error.

·          One-tail test – if a null hypothesis is established to test whether a sample value is smaller than (or, in the other direction, larger than) a population value, the entire alpha risk is placed in one tail of the distribution curve.

·          Two-tail test – if a null hypothesis is established to test whether a population shift has occurred in either direction, the allowable alpha risk is generally divided into two equal parts, one in each tail.

·          Significance level – significance level and confidence level are complementary, i.e., they sum to 100% (alpha = 5% corresponds to 95% confidence).

·          Critical probability value – alpha; hypothesis testing compares the test statistic with the critical value set by the predetermined alpha (equivalently, compares the p-value with alpha).

·          p-value – the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. A small p-value (below alpha) is evidence that the null hypothesis is false, so it is rejected.

·          Z-value – a two-tailed test requires that the alpha risk be divided in two, half on each side of the distribution. Given alpha = 10% (level of confidence = 90%), alpha/2 = 5% and the Z table value is 1.645
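The Z-value and alpha/beta relationships above can be checked numerically. The following is a minimal Python sketch, assuming a Z test on a single mean with a known population standard deviation; it uses only the standard library's statistics.NormalDist, and the helper names z_critical and beta_risk (and the example shift, sigma, and n) are hypothetical, chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_critical(alpha, two_tailed=True):
    """Critical Z value: alpha split over both tails, or all of it in one tail."""
    tail_area = alpha / 2 if two_tailed else alpha
    return STD_NORMAL.inv_cdf(1 - tail_area)


def beta_risk(alpha, shift, sigma, n):
    """Type II risk of a two-tailed Z test on a mean when the true mean has
    moved by `shift` (same units as sigma): the chance the test statistic
    still falls between -z and +z, so the null hypothesis is not rejected."""
    z = z_critical(alpha)
    d = shift * n ** 0.5 / sigma  # shift measured in standard errors
    return STD_NORMAL.cdf(z - d) - STD_NORMAL.cdf(-z - d)


print(round(z_critical(0.10), 3))                    # 1.645 (alpha = 10%, two-tailed)
print(round(z_critical(0.05), 3))                    # 1.96
print(round(z_critical(0.05, two_tailed=False), 3))  # 1.645 (alpha = 5%, one-tailed)

# Hypothetical case: true mean shifted by 0.5 sigma, n = 25
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(beta_risk(alpha, shift=0.5, sigma=1.0, n=25), 3))
```

With alpha = 10% two-tailed, the critical value is 1.645, matching the Z-value example above. The printed beta values grow as alpha shrinks, which is the inverse alpha/beta relationship described under type II error.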

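Statistical vs. practical significance – a hypothesis test may find a difference statistically significant even though the difference is too small to be worth the effort or expense of acting on. With a large enough sample, even a trivially small difference becomes statistically significant.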
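To illustrate how sample size alone can make a small difference statistically significant, here is a minimal Python sketch using assumed, hypothetical numbers: a process with target 10.00, known sigma 1.0, and an observed mean of 10.01. The helper two_sided_p_value is hypothetical and implements a plain one-sample Z test.

```python
import math


def two_sided_p_value(sample_mean, target, sigma, n):
    """Two-sided p-value of a one-sample Z test (population sigma known)."""
    z = (sample_mean - target) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)); erfc stays accurate far in the tail
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical process: target 10.00, sigma 1.0, observed mean 10.01
for n in (100, 10_000, 100_000):
    p = two_sided_p_value(10.01, 10.00, 1.0, n)
    print(f"n = {n:>7}: p = {p:.4f}")
```

The same 0.01-unit shift is far from significant at n = 100 (p roughly 0.92) but gives p roughly 0.0016 at n = 100,000. Whether a 0.01-unit shift is worth acting on is a practical question the p-value cannot answer.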

·          Power of test – the probability of correctly rejecting a false null hypothesis; power = 1 − beta.

·          Required sample size – for statistical inference, it is best to determine the alpha and beta errors desired and then calculate the sample size necessary to obtain the desired decision confidence. If the calculated sample size (n) is a fraction, round up to the next integer. The sample size needed for hypothesis testing depends on:

o         The desired type I and type II risks.

o         The minimum difference to be detected between the population means.

o         The variation in the characteristic being measured.

·          Point estimation

·          Interval estimation (confidence interval)

o         Confidence intervals for the mean – a 90% confidence interval means that if the sampling were repeated many times, about 90% of the intervals constructed this way would contain the population mean.

·          To calculate it, you must know the sample mean, standard deviation, sample size, and…

·          Large sample size – use the normal distribution (Z value).

·          Small sample size (<30) – when the population standard deviation is unknown, the t distribution must be used.

o         Confidence intervals for variance – the interval is not symmetrical about the sample variance; it requires two different chi-square values, one for each tail.

o         Confidence intervals for proportion

·          Hypothesis tests for means

o         Z test – single sample mean; standard deviation of population is known.

o         Student’s t test – single sample mean. Standard deviation of population is unknown and sample size is small.
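The sample-size and confidence-interval bullets above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: required_n uses the common approximation n = ((Z(alpha/2) + Z(beta)) × sigma / delta)², rounded up, for a two-tailed test on one mean with known sigma, and proportion_ci uses the normal approximation. Because the Python standard library has no t or chi-square quantile function, those critical values are passed in from a table (scipy.stats.t.ppf and scipy.stats.chi2.ppf are alternatives if SciPy is available). All function names and data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

Z = NormalDist()  # standard normal


def required_n(alpha, beta, sigma, delta):
    """n = ((z_alpha/2 + z_beta) * sigma / delta)^2, rounded up to an integer."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(1 - beta)
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)


def mean_ci(data, confidence=0.95, critical=None):
    """x-bar +/- critical * s / sqrt(n); uses Z unless a t-table value is given."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    if critical is None:
        critical = Z.inv_cdf(1 - (1 - confidence) / 2)
    half_width = critical * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def variance_ci(s2, n, chi2_upper, chi2_lower):
    """((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower), chi-square DF = n - 1.
    The interval is not symmetric about s^2."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


def proportion_ci(defectives, n, confidence=0.95):
    """Normal-approximation interval: p +/- z * sqrt(p(1 - p) / n)."""
    p = defectives / n
    z = Z.inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


print(required_n(alpha=0.05, beta=0.10, sigma=1.0, delta=0.5))  # 43
data = [9.8, 10.2, 10.1, 9.9, 10.0]  # hypothetical small sample, n = 5
print(mean_ci(data, critical=2.776))  # t(0.025, DF = 4) taken from a t table
print(variance_ci(s2=4.0, n=10, chi2_upper=19.023, chi2_lower=2.700))  # DF = 9
print(proportion_ci(defectives=20, n=200))
```

The variance interval for s² = 4.0 runs from about 1.89 to about 13.33, extending much farther above the sample variance than below it, which is the non-symmetry noted for confidence intervals for variance.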

Paired-comparison tests

·          Paired t test – the two-sample t test is used to determine whether two population means are equal. The data may be paired or unpaired. For the paired t test, the data are dependent, i.e., there is a one-to-one correspondence between the values in the two samples, for example, the same subject measured before and after a process change, or the same subject measured at different times. The test is run on the per-pair differences, with DF = n − 1, where n is the number of pairs. For the unpaired t test, the two sample sizes may or may not be equal.

·          2 Mean Equal Variance t test – 2 sample means; variances are unknown but considered equal.

·          2 Mean Unequal Variance t test – 2 sample means; variances are unknown but considered unequal. Its DF comes from the Welch–Satterthwaite approximation, which is lengthy to compute by hand and is usually not a whole number.

·          F test – tests whether two samples drawn from different populations have the same standard deviation (equivalently, the same variance) at a specified confidence level. The samples may be of different sizes. The F statistic is the ratio of two sample variances, so its distribution is built from two chi-square distributions, and like the chi-square distribution it is right-skewed. DF are n1 − 1 for the numerator and n2 − 1 for the denominator.
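The paired, equal-variance, unequal-variance, and F statistics above can be computed with the Python standard library, as in this minimal sketch. The function names and data are hypothetical; each function returns the test statistic and its DF, which would still be compared with a t or F table value for the chosen alpha. The unequal-variance DF uses the Welch–Satterthwaite formula. The F ratio here simply divides the first sample's variance by the second's; a common hand-calculation convention puts the larger variance in the numerator so that only upper-tail F tables are needed.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t: one-sample t on the per-pair differences; DF = n - 1 (n = pairs)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def pooled_t(x, y):
    """2 mean equal variance t: pooled variance, DF = n1 + n2 - 2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(x, y):
    """2 mean unequal variance t with the Welch-Satterthwaite DF (generally not an integer)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x) / n1, statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def f_ratio(x, y):
    """F statistic = s1^2 / s2^2, with DF (n1 - 1, n2 - 1)."""
    return statistics.variance(x) / statistics.variance(y), (len(x) - 1, len(y) - 1)


before = [1, 2, 3, 4]  # hypothetical readings on the same four units
after = [2, 3, 4, 6]
print(paired_t(before, after))  # (5.0, 3)
```

When the two samples have equal sizes and equal sample variances, the Welch DF reduces to n1 + n2 − 2, the same as the pooled test.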

Analysis of variance (ANOVA) – a statistical technique that tests for a difference between two or more means by comparing the variance between groups with the variance within groups.

·          Significance is tested with the F test: in an ANOVA table, the between-group variation is compared with the within-group variation. Although it works with variances, ANOVA is a test for equality of means.

·          Total SS=total sum of squares; SST=sum of squares among treatments; SSE=sum of squares within treatments.

·          One-way ANOVA – one factor (hence "one-way") is tested at more than one level. The total variation in the data has two parts: the variation among treatment means and the variation within treatments. Total DF = N − 1 (N = total number of observations); treatment DF = t − 1 (t = number of treatments); error DF = N − t.

o         To test the null hypothesis – F = MST/MSE

·          MST – mean square between treatments, SST/(t − 1)

·          MSE – mean square within treatments, i.e., the mean square of the error (or residual), SSE/(N − t)
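The one-way ANOVA bookkeeping described above (SST, SSE, their DF, and F = MST/MSE) can be sketched in Python as follows. This is a minimal illustration with hypothetical data; one_way_anova is not a library function, and the resulting F would be compared with an F table value at (t − 1, N − t) DF.

```python
import statistics


def one_way_anova(groups):
    """One-way ANOVA table entries for a list of treatment samples."""
    values = [v for g in groups for v in g]
    N, t = len(values), len(groups)
    grand_mean = statistics.mean(values)
    means = [statistics.mean(g) for g in groups]
    # SST: sum of squares among treatments (each treatment mean vs. grand mean)
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: sum of squares within treatments (each value vs. its own treatment mean)
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    mst, mse = sst / (t - 1), sse / (N - t)
    return {
        "Total SS": sst + sse, "SST": sst, "SSE": sse,
        "Total DF": N - 1, "Treatment DF": t - 1, "Error DF": N - t,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }


table = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical data
print(table["F"])  # 27.0
```

Here Total SS = 60 splits into SST = 54 and SSE = 6, illustrating that the total variation is the sum of the among-treatment and within-treatment parts.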

Chi-Square – the chi-square test is a statistical test that covers three different types of analysis: 1) goodness of fit, 2) test for homogeneity, and 3) test of independence. The test for goodness of fit determines whether the sample under analysis was drawn from a population that follows some specified distribution. The test for homogeneity addresses the proposition that several populations are homogeneous with respect to some characteristic. The test for independence (one of the most frequent uses of chi-square) tests the null hypothesis that two criteria of classification, when applied to a population of subjects, are independent. If they are not independent, there is an association between them.

·          Comparing a sample variance with a known population variance

·          Comparing expected and observed frequencies of test outcomes

·          Degrees of Freedom (DF) for a contingency table (test of independence) = (rows − 1)(columns − 1)
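A test of independence on a contingency table can be sketched in Python as below. Expected counts come from row total × column total / grand total, the statistic sums (observed − expected)² / expected, and DF follows the (rows − 1)(columns − 1) rule above. The function name and the counts are hypothetical, and the statistic would be compared with a chi-square table value (3.841 for alpha = 0.05 and 1 DF).

```python
def chi_square_independence(table):
    """Chi-square statistic and DF for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two classifications are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = shift (day, night), columns = result (pass, fail)
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), df)  # 6.667 1
```

Since 6.667 exceeds 3.841, these hypothetical counts would lead to rejecting the null hypothesis of independence at alpha = 0.05, suggesting an association between shift and result.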