B.
Hypothesis testing
1.
Basics
Define and distinguish between statistical and practical significance and apply
tests for significance level, power, type I and type II errors. Determine
appropriate sample size for various tests. (Apply)
2.
Tests for means, variances, and proportions
Define, compare, and contrast statistical and practical significance. (Apply)
3.
Paired-comparison tests
Define and describe paired-comparison parametric hypothesis tests. (Understand)
4.
Single-factor analysis of variance (ANOVA)
Define terms related to one-way ANOVAs and interpret their results and data
plots. (Apply)
5.
Chi square
Define and interpret chi square and use it to determine statistical
significance. (Analyze)
Hypothesis testing - refers to the process of using
statistical analysis to determine whether the observed differences between two or
more samples are due to random chance (as stated in the null hypothesis) or to
true differences in the populations sampled (as stated in the alternate
hypothesis). A null hypothesis (Ho) is a stated assumption that there is no
difference in parameters (mean, variance, DPMO) for two or more populations. The
alternate hypothesis (Ha) is a statement that the observed difference or
relationship between two populations is real and not the result of chance or an
error in sampling. Hypothesis testing uses a variety of statistical tools to
analyze data and, ultimately, to reject or fail to reject the null hypothesis.
From a practical point of view, finding statistical evidence against the null
hypothesis allows you to reject it and accept the alternate hypothesis.
·
Null hypothesis – can only be rejected
or fail to be rejected. It cannot be accepted merely because there is not
enough evidence to reject it.
o
Type I error – this error occurs when the
null hypothesis is rejected when it is in fact true. The probability of making
a type I error is called alpha and is commonly referred to as the producer’s
risk. Example: incoming products are good but called bad; a process change is
thought to be different when, in fact, there is no difference.
·
Level of significance (alpha
risk) is the probability of rejecting a null hypothesis when it is true.
o
Type II error – this error occurs when
the null hypothesis is not rejected when it should be rejected. The probability
of making a type II error is called beta and is commonly referred to as the
consumer’s risk. Example: incoming products are bad, but called good; an
adverse process change has occurred but is thought to be no different.
o
The assumption is that a small value for alpha is
desirable, but, for a fixed sample size, alpha and beta are inversely related.
Therefore, the smaller the alpha risk, the larger the beta risk. As alpha
increases, beta decreases.
o
A type I (alpha) risk is the risk of rejecting a
true null hypothesis. Moving from an alpha of .01 to .05 (99% assurance to 95%
assurance) means one is more willing to accept a type I error.
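The alpha/beta trade-off described above can be illustrated numerically. The following is a minimal sketch, assuming a one-tail (upper) Z test with a known population standard deviation; the function name `beta_risk` and the values for `shift`, `sigma`, and `n` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def beta_risk(alpha, shift, sigma, n):
    """Type II risk for a one-tail (upper) Z test with known sigma.

    shift is the true increase in the mean that we want to detect.
    """
    std_normal = NormalDist()
    # Critical Z value that leaves alpha in the upper tail.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Probability of failing to reject Ho when the mean has really shifted.
    return std_normal.cdf(z_crit - shift * sqrt(n) / sigma)


# Fixed sample size; only alpha changes (hypothetical values).
for alpha in (0.01, 0.05, 0.10):
    beta = beta_risk(alpha, shift=1.0, sigma=2.0, n=16)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With the sample size held fixed, the printed beta falls as alpha rises, which is the inverse relationship stated above.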
·
One-tail test – if a null hypothesis is
established to test whether a sample value is smaller than a population value
(or, alternatively, larger than it), then the entire alpha risk is placed on
one end of the distribution curve.
·
Two-tail test – if a null hypothesis is
established to test whether a population shift has occurred in either direction,
then the allowable alpha error is generally divided into two equal parts, one
in each tail of the distribution.
·
Significance level –
significance level and confidence level are complementary, i.e., they sum
to 100% (for example, a 5% significance level corresponds to a 95% confidence
level).
·
Critical probability value – alpha;
hypothesis testing compares the p-value to a predetermined alpha (equivalently,
the test statistic to the critical value that alpha defines).
·
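p-value – the probability, assuming the null hypothesis is true, of obtaining
a result at least as extreme as the one observed. A small p-value (less than
alpha) is evidence against the null hypothesis.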
·
Z-value – a two-tailed test requires
that the alpha risk be divided in two, half on each side of the distribution.
Given alpha = 10% (level of confidence = 90%), alpha/2 = 5% and the Z table
value is 1.645.
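The Z table lookup above can be reproduced with the Python standard library. This is a small sketch; the helper name `critical_z` is hypothetical, and `NormalDist().inv_cdf` stands in for the printed Z table.

```python
from statistics import NormalDist


def critical_z(alpha, two_tail=True):
    """Critical Z value for a given alpha risk."""
    # A two-tail test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tail else alpha
    return NormalDist().inv_cdf(1 - tail_area)


print(round(critical_z(0.10, two_tail=True), 3))   # 1.645
print(round(critical_z(0.10, two_tail=False), 3))  # 1.282
```

For the same alpha, the one-tail critical value is smaller because the entire alpha risk sits in a single tail.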
Statistical vs. practical significance – a
difference may be found to be statistically significant and yet be too small to
be worth the effort or expense of acting on it.
·
Power of test – the probability of rejecting the null hypothesis when it is
false; power = 1 – beta.
·
Required sample size – for
statistical inference, it is best to determine the alpha and beta errors
desired and then calculate the sample size necessary to obtain the desired
decision confidence. If sample size (n) is a fraction, round up to next
integer. The sample size needed for hypothesis testing depends on:
o
The desired type I and type II risk.
o
The minimum difference to be detected between the population
means.
o
The variation in the characteristic being measured.
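The three inputs listed above can be combined into a sample-size estimate. The sketch below assumes a two-tail Z test on a single mean with a known standard deviation, for which a commonly used formula is n = ((Z_alpha/2 + Z_beta) x sigma / delta)^2. The function and parameter names are hypothetical, and other tests need different formulas.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(alpha, beta, sigma, delta):
    """Sample size for a two-tail Z test on one mean (sigma assumed known).

    alpha, beta: desired type I and type II risks
    sigma:       standard deviation of the characteristic being measured
    delta:       minimum difference in means that must be detected
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(1 - beta)
    n = ((z_alpha + z_beta) * sigma / delta) ** 2
    # A fractional sample size is rounded up to the next integer.
    return ceil(n)


# Hypothetical inputs: alpha = 5%, beta = 10%, sigma = 2, delta = 1.
print(required_sample_size(alpha=0.05, beta=0.10, sigma=2.0, delta=1.0))  # 43
```

Halving `delta` roughly quadruples the required sample size, since n grows with the square of sigma/delta.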
·
Point estimation
·
Interval estimation (confidence interval)
o
Confidence intervals for the mean – a 90%
confidence interval means that if the sampling were repeated many times, about
90% of the intervals constructed this way would contain the population mean.
·
To calculate the interval, you must know the sample mean, standard
deviation, sample size, and…
·
Large sample size – use the normal distribution (Z value)
·
Small sample size (<30) with an unknown population standard deviation – the
t distribution must be used.
o
Confidence intervals for variation –
interval is non-symmetrical; requires the use of two different chi-square
values
o
Confidence intervals for proportion
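As an illustration of the large-sample case, the sketch below builds confidence intervals for a mean and for a proportion using the normal distribution (Z value). The function names and the sample figures are hypothetical. For a small sample, the Z value would be replaced by a t-table value with n – 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Two-sided Z value for a confidence level such as 0.90."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_interval(sample_mean, sample_sd, n, confidence=0.90):
    """Large-sample confidence interval for the population mean."""
    half_width = z_value(confidence) * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def proportion_interval(successes, n, confidence=0.90):
    """Large-sample (normal approximation) interval for a proportion."""
    p_hat = successes / n
    half_width = z_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: mean 50, standard deviation 4, n = 64.
print(mean_interval(50.0, 4.0, 64))
# Hypothetical data: 40 defectives found in 200 units.
print(proportion_interval(40, 200))
```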
·
Hypothesis tests for means
o
Z test – single sample mean; standard
deviation of population is known.
o
Student’s t test – single sample mean.
Standard deviation of population is unknown and sample size is small.
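The single-sample Z test can be sketched as follows, assuming a two-tail test and a known population standard deviation. The function name and the example numbers are hypothetical. The t statistic has the same form, with the sample standard deviation in place of sigma, and is compared against the t distribution with n – 1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tail Z test of Ho: population mean equals mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-tail p-value: area beyond |z| in both tails.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_null = p_value < alpha
    return z, p_value, reject_null


# Hypothetical example: target mean 10, sigma 1, sample of 25 averaging 10.4.
z, p_value, reject_null = one_sample_z_test(10.4, 10.0, 1.0, 25)
print(f"z={z:.2f}  p={p_value:.4f}  reject Ho: {reject_null}")
```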
Paired-comparison tests
·
Paired t test - The two-sample t-test is used to determine
whether two population means are equal. The data may be either paired or unpaired.
For the paired t test, the data are dependent, i.e., there is a one-to-one
correspondence between the values in the two samples, for example, the same
subject measured before and after a process change, or the same subject measured
at different times. The test is run on the pairwise differences, with DF = n – 1,
where n is the number of pairs. For the unpaired t test, the sample sizes for
the two samples may or may not be equal.
·
2 Mean Equal Variance t test – two sample means;
variances are unknown but considered equal. DF = n1 + n2 – 2.
·
2 Mean Unequal Variance t test – two sample means;
variances are unknown but considered unequal. The DF calculation is complex
(the Welch-Satterthwaite approximation).
·
F test - test of whether two samples
drawn from different populations have the same standard deviation, at a
specified confidence level. Samples may be of different sizes. The F statistic
is the ratio of two sample variances (each related to a chi-square
distribution), and the F distribution resembles the chi-square distribution in
shape. DF = n1 – 1 for the numerator and n2 – 1 for the denominator.
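The paired t statistic and the F ratio described in this section can be computed as in the sketch below. The function names and the before/after readings are hypothetical. The code returns only the test statistics and degrees of freedom, which would then be compared against t-table and F-table critical values.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t_statistic(before, after):
    """t statistic and DF for a paired t test on matched observations."""
    if len(before) != len(after):
        raise ValueError("paired data need one-to-one correspondence")
    # The test works on the difference within each pair.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def f_ratio(sample1, sample2):
    """Ratio of two sample variances with numerator and denominator DF."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1


# Hypothetical readings on the same five units before and after a change.
before = [12.1, 11.8, 12.5, 12.0, 11.9]
after = [11.6, 11.5, 12.1, 11.8, 11.4]
print(paired_t_statistic(before, after))
print(f_ratio(before, after))
```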
Analysis of variance (ANOVA) - Analysis of variance is a statistical
technique for analyzing data that tests for a difference between two or more
means by comparing the variance within groups with the variance between groups.
·
ANOVA is a test for equality of means, and the test used for
significance is the F test. In an ANOVA table, the between-group
variation is compared to the within-group variation.
·
Total SS = total sum of squares; SST = sum of squares among treatments;
SSE = sum of squares within treatments. Total SS = SST + SSE.
·
One-way ANOVA – one factor is
being tested at more than one level. One-way means one factor. The total
variation in the data has two parts: the variation among treatment means and
the variation within treatments. Total DF (total degrees of freedom) = N – 1
(N = total number of observations); TDF (treatment DF) = t – 1 (t = number of
treatments); error DF = N – t.
o
To test the null
hypothesis – F = MST/MSE
·
MST – mean variation between treatments
(mean square for treatments) = SST/(t – 1)
·
MSE – mean variation within treatments
= SSE/(N – t); MSE means the mean square of the error (or residual)
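The one-way ANOVA quantities above can be worked through on a small data set. This is a minimal sketch: the function name `one_way_anova` and the three treatment groups are hypothetical. It returns only the F statistic and its degrees of freedom, to be compared against an F-table critical value.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, mean squares, and F for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)   # N, total number of observations
    t = len(groups)             # t, number of treatments
    grand_mean = mean(all_values)

    # SST: variation among treatment means.
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SSE: variation within treatments.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    mst = sst / (t - 1)
    mse = sse / (n_total - t)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "DF": (t - 1, n_total - t)}


# Hypothetical data: one factor tested at three levels.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result["F"], result["DF"])
```

For this data set SST is 26 and SSE is 6, so MST = 13, MSE = 1, and F is 13 with (2, 6) degrees of freedom.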
Chi-Square - The Chi Square Test is a statistical
test that covers three different types of analysis: 1) Goodness of fit, 2)
Test for Homogeneity, 3) Test of Independence. The Test for Goodness of fit
determines whether the sample under analysis was drawn from a population that
follows some specified distribution. The Test for Homogeneity tests the
proposition that several populations are homogeneous with respect to some
characteristic. The Test for Independence (one of the most frequent uses of
Chi Square) tests the null hypothesis that two criteria of classification, when
applied to a population of subjects, are independent. If they are not
independent, then there is an association between them.
·
Comparing variances when the variance of the
population is known
·
Comparing expected and observed frequencies of test
outcomes
·
Degrees of Freedom (DF) for a contingency table = (rows – 1)(columns – 1)
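The comparison of observed and expected frequencies in a test of independence can be sketched as below. The function name and the 2 x 2 table of counts are hypothetical. Expected frequencies are computed from the row and column totals under the null hypothesis of independence.

```python
def chi_square_independence(observed):
    """Chi-square statistic and DF for a rows x columns contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two classifications are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_sq += (obs - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_sq, df


# Hypothetical counts: two machines (rows) by pass/fail (columns).
chi_sq, df = chi_square_independence([[30, 20], [20, 30]])
print(chi_sq, df)  # 4.0 1
```

With 1 degree of freedom and alpha = 0.05, the chi-square table value is 3.841, so a statistic of 4.0 would lead to rejecting the null hypothesis of independence for this hypothetical table.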