IV.            Six Sigma – Analyze (15 Questions)

                              A.            Exploratory data analysis

                                                       1.            Multi-vari studies
Create and interpret multi-vari studies to interpret the difference between positional, cyclical, and temporal variation; apply sampling plans to investigate the largest sources of variation. (Create)

                                                       2.            Simple linear correlation and regression
Interpret the correlation coefficient and determine its statistical significance (p-value); recognize the difference between correlation and causation. Interpret the linear regression equation and determine its statistical significance (p-value). Use regression models for estimation and prediction. (Evaluate)

Exploratory data analysis

Multi-vari analysis – a chart used to analyze variation and to investigate the stability or consistency of a process. It shows where to investigate and where not to, and its principal advantage is that it breaks variation down into components so that improvements can be targeted at the largest one. It normally contains all (or most) of the readings taken.

·          Positional – variation within a piece

·          Cyclical – variation from piece to piece

·          Temporal – variation caused by time related changes
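
The three components above can be compared numerically as well as on a chart. The following is a minimal sketch, not a standard multi-vari procedure: the readings, the time-period and piece labels, and the helper names (spread, multi_vari_summary) are all hypothetical, and ranges are used as a simple stand-in for each source of variation.

```python
from statistics import mean

# Hypothetical readings: data[time_period][piece] = measurements at three positions
data = {
    "08:00": {"piece1": [10.1, 10.2, 10.0], "piece2": [10.2, 10.3, 10.1]},
    "12:00": {"piece1": [10.6, 10.7, 10.5], "piece2": [10.7, 10.8, 10.6]},
    "16:00": {"piece1": [11.1, 11.2, 11.0], "piece2": [11.2, 11.3, 11.1]},
}

def spread(values):
    """Range (max minus min) of a list of numbers."""
    return max(values) - min(values)

def multi_vari_summary(data):
    """Summarize positional, cyclical, and temporal variation using ranges."""
    positional = []    # range of readings within each piece
    cyclical = []      # range of piece averages within each time period
    period_means = []  # one average per time period
    for pieces in data.values():
        piece_means = []
        for readings in pieces.values():
            positional.append(spread(readings))
            piece_means.append(mean(readings))
        cyclical.append(spread(piece_means))
        period_means.append(mean(piece_means))
    return {
        "positional": mean(positional),
        "cyclical": mean(cyclical),
        "temporal": spread(period_means),
    }

summary = multi_vari_summary(data)
largest = max(summary, key=summary.get)  # the component to investigate first
print(summary, largest)
```

In this made-up data set the shift between time periods is much larger than the spread within or between pieces, so the temporal component would be the first place to investigate.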

Simple linear correlation and regression – a method that enables you to determine the relationship between a continuous process output (Y) and one factor (X). The relationship is typically expressed as a mathematical equation of the form Y = B0 + B1X, where B0 is the Y-intercept (the value of Y when X = 0) and B1 is the slope of the line.

·          Best fit line – plot the points on a graph and draw, by eye, the straight line that passes as close as possible to the majority of points (the “best fit”).

·          Least squares – choose as the “best fit” line the line that minimizes the sum of the squares of the vertical deviations of the observed values from the line.

·          Multiple linear regression – although not included in the BOK, it is an extension of simple linear regression to more than one independent variable.
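
As an illustration of the least squares idea, the sketch below computes the intercept and slope of a simple linear regression from the usual closed-form expressions and then uses the fitted line for a prediction. The data values and the function name least_squares_fit are made up for this example.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x that minimizes squared error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical process data: factor X and output Y
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = least_squares_fit(xs, ys)
prediction = b0 + b1 * 6  # predicted Y at X = 6
print(b0, b1, prediction)
```

Predictions are most trustworthy for X values within or near the range of the data used to fit the line; extrapolating far beyond that range assumes the linear relationship continues to hold.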

Correlation coefficient – quantifies the degree of linear association between two variables. It is typically denoted by r and takes a value between negative 1 and positive 1. A positive value implies that the line slopes upward to the right and a negative value indicates that it slopes downward to the right. When r=0, the points give no evidence of a linear correlation (a nonlinear relationship may still exist); when r=1 or r=-1, all points fall on a straight line; any other value indicates the degree of linear relation. A strong correlation does not by itself show that X causes Y.

·          Coefficient of determination – the square of the linear correlation coefficient (r squared). It gives the proportion of the variability in Y that is explained by the regression model.
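
The correlation coefficient and the coefficient of determination can be computed directly from their definitions. The sketch below uses made-up data and a hypothetical helper name (pearson_r); it assumes neither variable is constant, since the denominator would then be zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_squared = r ** 2  # proportion of variability in Y explained by the line
print(r, r_squared)
```

Here r is positive, so the fitted line slopes upward to the right, and r squared is the fraction of the variation in Y accounted for by the linear relationship with X.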

p-Value – The probability value (p-value) of a statistical hypothesis test is the probability, calculated on the assumption that the null hypothesis H0 is true, of getting a value of the test statistic as extreme as or more extreme than the one observed. It is not the probability that H0 is true, and it is not the probability of wrongly rejecting H0; that error rate is the significance level chosen for the test. Equivalently, the p-value is the smallest significance level at which we would only just reject H0. The p-value is compared with the desired significance level of the test and, if it is smaller, the result is significant. For example, a result that leads to rejecting H0 at the 5% significance level is reported as "p < 0.05". The smaller the p-value, the stronger the evidence against H0. Reporting it therefore indicates the strength of the evidence, rather than simply concluding "Reject H0" or "Do not reject H0".
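
To make the definition concrete, the sketch below computes a p-value for a correlation coefficient with an exact permutation test. This is a swapped-in technique chosen because it needs only the standard library; the usual textbook approach is a t-test on r with n - 2 degrees of freedom. The null hypothesis is that X and Y are unrelated, so every re-pairing of the Y values with the X values is equally likely. The data and function names are hypothetical, and the method is only practical for small samples because it enumerates every permutation.

```python
from itertools import permutations
from math import sqrt

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(xs, ys):
    """Two-sided p-value: share of re-pairings with |r| at least as large as observed."""
    r_observed = abs(pearson_r(xs, ys))
    as_extreme = 0
    total = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance guards against floating-point round-off
        if abs(pearson_r(xs, shuffled)) >= r_observed - 1e-12:
            as_extreme += 1
    return as_extreme / total

# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

p_value = permutation_p_value(xs, ys)
alpha = 0.05  # significance level chosen before looking at the data
significant = p_value < alpha
print(p_value, significant)
```

The result is read exactly as described above: if p_value is smaller than the chosen significance level, the correlation is declared statistically significant; otherwise the data do not provide enough evidence to reject H0.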