Chapter 3 Experiments with a Single Factor: The Analysis of Variance
3.1 An Example
• Chapter 2: a single-factor experiment with two levels of the factor
• Consider single-factor experiments with a levels of the factor, a > 2
• Example:
– Response: the tensile strength of a new synthetic fiber.
– Factor: the weight percent of cotton
– Five levels: 15%, 20%, 25%, 30%, 35%
– a = 5 and n = 5
• Does changing the cotton weight percent change the mean tensile strength?
• Is there an optimum level for cotton content?
3.2 The Analysis of Variance
• a levels (treatments) of a factor and n replicates for each level.
• $y_{ij}$: the jth observation taken under factor level (treatment) i.
Models for the Data
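• Means model:
  $y_{ij} = \mu_i + \epsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$
– $y_{ij}$ is the ijth observation,
– $\mu_i$ is the mean of the ith factor level
– $\epsilon_{ij}$ is a random error with mean zero
• Let $\mu_i = \mu + \tau_i$, where $\mu$ is the overall mean and $\tau_i$ is the ith treatment effect
• Effects model:
  $y_{ij} = \mu + \tau_i + \epsilon_{ij}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n$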
• Linear statistical model
• One-way or single-factor analysis of variance model
• Completely randomized design: the experiments are performed in random order so that the environment in which the treatments are applied is as uniform as possible.
• For hypothesis testing, the model errors are assumed to be normally and independently distributed random variables with mean zero and variance $\sigma^2$, i.e. $y_{ij} \sim N(\mu + \tau_i, \sigma^2)$
• Fixed effects model: the a levels have been specifically chosen by the experimenter.
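3.3 Analysis of the Fixed Effects Model
• Interested in testing the equality of the a treatment means, where $E(y_{ij}) = \mu_i = \mu + \tau_i$, $i = 1, 2, \ldots, a$
  $H_0: \mu_1 = \cdots = \mu_a$ vs. $H_1: \mu_i \neq \mu_j$ for at least one pair (i, j)
• Constraint: $\mu = \frac{1}{a}\sum_{i=1}^{a} \mu_i$, which implies $\sum_{i=1}^{a} \tau_i = 0$
• Equivalently, $H_0: \tau_1 = \cdots = \tau_a = 0$ vs. $H_1: \tau_i \neq 0$ for at least one i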
• Notations:
  $y_{i.} = \sum_{j=1}^{n} y_{ij}$, $y_{..} = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}$
  $\bar{y}_{i.} = y_{i.}/n$, $\bar{y}_{..} = y_{..}/N$, $N = na$
3.3.1 Decomposition of the Total Sum of Squares
• Decompose the total variability into its component parts.
• The total sum of squares (a measure of overall variability in the data):
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2$
• Degrees of freedom: an – 1 = N – 1
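  $\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} [(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})]^2$
  $= n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$
  $SS_T = SS_{Treatments} + SS_E$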
• $SS_{Treatments}$: sum of squares of the differences between the treatment averages and the grand average (sum of squares due to treatments), with a – 1 degrees of freedom
• $SS_E$: sum of squares of the differences of observations within treatments from the treatment average (sum of squares due to error), with N – a degrees of freedom.
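The decomposition of the total sum of squares relies on a cross-product term vanishing, a step the slide does not show. As a short worked check (standard algebra, written in the dot notation of this chapter), expanding the square gives:

```latex
\begin{align*}
\sum_{i=1}^{a}\sum_{j=1}^{n}\bigl[(\bar{y}_{i.}-\bar{y}_{..})+(y_{ij}-\bar{y}_{i.})\bigr]^2
 &= n\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})^2
  + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.})^2 \\
 &\quad + 2\sum_{i=1}^{a}(\bar{y}_{i.}-\bar{y}_{..})\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}), \\
\sum_{j=1}^{n}(y_{ij}-\bar{y}_{i.}) &= y_{i.}-n\,\bar{y}_{i.} = 0 .
\end{align*}
```

Because the inner sum is zero for every i, the cross-product term drops out and only the treatment and error sums of squares remain.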
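• $SS_T = SS_{Treatments} + SS_E$
• A large value of $SS_{Treatments}$ reflects large differences in treatment means
• A small value of $SS_{Treatments}$ likely indicates no differences in treatment means
• $df_{Total} = df_{Treatments} + df_{Error}$
• $\frac{SS_E}{N-a} = \frac{(n-1)S_1^2 + \cdots + (n-1)S_a^2}{(n-1) + \cdots + (n-1)}$ is a pooled estimate of the common variance $\sigma^2$ within the a treatments
• If there are no differences between the a treatment means, $\frac{SS_{Treatments}}{a-1} = \frac{n\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2}{a-1}$ is also an estimate of $\sigma^2$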
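• Mean squares: $MS_{Treatments} = \frac{SS_{Treatments}}{a-1}$, $MS_E = \frac{SS_E}{N-a}$
  $E(MS_E) = \frac{1}{N-a} E\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2\right] = \sigma^2$
  $E(MS_{Treatments}) = \sigma^2 + \frac{n\sum_{i=1}^{a} \tau_i^2}{a-1}$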
3.3.2 Statistical Analysis
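• Assumption: $\epsilon_{ij}$ are normally and independently distributed with mean zero and variance $\sigma^2$
• $SS_E/\sigma^2$ ~ Chi-square (N – a); under H0, $SS_T/\sigma^2$ ~ Chi-square (N – 1) and $SS_{Treatments}/\sigma^2$ ~ Chi-square (a – 1), and $SS_E/\sigma^2$ and $SS_{Treatments}/\sigma^2$ are independent (Theorem 3.1)
• $H_0: \tau_1 = \cdots = \tau_a = 0$ vs. $H_1: \tau_i \neq 0$ for at least one i
• Test statistic: $F_0 = MS_{Treatments}/MS_E$, which follows $F_{a-1, N-a}$ under H0
• Reject H0 if $F_0 > F_{\alpha, a-1, N-a}$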
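• Rewrite the sums of squares (computing formulas):
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N}$
  $SS_{Treatments} = \frac{1}{n}\sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{N}$
  $SS_E = SS_T - SS_{Treatments}$
• See page 71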
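The computing formulas can be checked numerically. The sketch below is a minimal illustration in plain Python. The observations are assumed to be the tensile strength data of the textbook's Table 3-1, which this transcript does not reproduce, so treat the `data` dictionary as an assumption. With these values the sums of squares agree with the Design-Expert output quoted in this chapter.

```python
# One-way fixed effects ANOVA via the computing formulas.
# ASSUMPTION: tensile strength observations per cotton weight percent,
# taken to be the textbook's Table 3-1 (not shown in this transcript).
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

a = len(data)                                         # number of treatments
N = sum(len(obs) for obs in data.values())            # total number of observations
grand_total = sum(sum(obs) for obs in data.values())  # y..
correction = grand_total ** 2 / N                     # y..^2 / N

ss_total = sum(y ** 2 for obs in data.values() for y in obs) - correction
ss_treatments = sum(sum(obs) ** 2 / len(obs) for obs in data.values()) - correction
ss_error = ss_total - ss_treatments

ms_treatments = ss_treatments / (a - 1)
ms_error = ss_error / (N - a)
f0 = ms_treatments / ms_error

print(f"SS_T = {ss_total:.2f}, SS_Treatments = {ss_treatments:.2f}, SS_E = {ss_error:.2f}")
print(f"MS_Treatments = {ms_treatments:.2f}, MS_E = {ms_error:.2f}, F0 = {f0:.2f}")
```

The treatment sum of squares divides each squared total by its own group size, so the same code also covers unequal sample sizes.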
Response: Strength
ANOVA for Selected Factorial Model
Analysis of variance table [Partial sum of squares]

Source       Sum of Squares   DF   Mean Square   F Value   Prob > F
Model        475.76            4   118.94        14.76     < 0.0001
A            475.76            4   118.94        14.76     < 0.0001
Pure Error   161.20           20     8.06
Cor Total    636.96           24

Std. Dev.    2.84       R-Squared        0.7469
Mean         15.04      Adj R-Squared    0.6963
C.V.         18.88      Pred R-Squared   0.6046
PRESS        251.88     Adeq Precision   9.294
3.3.3 Estimation of the Model Parameters
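• Model: $y_{ij} = \mu + \tau_i + \epsilon_{ij}$
• Estimators: $\hat{\mu} = \bar{y}_{..}$, $\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}$, $\hat{\mu}_i = \bar{y}_{i.}$
• Confidence intervals: since $\bar{y}_{i.} \sim N(\mu_i, \sigma^2/n)$,
  $\bar{y}_{i.} - t_{\alpha/2, N-a}\sqrt{\frac{MS_E}{n}} \leq \mu_i \leq \bar{y}_{i.} + t_{\alpha/2, N-a}\sqrt{\frac{MS_E}{n}}$
  and, for the difference of two treatment means,
  $\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2, N-a}\sqrt{\frac{2MS_E}{n}} \leq \mu_i - \mu_j \leq \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2, N-a}\sqrt{\frac{2MS_E}{n}}$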
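As a hedged numerical sketch of these intervals, the snippet below uses MS_E = 8.06 and n = 5 from the ANOVA output in this chapter. The critical value t_{0.025,20} = 2.086 is a standard t-table value, and the two treatment averages are illustrative assumptions (taken to be the 30% and 35% cotton levels), since the transcript does not list the averages.

```python
import math

ms_error = 8.06   # MS_E from the ANOVA output in this chapter
n = 5             # replicates per treatment

# ASSUMPTIONS: t_{0.025,20} from a t table; averages are illustrative values.
t_crit = 2.086
ybar_i, ybar_j = 21.60, 10.80

half_mean = t_crit * math.sqrt(ms_error / n)       # half-width for one mean
half_diff = t_crit * math.sqrt(2 * ms_error / n)   # half-width for a difference

ci_mean = (ybar_i - half_mean, ybar_i + half_mean)
diff = ybar_i - ybar_j
ci_diff = (diff - half_diff, diff + half_diff)

print(f"95% C.I. on mu_i: ({ci_mean[0]:.2f}, {ci_mean[1]:.2f})")
print(f"95% C.I. on mu_i - mu_j: ({ci_diff[0]:.2f}, {ci_diff[1]:.2f})")
```

The interval on a difference is wider by a factor of sqrt(2) because the variances of the two averages add.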
• Example 3.3 (page 75)
• Simultaneous confidence intervals (Bonferroni method): to construct a set of r simultaneous confidence intervals on treatment means with overall confidence level at least 100(1 – α)%, use a 100(1 – α/r)% C.I. for each.
3.3.4 Unbalanced Data
• Let $n_i$ observations be taken under treatment i, i = 1, 2, …, a, with $N = \sum_{i=1}^{a} n_i$:
  $SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{..}^2}{N}$
  $SS_{Treatments} = \sum_{i=1}^{a} \frac{y_{i.}^2}{n_i} - \frac{y_{..}^2}{N}$
• Two advantages of a balanced design:
1. The test statistic is relatively insensitive to small departures from the assumption of equal variance for the a treatments if the sample sizes are equal.
2. The power of the test is maximized if the samples are of equal size.
3.4 Model Adequacy Checking
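• Assumptions: $y_{ij} \sim N(\mu + \tau_i, \sigma^2)$
• The examination of residuals
• Definition of residual:
  $e_{ij} = y_{ij} - \hat{y}_{ij}$,
  $\hat{y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{y}_{..} + (\bar{y}_{i.} - \bar{y}_{..}) = \bar{y}_{i.}$
• The residuals should be structureless.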
3.4.1 The Normality Assumption
• Plot a histogram of the residuals
• Plot a normal probability plot of the residuals
• See Table 3-6
[Figure: normal probability plot of the residuals; x-axis: Residual, y-axis: Normal % probability]
• The error distribution may be
– Slightly skewed (the right tail is longer than the left tail)
– Light-tailed (the left tail of the error distribution is thinner than the tail of the standard normal)
• Outliers
• Possible causes of outliers: mistakes in calculations, data coding, or copying.
• Sometimes outliers are more informative than the rest of the data.
• Detect outliers: examine the standardized residuals, $d_{ij} = \frac{e_{ij}}{\sqrt{MS_E}}$
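The residuals and standardized residuals can be computed directly from the definitions. The following is a minimal sketch, again assuming the tensile strength observations are those of the textbook's Table 3-1 (not reproduced in this transcript). The resulting residual range of -3.8 to 5.2 matches the axis limits of the residual plots in this chapter.

```python
import math

# ASSUMPTION: observations taken to be the textbook's Table 3-1 data.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

a = len(data)
N = sum(len(obs) for obs in data.values())

# e_ij = y_ij - ybar_i. (the fitted value is the treatment average)
residuals = {
    level: [y - sum(obs) / len(obs) for y in obs]
    for level, obs in data.items()
}

ss_error = sum(e ** 2 for res in residuals.values() for e in res)
ms_error = ss_error / (N - a)

# d_ij = e_ij / sqrt(MS_E)
standardized = {
    level: [e / math.sqrt(ms_error) for e in res]
    for level, res in residuals.items()
}

all_residuals = [e for res in residuals.values() for e in res]
largest = max(abs(d) for ds in standardized.values() for d in ds)
print(f"residual range: {min(all_residuals):.2f} to {max(all_residuals):.2f}")
print(f"largest |d_ij| = {largest:.2f}")
```

If the errors are normal, roughly 95% of the standardized residuals should fall within ±2 and virtually all within ±3, so a value beyond 3 in absolute value is a candidate outlier.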
3.4.2 Plot of Residuals in Time Sequence
• Plotting the residuals in time order of data
collection is helpful in detecting correlation
between the residuals.
• Independence assumption
[Figure: Design-Expert plot of residuals vs. run number for the strength response; x-axis: Run Number, y-axis: Residuals]
3.4.3 Plot of Residuals Versus Fitted Values
• Plot the residuals versus the fitted values
• The plot should be structureless
[Figure: Design-Expert plot of residuals vs. predicted values for the strength response; x-axis: Predicted, y-axis: Residuals]
• Nonconstant variance: the variance of the observations increases as the magnitude of the observation increases, i.e. $\sigma^2$ grows with $y_{ij}$
• If the factor levels having the larger variance also have small sample sizes, the actual type I error rate is larger than anticipated.
• Variance-stabilizing transformations:

  Distribution   Transformation
  Poisson        Square root: $y_{ij}^* = \sqrt{y_{ij}}$
  Lognormal      Logarithmic: $y_{ij}^* = \log y_{ij}$
  Binomial       Arcsin: $y_{ij}^* = \arcsin\sqrt{y_{ij}}$
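• Statistical tests for equality of variance:
  $H_0: \sigma_1^2 = \cdots = \sigma_a^2$ vs. $H_1$: the above is not true for at least one $\sigma_i^2$
– Bartlett’s test:
  $\chi_0^2 = 2.3026\,\frac{q}{c}$
  $q = (N-a)\log_{10} S_p^2 - \sum_{i=1}^{a} (n_i - 1)\log_{10} S_i^2$
  $c = 1 + \frac{1}{3(a-1)}\left(\sum_{i=1}^{a} (n_i - 1)^{-1} - (N-a)^{-1}\right)$
  $S_p^2 = \frac{\sum_{i=1}^{a} (n_i - 1) S_i^2}{N-a}$
– Reject the null hypothesis if $\chi_0^2 > \chi^2_{\alpha, a-1}$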
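Bartlett's statistic is straightforward to compute from the sample variances. The sketch below is an illustration under the assumption that the observations are the textbook's Table 3-1 tensile strength data (not reproduced in this transcript); with those values it gives the 0.93 quoted for Example 3.4.

```python
import math

# ASSUMPTION: observations taken to be the textbook's Table 3-1 data.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def sample_variance(obs):
    """Unbiased sample variance S_i^2 of one treatment."""
    mean = sum(obs) / len(obs)
    return sum((y - mean) ** 2 for y in obs) / (len(obs) - 1)


groups = list(data.values())
a = len(groups)
N = sum(len(g) for g in groups)
dfs = [len(g) - 1 for g in groups]                 # n_i - 1
variances = [sample_variance(g) for g in groups]   # S_i^2

pooled = sum(df * s2 for df, s2 in zip(dfs, variances)) / (N - a)   # S_p^2
q = (N - a) * math.log10(pooled) - sum(
    df * math.log10(s2) for df, s2 in zip(dfs, variances)
)
c = 1 + (sum(1 / df for df in dfs) - 1 / (N - a)) / (3 * (a - 1))
chi2_0 = 2.3026 * q / c

print(f"S_p^2 = {pooled:.2f}, q = {q:.3f}, c = {c:.2f}, chi2_0 = {chi2_0:.2f}")
```

The constant 2.3026 is ln(10); it converts the base-10 logarithms in q to natural logarithms.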
• Example 3.4: the test statistic is $\chi_0^2 = 0.93$ and $\chi^2_{0.05,4} = 9.49$, so the hypothesis of equal variances is not rejected.
• Bartlett’s test is sensitive to the normality assumption
• The modified Levene test:
– Use the absolute deviation of each observation from its treatment median:
  $d_{ij} = |y_{ij} - \tilde{y}_i|$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, n_i$
– If the mean deviations are equal, the variances of the observations in all treatments will be the same.
– The test statistic for Levene’s test is the usual ANOVA F statistic for testing equality of means, applied to the $d_{ij}$.
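The modified Levene test is an ordinary one-way ANOVA on the absolute deviations from the treatment medians. Because the data for Example 3.5 are not in this transcript, the sketch below applies the procedure to the tensile strength data instead, assumed to be the textbook's Table 3-1 values. It illustrates the mechanics only and is not a reproduction of any textbook result.

```python
from statistics import median

# ASSUMPTION: observations taken to be the textbook's Table 3-1 data.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}


def anova_f(groups):
    """One-way ANOVA F statistic, valid for equal or unequal group sizes."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    correction = sum(sum(g) for g in groups) ** 2 / N
    ss_total = sum(x ** 2 for g in groups for x in g) - correction
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treat
    return (ss_treat / (a - 1)) / (ss_error / (N - a))


# d_ij = |y_ij - median of treatment i|
deviations = [[abs(y - median(obs)) for y in obs] for obs in data.values()]
f0_levene = anova_f(deviations)
print(f"Modified Levene F0 = {f0_levene:.3f}")
```

A small F0 here means the average absolute deviations are similar across treatments, which is consistent with Bartlett's test not rejecting equal variances for these data.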
• Example 3.5:
– Four methods of estimating flood flow frequency (see Table 3.7)
– ANOVA table (Table 3.8)
– The plot of residuals vs. fitted values (Figure 3.7)
– Modified Levene’s test: F0 = 4.55 with P-value = 0.0137. Reject the null hypothesis of equal variances.
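• Let $E(y) = \mu$ and $\sigma_y \propto \mu^{\alpha}$
• Find $y^* = y^{\lambda}$ that yields a constant variance.
• $\sigma_{y^*} \propto \mu^{\lambda + \alpha - 1}$, so choose $\lambda = 1 - \alpha$
• Variance-Stabilizing Transformations

  Relationship between σ_y and μ   α     λ = 1 – α   Transformation
  σ_y ∝ constant                   0     1           No transformation
  σ_y ∝ μ^(1/2)                    1/2   1/2         Square root
  σ_y ∝ μ                          1     0           Log
  σ_y ∝ μ^(3/2)                    3/2   –1/2        Reciprocal square root
  σ_y ∝ μ^2                        2     –1          Reciprocal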
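• How to find α: writing $\sigma_{y_i} = \theta\mu_i^{\alpha}$ gives
  $\log \sigma_{y_i} = \log \theta + \alpha \log \mu_i$
• Use $S_i$ to estimate $\sigma_{y_i}$ and $\bar{y}_{i.}$ to estimate $\mu_i$; the slope of $\log S_i$ versus $\log \bar{y}_{i.}$ estimates α
• See Figure 3.8, Table 3.10 and Figure 3.9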
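The slope estimate can be obtained with a simple least squares fit of log S_i on log of the treatment average. The sketch below uses hypothetical averages and standard deviations, constructed so that the standard deviation is exactly proportional to the square root of the mean. They are not the Example 3.5 values, which this transcript does not include.

```python
import math

# HYPOTHETICAL summary statistics: s_i = 0.5 * sqrt(ybar_i), so alpha should be 0.5.
ybar = [4.0, 9.0, 16.0, 25.0]   # treatment averages
s = [1.0, 1.5, 2.0, 2.5]        # treatment standard deviations

x = [math.log(m) for m in ybar]
y = [math.log(sd) for sd in s]
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Least squares slope of log(s_i) on log(ybar_i) estimates alpha.
alpha = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
lam = 1 - alpha   # power of the variance-stabilizing transformation y* = y**lam

print(f"alpha = {alpha:.3f}, lambda = {lam:.3f}")
```

With α = 0.5 the suggested power is λ = 0.5, the square root transformation. With real data the slope is only approximate, so it is usual to round to the nearest transformation in the table.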
3.5 Practical Interpretation of Results
• Conduct the experiment => perform the statistical analysis => investigate the underlying assumptions => draw practical conclusions
3.5.1 A Regression Model
• Qualitative factor: compare the differences between the levels of the factor.
• Quantitative factor: develop an interpolation equation for the response variable.
• Regression analysis
• See Figure 3.1
• Final equation in terms of actual factors (Design-Expert output):
  Strength = +62.61143 – 9.01143 * Cotton Weight % + 0.48143 * Cotton Weight %^2 – 7.60000E-003 * Cotton Weight %^3
• This is an empirical model of the experimental results
[Figure: Design-Expert one-factor plot of Strength vs. A: Cotton Weight %, showing the design points and the fitted cubic]
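The fitted cubic can be used as an interpolation equation inside the experimental range. The sketch below evaluates the Design-Expert equation quoted in this section at the five design levels and scans a grid for the cotton percentage with the largest predicted strength. The grid search and the function name `predicted_strength` are illustrative choices, not part of the original analysis.

```python
def predicted_strength(cotton_pct):
    """Empirical cubic model from the Design-Expert output (valid for 15-35%)."""
    x = cotton_pct
    return 62.61143 - 9.01143 * x + 0.48143 * x ** 2 - 7.6e-3 * x ** 3


levels = [15, 20, 25, 30, 35]
fitted = {x: predicted_strength(x) for x in levels}

# Scan 15.0, 15.1, ..., 35.0 for the maximum predicted strength.
grid = [15 + 0.1 * k for k in range(201)]
best = max(grid, key=predicted_strength)

for x, y in fitted.items():
    print(f"{x}% cotton -> predicted strength {y:.2f}")
print(f"largest predicted strength near {best:.1f}% cotton")
```

The model is empirical, so predictions outside the 15% to 35% range studied should not be trusted.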
3.5.2 Comparisons Among Treatment Means
• If the hypothesis of equal treatment means is rejected, we still do not know which specific means are different
• Determining which specific means differ following an ANOVA is called the multiple comparisons problem
3.5.3 Graphical Comparisons of Means
3.5.4 Contrast
• A contrast: a linear combination of the parameters of the form
  $\Gamma = \sum_{i=1}^{a} c_i \mu_i$, with $\sum_{i=1}^{a} c_i = 0$
• $H_0: \Gamma = 0$ vs. $H_1: \Gamma \neq 0$
• Two methods for testing this hypothesis.
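• The first method:
  Let $C = \sum_{i=1}^{a} c_i y_{i.}$ (in terms of treatment totals). Then $Var(C) = n\sigma^2 \sum_{i=1}^{a} c_i^2$
  Under $H_0$, $\frac{\sum_{i=1}^{a} c_i y_{i.}}{\sqrt{n\sigma^2 \sum_{i=1}^{a} c_i^2}} \sim N(0, 1)$
  Hence the statistic $t_0 = \frac{\sum_{i=1}^{a} c_i y_{i.}}{\sqrt{n MS_E \sum_{i=1}^{a} c_i^2}} \sim t_{N-a}$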
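• The second method:
  $F_0 = t_0^2 = \frac{\left(\sum_{i=1}^{a} c_i y_{i.}\right)^2}{n MS_E \sum_{i=1}^{a} c_i^2} \sim F_{1, N-a}$
  Equivalently, $F_0 = \frac{MS_C}{MS_E} = \frac{SS_C / 1}{MS_E}$, where $SS_C = \frac{\left(\sum_{i=1}^{a} c_i y_{i.}\right)^2}{n \sum_{i=1}^{a} c_i^2}$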
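Both forms of the contrast test can be verified with a few lines of arithmetic. In this sketch MS_E = 8.06 and n = 5 come from the ANOVA output in this chapter. The treatment totals are an assumption (computed from the textbook's Table 3-1 data, which the transcript omits), and the contrast comparing the 30% and 35% levels is an illustrative choice.

```python
import math

ms_error = 8.06   # MS_E from the ANOVA output in this chapter
n = 5             # replicates per treatment

# ASSUMPTION: treatment totals y_i. for 15, 20, 25, 30, 35% cotton.
totals = [49, 77, 88, 108, 54]
# Illustrative contrast: mu_4 - mu_5 (coefficients must sum to zero).
c = [0, 0, 0, 1, -1]
assert sum(c) == 0

contrast = sum(ci * yi for ci, yi in zip(c, totals))   # C = sum c_i y_i.
sum_c2 = sum(ci ** 2 for ci in c)

t0 = contrast / math.sqrt(n * ms_error * sum_c2)   # first method
ss_c = contrast ** 2 / (n * sum_c2)                # single-df sum of squares
f0 = ss_c / ms_error                               # second method

print(f"C = {contrast}, t0 = {t0:.3f}, SS_C = {ss_c:.2f}, F0 = {f0:.2f}")
```

The two methods always agree because F0 equals t0 squared; the t form is convenient for one-sided alternatives.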
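• The C.I. for a contrast $\sum_{i=1}^{a} c_i \mu_i$:
  Let $C = \sum_{i=1}^{a} c_i \bar{y}_{i.}$. Then $Var(C) = \frac{\sigma^2}{n}\sum_{i=1}^{a} c_i^2$
  Hence the C.I. is $\sum_{i=1}^{a} c_i \bar{y}_{i.} \pm t_{\alpha/2, N-a}\sqrt{\frac{MS_E}{n}\sum_{i=1}^{a} c_i^2}$
• Unequal sample size:
  1. $\sum_{i=1}^{a} n_i c_i = 0$
  2. $t_0 = \frac{\sum_{i=1}^{a} c_i y_{i.}}{\sqrt{MS_E \sum_{i=1}^{a} n_i c_i^2}}$
  3. $SS_C = \frac{\left(\sum_{i=1}^{a} c_i y_{i.}\right)^2}{\sum_{i=1}^{a} n_i c_i^2}$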
3.5.5 Orthogonal Contrast
• Two contrasts with coefficients {ci} and {di} are orthogonal if $\sum_{i=1}^{a} c_i d_i = 0$
• For a treatments, a set of a – 1 orthogonal contrasts partitions the sum of squares due to treatments into a – 1 independent single-degree-of-freedom components. Thus, tests performed on orthogonal contrasts are independent.
• See Example 3.6 (Page 94)
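The partition property can be checked numerically. The sketch below assumes the treatment totals of the textbook's tensile strength data (not listed in this transcript) and uses one possible set of four mutually orthogonal contrasts, chosen for illustration. The code verifies orthogonality before summing the single-degree-of-freedom sums of squares.

```python
from itertools import combinations

n = 5
# ASSUMPTION: treatment totals y_i. for 15, 20, 25, 30, 35% cotton.
totals = [49, 77, 88, 108, 54]

# Illustrative set of a - 1 = 4 orthogonal contrasts.
contrasts = [
    [0, 0, 0, 1, -1],
    [1, 0, 1, -1, -1],
    [1, 0, -1, 0, 0],
    [-1, 4, -1, -1, -1],
]

# Each contrast sums to zero, and every pair has zero inner product.
assert all(sum(c) == 0 for c in contrasts)
assert all(
    sum(ci * di for ci, di in zip(c, d)) == 0
    for c, d in combinations(contrasts, 2)
)


def contrast_ss(c):
    """SS_C = (sum c_i y_i.)^2 / (n sum c_i^2), in terms of treatment totals."""
    value = sum(ci * yi for ci, yi in zip(c, totals))
    return value ** 2 / (n * sum(ci ** 2 for ci in c))


ss_values = [contrast_ss(c) for c in contrasts]
print("contrast sums of squares:", [round(v, 2) for v in ss_values])
print(f"sum = {sum(ss_values):.2f}")
```

The four components add up to the treatment sum of squares, 475.76, reported in the ANOVA table of this chapter.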
3.5.6 Scheffe’s Method for Comparing All Contrasts
• Scheffe (1953) proposed a method for comparing any and all possible contrasts between treatment means.
  Suppose $\Gamma_u = c_{1u}\mu_1 + \cdots + c_{au}\mu_a$, $u = 1, 2, \ldots, m$
  $C_u = \sum_{i=1}^{a} c_{iu} \bar{y}_{i.}$ and $S_{C_u} = \sqrt{MS_E \sum_{i=1}^{a} (c_{iu}^2 / n_i)}$
  The critical value: $S_{\alpha,u} = S_{C_u}\sqrt{(a-1) F_{\alpha, a-1, N-a}}$
  If $|C_u| > S_{\alpha,u}$, then reject $H_0: \Gamma_u = 0$
• See Page 95 and 96
3.5.7 Comparing Pairs of Treatment Means
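• Compare all pairs of the a treatment means
• Tukey’s Test:
– The studentized range statistic:
  $q = \frac{\bar{y}_{max} - \bar{y}_{min}}{\sqrt{MS_E / n}}$, where $\bar{y}_{max}$ and $\bar{y}_{min}$ are the largest and smallest sample means out of a group of p sample means
– The critical point is $T_{\alpha} = q_{\alpha}(a, f)\sqrt{\frac{MS_E}{n}}$, where f is the number of degrees of freedom of $MS_E$,
  or, for unequal sample sizes, $T_{\alpha} = \frac{q_{\alpha}(a, f)}{\sqrt{2}}\sqrt{MS_E (1/n_i + 1/n_j)}$
– See Example 3.7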
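A minimal sketch of Tukey's procedure for the equal-sample-size case follows. MS_E = 8.06 and n = 5 come from the ANOVA output in this chapter. The studentized range value q_{0.05}(5, 20) = 4.23 is a table value, and the treatment averages are assumed (computed from the textbook's data, which this transcript omits).

```python
import math
from itertools import combinations

ms_error = 8.06   # MS_E from the ANOVA output in this chapter
n = 5             # replicates per treatment

# ASSUMPTIONS: q_{0.05}(5, 20) from a studentized range table;
# treatment averages keyed by cotton weight percent.
q_crit = 4.23
means = {15: 9.8, 20: 15.4, 25: 17.6, 30: 21.6, 35: 10.8}

t_alpha = q_crit * math.sqrt(ms_error / n)   # critical difference T_alpha

# Pairs of treatments whose averages differ by more than T_alpha.
significant = [
    (i, j)
    for i, j in combinations(means, 2)
    if abs(means[i] - means[j]) > t_alpha
]

print(f"T_alpha = {t_alpha:.2f}")
print("significantly different pairs:", significant)
```

Any pair of averages that differs by more than T_alpha is declared significantly different, with the overall error rate held at α across all pairs.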
• Sometimes the overall F test from the ANOVA is significant, but the pairwise comparisons of means fail to reveal any significant differences.
• The F test is simultaneously considering all possible contrasts involving the treatment means, not just pairwise comparisons.
The Fisher Least Significant Difference (LSD) Method
• For $H_0: \mu_i = \mu_j$, the test statistic is
  $t_0 = \frac{\bar{y}_{i.} - \bar{y}_{j.}}{\sqrt{MS_E (1/n_i + 1/n_j)}}$
• The least significant difference (LSD):
  $LSD = t_{\alpha/2, N-a}\sqrt{MS_E \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$
• See Example 3.8
Duncan’s Multiple Range Test
• The a treatment averages are arranged in ascending order, and the standard error of each average is determined as
  $S_{\bar{y}_{i.}} = \sqrt{\frac{MS_E}{n_h}}$, $n_h = \frac{a}{\sum_{i=1}^{a} (1/n_i)}$
• Assuming equal sample sizes, the significant ranges are
  $R_p = r_{\alpha}(p, f)\, S_{\bar{y}_{i.}}$, $p = 2, 3, \ldots, a$
• Total of a(a – 1)/2 pairs
• Example 3.9
The Newman-Keuls Test
• Similar to Duncan’s multiple range test
• The critical values:
  $K_p = q_{\alpha}(p, f)\, S_{\bar{y}_{i.}}$
3.5.8 Comparing Treatment Means with a Control
• Assume one of the treatments is a control, and the analyst is interested in comparing each of the other a – 1 treatment means with the control.
• Test $H_0: \mu_i = \mu_a$ vs. $H_1: \mu_i \neq \mu_a$, i = 1, 2, …, a – 1
• Dunnett (1964)
• Compute $|\bar{y}_{i.} - \bar{y}_{a.}|$, i = 1, 2, …, a – 1
• Reject H0 if
  $|\bar{y}_{i.} - \bar{y}_{a.}| > d_{\alpha}(a-1, f)\sqrt{MS_E \left(\frac{1}{n_i} + \frac{1}{n_a}\right)}$
• Example 3.10
3.7 Determining Sample Size
• Determine the number of replicates to run
3.7.1 Operating Characteristic Curves (OC Curves)
• OC curves: a plot of the type II error probability β of a statistical test,
  $\beta = 1 - P\{\text{Reject } H_0 \mid H_0 \text{ is false}\}$
  $= 1 - P(F_0 > F_{\alpha, a-1, N-a} \mid H_0 \text{ is false})$
• If H0 is false, then $F_0 = MS_{Treatments} / MS_E$ follows a noncentral F distribution with a – 1 and N – a degrees of freedom and noncentrality parameter δ
• Chart V of the Appendix
• Determine $\Phi^2 = \frac{n\sum_{i=1}^{a} \tau_i^2}{a\sigma^2}$
• Let $\mu_i$ be the specified treatment means. Then the treatment effects are $\tau_i = \mu_i - \bar{\mu}$, where $\bar{\mu} = \sum_{i=1}^{a} \mu_i / a$
• For $\sigma^2$: use prior experience, a previous experiment, a preliminary test, or a judgment estimate.
• Example 3.11
• Difficulty: how to select a set of treatment means on which the sample size decision should be based.
• Another approach: select a sample size such that if the difference between any two treatment means exceeds a specified value D, the null hypothesis should be rejected:
  $\Phi^2 = \frac{nD^2}{2a\sigma^2}$
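Both ways of specifying Φ² are easy to tabulate over candidate sample sizes before going to the OC curves. In the sketch below the function names and all numerical inputs (treatment means, σ, and D) are hypothetical illustrations. Reading β from Chart V for each Φ is a manual step that the code does not attempt.

```python
import math


def phi_squared_from_means(means, sigma, n):
    """Phi^2 = n * sum(tau_i^2) / (a * sigma^2), with tau_i = mu_i - mean(mu)."""
    a = len(means)
    mu_bar = sum(means) / a
    return n * sum((m - mu_bar) ** 2 for m in means) / (a * sigma ** 2)


def phi_squared_from_difference(d, sigma, n, a):
    """Minimum Phi^2 when any two treatment means differ by as much as d."""
    return n * d ** 2 / (2 * a * sigma ** 2)


# HYPOTHETICAL planning values.
means = [11, 12, 15, 18, 19]
sigma = 3.0
d = 10.0

for n in (4, 5, 6):
    phi2_means = phi_squared_from_means(means, sigma, n)
    phi2_diff = phi_squared_from_difference(d, sigma, n, a=len(means))
    dof_error = len(means) * (n - 1)
    print(f"n={n}: Phi={math.sqrt(phi2_means):.2f} (means), "
          f"Phi={math.sqrt(phi2_diff):.2f} (difference), error df={dof_error}")
```

For each n, Φ together with a – 1 and a(n – 1) degrees of freedom locates the point on the OC curve, and the smallest n giving an acceptable β is chosen.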
3.7.2 Specifying a Standard Deviation Increase
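• Let P be the percentage increase in the standard deviation of an observation. Then
  $\frac{\sqrt{\sigma^2 + \sum_{i=1}^{a} \tau_i^2 / a}}{\sigma} = 1 + 0.01P$, so
  $\Phi = \frac{\sqrt{\sum_{i=1}^{a} \tau_i^2 / a}}{\sigma / \sqrt{n}} = \sqrt{(1 + 0.01P)^2 - 1}\,\sqrt{n}$
• For example (Page 110): if P = 20, then
  $\Phi = \sqrt{(1.2)^2 - 1}\,\sqrt{n} = 0.66\sqrt{n}$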
3.7.3 Confidence Interval Estimation Method
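• Specify in advance the desired half-width of a confidence interval and choose n to achieve it:
  $\bar{y}_{i.} - \bar{y}_{j.} - t_{\alpha/2, N-a}\sqrt{\frac{2MS_E}{n}} \leq \mu_i - \mu_j \leq \bar{y}_{i.} - \bar{y}_{j.} + t_{\alpha/2, N-a}\sqrt{\frac{2MS_E}{n}}$
• For example: we want the 95% C.I. on the difference in mean tensile strength for any two cotton weight percentages to be ±5 psi, with prior estimate σ = 3. See Page 110.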
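3.9 The Regression Approach to the Analysis of Variance
• Model: $y_{ij} = \mu + \tau_i + \epsilon_{ij}$
• Least squares: minimize
  $L = \sum_{i=1}^{a}\sum_{j=1}^{n} \epsilon_{ij}^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \mu - \tau_i)^2$
  Setting $\frac{\partial L}{\partial \mu} = 0$ and $\frac{\partial L}{\partial \tau_i} = 0$, $i = 1, 2, \ldots, a$, gives
  $-2\sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0$ and $-2\sum_{j=1}^{n} (y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0$, $i = 1, 2, \ldots, a$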
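• The normal equations
  $N\hat{\mu} + n\hat{\tau}_1 + n\hat{\tau}_2 + \cdots + n\hat{\tau}_a = y_{..}$
  $n\hat{\mu} + n\hat{\tau}_1 = y_{1.}$
  $n\hat{\mu} + n\hat{\tau}_2 = y_{2.}$
  …
  $n\hat{\mu} + n\hat{\tau}_a = y_{a.}$
• Apply the constraint $\sum_{i=1}^{a} \hat{\tau}_i = 0$. Then the estimators are
  $\hat{\mu} = \bar{y}_{..}$, $\hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}$
• Regression sum of squares (the reduction due to fitting the full model):
  $R(\mu, \tau) = \hat{\mu} y_{..} + \sum_{i=1}^{a} \hat{\tau}_i y_{i.} = \sum_{i=1}^{a} \frac{y_{i.}^2}{n}$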
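• The error sum of squares:
  $SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)$
• Find the sum of squares resulting from the treatment effects:
  $R(\tau \mid \mu) = R(\mu, \tau) - R(\mu)$ = R(Full Model) – R(Reduced Model)
  $= \sum_{i=1}^{a} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N}$
• The test statistic for $H_0: \tau_1 = \cdots = \tau_a$:
  $F_0 = \frac{R(\tau \mid \mu)/(a-1)}{\left[\sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - R(\mu, \tau)\right]/(N-a)} \sim F_{a-1, N-a}$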
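The regression view leads to the same F statistic as the usual ANOVA computations. The sketch below works through the reductions in sums of squares, assuming the tensile strength observations are those of the textbook's Table 3-1 (not reproduced in this transcript).

```python
# ASSUMPTION: observations taken to be the textbook's Table 3-1 data.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

a = len(data)
n = 5
N = a * n
totals = [sum(obs) for obs in data.values()]   # y_i.
grand_total = sum(totals)                      # y..

# Solution of the normal equations under the constraint sum(tau_hat) = 0.
mu_hat = grand_total / N
tau_hat = [t / n - mu_hat for t in totals]

# Reductions in sum of squares for the full and reduced models.
r_full = mu_hat * grand_total + sum(th * t for th, t in zip(tau_hat, totals))
r_reduced = grand_total ** 2 / N               # R(mu): model with mu only
r_tau_given_mu = r_full - r_reduced            # R(tau | mu)

ss_error = sum(y ** 2 for obs in data.values() for y in obs) - r_full
f0 = (r_tau_given_mu / (a - 1)) / (ss_error / (N - a))

print(f"R(mu, tau) = {r_full:.2f}, R(mu) = {r_reduced:.2f}")
print(f"R(tau | mu) = {r_tau_given_mu:.2f}, SS_E = {ss_error:.2f}, F0 = {f0:.2f}")
```

R(τ | μ) coincides with the treatment sum of squares in the ANOVA table of this chapter, which is why the two approaches give the same test.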