Understanding Inferential Statistics—Estimation
Types of Statistics
Research questions.
The type of data collected.
Audience who will receive the results.
Descriptive & Inferential Statistics
Statistical Methods
Inference Process
Population
Sample
Sample statistic (x̄, Ps)
Estimation & Hypothesis testing
Point Estimation & Population Parameters
Population Parameters
µ = Population mean
σ = Population standard deviation
σ² = Population variance
π = Population proportion
N = The size of the population you can generalize to
Sample Statistics (Point Estimates)
x̄ = Mean point estimate
S = Standard deviation point estimate
S² = Variance point estimate
P = Proportion point estimate
n = The size of a sample taken from a population
When a population parameter is unknown, a sample is drawn and its statistics are used to estimate the parameter.
Characteristic measure | Point estimate (Sample) | Parameter (Population) |
Mean | x̄ | µ |
Standard deviation | S | σ |
Variance | S² | σ² |
Proportion | P | π |
Point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a “best guess” for an unknown population parameter.
Interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
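As a small illustration of point estimation, the sketch below computes the point estimates listed earlier (x̄, S, S², P) with Python's standard library. The sample values and the "at least 6" cutoff used for the proportion are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
sample = [4, 8, 6, 5, 7]

n = len(sample)                       # sample size n
x_bar = statistics.mean(sample)       # point estimate of the population mean
s2 = statistics.variance(sample)      # sample variance S^2 (divides by n - 1)
s = statistics.stdev(sample)          # sample standard deviation S
p = sum(v >= 6 for v in sample) / n   # sample proportion P of values that are at least 6

print(n, x_bar, s2, round(s, 3), p)
```

Each printed value is a single "best guess" for the corresponding population parameter; an interval estimate would add a margin of error around it.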
Interval Estimation of Population Mean
Example 1:
The College Board reports that the scores on the 2010 SAT mathematics test were normally distributed. A sample of 25 scores had a mean of 510. Assume the population standard deviation is 100. Construct a 95% confidence interval for the population mean score on the 2010 SAT math test.
Solution:
n = 25, x̄ = 510, σ = 100
For α = 0.05 (95% CI), we get Zα/2 = Z0.025 = 1.96.
Interpretation: We are 95% confident that the population mean score on the 2010 SAT mathematics test lies between 470.8 and 549.2.
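The Example 1 interval can be checked with a short Python sketch. The function name z_interval is an illustrative choice, not something from the slides; it assumes σ is known and takes the critical value from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a 95% interval
    margin = z * sigma / sqrt(n)              # margin of error
    return x_bar - margin, x_bar + margin

# Example 1: n = 25, sample mean 510, sigma = 100
low, high = z_interval(x_bar=510, sigma=100, n=25)
print(round(low, 1), round(high, 1))
```

Rounded to one decimal place, the limits agree with the 470.8 and 549.2 given in the interpretation.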
Example 2:
Estimate, with a 95% confidence interval, the mean cholesterol level for freshman nursing students using a sample of 30 students who have an average cholesterol of 180 mg/dl and a standard deviation of 34 mg/dl.
Solution:
x̄ = 180 mg/dl, σ = unknown, s = 34 mg/dl, n = 30
Note:
If σ is known: use the Z distribution.
If σ is unknown: use the t distribution with n − 1 degrees of freedom.
Thus:
d.f.* = n − 1 = 30 − 1 = 29, so t0.025 = 2.045
* Degrees of freedom (d.f.) is the number of values that are free to vary when computing a statistic.
Interpretation: We are 95% confident that the population mean cholesterol level of freshman nursing students is between 167.31 and 192.69 mg/dl.
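A minimal sketch of the Example 2 calculation follows. Python's standard library has no t distribution, so the critical value 2.045 (29 degrees of freedom, 95% confidence) is passed in as a table lookup; the function name t_interval is an assumption made for illustration.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown.

    t_crit is the table value for n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 2: sample mean 180, s = 34, n = 30, t(0.025, 29 d.f.) = 2.045
low, high = t_interval(x_bar=180, s=34, n=30, t_crit=2.045)
print(round(low, 2), round(high, 2))
```

Because s replaces σ, the t critical value (2.045) is larger than the Z value (1.96), which makes the interval slightly wider.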
Effect of Increase in Sample size in Estimating Population Parameters
Example 3 a:
Estimate, with a 95% confidence interval, the mean cholesterol level for freshman nursing students using a sample of 30 students who have an average cholesterol of 180 mg/dl. Assume the population standard deviation to be 33 mg/dl.
Solution:
n = 30, x̄ = 180 mg/dl, σ = 33 mg/dl
Interpretation: We are 95% confident that the population mean cholesterol level of freshman nursing students is between 168.19 and 191.81 mg/dl.
Example 3 b:
Estimate, with a 95% confidence interval, the mean cholesterol level for freshman nursing students using a sample of 60 students who have an average cholesterol of 180 mg/dl. Assume the population standard deviation to be 33 mg/dl.
Solution:
n = 60, x̄ = 180 mg/dl, σ = 33 mg/dl
Interpretation: We are 95% confident that the population mean cholesterol level of freshman nursing students is between 171.65 and 188.35 mg/dl.
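The effect of sample size in Examples 3a and 3b can be seen by comparing the two margins of error directly. This is a sketch using the slide value Z = 1.96; the helper name margin_of_error is illustrative.

```python
from math import sqrt

def margin_of_error(sigma, n, z=1.96):
    """Half-width of a Z-based confidence interval for the mean."""
    return z * sigma / sqrt(n)

m30 = margin_of_error(sigma=33, n=30)   # Example 3a
m60 = margin_of_error(sigma=33, n=60)   # Example 3b

print(round(m30, 2), round(m60, 2))
print(round(m30 / m60, 3))  # doubling n shrinks the margin by a factor of sqrt(2)
```

Doubling the sample size narrows the interval, but only by a factor of √2, not by half.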
Sample Size for Estimation
σ = population standard deviation: get it from the literature, OR estimate it by Range/4 if the distribution is normal
(x̄ − µ) = error we are willing to accept (difference between point estimate and parameter)
For 95% CI, Z = 1.96
Example 4:
For freshman nursing students, find the minimum sample size needed to estimate their mean cholesterol to within 10 mg/dl with 95% confidence.
A best estimate of σ is 33 mg/dl.
Interpretation: The minimum sample size needed to estimate their mean cholesterol to within 10 mg/dl is 42 subjects.
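The Example 4 sample size can be reproduced with the sketch below, which squares Zσ divided by the acceptable error and rounds up to the next whole subject. The function name sample_size_for_mean is an illustrative assumption.

```python
from math import ceil

def sample_size_for_mean(sigma, error, z=1.96):
    """Minimum n so that the Z-based margin of error is at most `error`."""
    return ceil((z * sigma / error) ** 2)   # always round up, never down

# Example 4: sigma = 33 mg/dl, acceptable error = 10 mg/dl, 95% confidence
n_required = sample_size_for_mean(sigma=33, error=10)
print(n_required)
```

The raw value is about 41.8; rounding up is what guarantees the stated precision.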
Interval Estimation of Population Proportion π
Example 5:
In a sample of n = 400 households, 80 households had participated in the recent elections. Estimate, with 95% confidence, the proportion of all households that will participate in the next election.
Solution:
n = 400, p = 80/400 = 0.2
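A short sketch of the Example 5 interval, using the slide value Z = 1.96 and the usual normal approximation for a sample proportion. The function name proportion_interval is illustrative.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Example 5: 80 of 400 households participated
low, high = proportion_interval(successes=80, n=400)
print(round(low, 4), round(high, 4))
```

In words: with 95% confidence, between about 16% and 24% of all households are expected to participate.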
Example 6:
If 50 out of 100 LLU students in a recent survey preferred alcohol-free beverages, and you want to estimate the proportion, π, of LLU students who favor alcohol-free beverages to within ±3 percentage points 95% of the time, how large a sample would you need, at minimum?
Solution:
p = 50/100 = 0.5, acceptable error = 0.03, Z = 1.96
Interpretation:
Therefore, we need a sample of at least 1068 students to estimate the proportion who favor alcohol-free beverages to within 3 percentage points 95% of the time.
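The Example 6 sample size can be checked the same way. This sketch assumes the usual formula, p(1 − p) multiplied by the square of Z over the acceptable error, and rounds up; the function name sample_size_for_proportion is illustrative.

```python
from math import ceil

def sample_size_for_proportion(p, error, z=1.96):
    """Minimum n so that the margin of error for a proportion is at most `error`."""
    return ceil(p * (1 - p) * (z / error) ** 2)

# Example 6: p = 0.5 from the earlier survey, error = 0.03, 95% confidence
n_required = sample_size_for_proportion(p=0.5, error=0.03)
print(n_required)
```

Since p(1 − p) is largest at p = 0.5, this is also the most conservative sample size for a 3-point margin.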
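Formulas and worked calculations for Examples 1 to 6:
Confidence interval for the mean, σ known:
$\mu = \bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
Example 1: $\mu = 510 \pm 1.96\,\dfrac{100}{\sqrt{25}} = 510 \pm 39.2 = (470.8,\ 549.2)$
Confidence interval for the mean, σ unknown:
$\mu = \bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
Example 2: $\mu = 180 \pm 2.045\,\dfrac{34}{\sqrt{30}} = 180 \pm 12.69 = (167.31,\ 192.69)$
Example 3a: $\mu = 180 \pm 1.96\,\dfrac{33}{\sqrt{30}} = 180 \pm 11.81 = (168.19,\ 191.81)$
Example 3b: $\mu = 180 \pm 1.96\,\dfrac{33}{\sqrt{60}} = 180 \pm 8.35 = (171.65,\ 188.35)$
Sample size for estimating a mean:
$n = \left(\dfrac{Z\,\sigma}{\bar{x} - \mu}\right)^{2}$
Example 4: $n = \left[\dfrac{(1.96)(33)}{10}\right]^{2} = 41.83$, rounded up to 42
Confidence interval for a proportion:
$\pi = p \pm Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$
Example 5: $\pi = 0.2 \pm 1.96\sqrt{\dfrac{0.2(0.8)}{400}} = 0.2 \pm 1.96\sqrt{0.0004} = 0.2 \pm 1.96(0.02) = 0.2 \pm 0.0392 \Rightarrow (0.1608,\ 0.2392)$
Sample size for estimating a proportion:
$n = p(1-p)\,\dfrac{Z^{2}}{(\pi - p)^{2}}$
Example 6: $n = 0.5(0.5)\,\dfrac{(1.96)^{2}}{(0.03)^{2}} = 0.25\left(\dfrac{3.8416}{0.0009}\right) = 0.25(4268.44) = 1067.11$, rounded up to 1068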
Hypothesis Testing (Statistical Significance)
Goal: Make statement(s) regarding unknown population parameter values based on sample data
Elements of a hypothesis test:
Null hypothesis (H0): a statement regarding the value(s) of unknown parameter(s). In our applications it typically implies no association between the explanatory and response variables (it always contains an equality).
Alternative hypothesis (HA): a statement contradictory to the null hypothesis (it always contains an inequality).
Level of significance (alpha, α): the maximum probability of committing a Type I error. P(Type I error) = α
Definitions
Rejection region (alpha, α):
Represents the area under the curve that is used to reject the null hypothesis
Level of confidence, 1 − alpha (1 − α):
Also known as the fail to reject (FTR) region
Represents the area under the curve that is used to fail to reject the null hypothesis
[Figure: distribution under H0 with a central FTR region and a rejection region of area α/2 in each tail]
One-Sided vs. Two-Sided Tests
Two-sided test
No a priori reason one group should have a stronger effect
Used for most tests
Example
H0: μ1 = μ2
HA: μ1 ≠ μ2
One-sided test
Specific interest in only one direction
Not scientifically relevant/interesting if reverse situation true
Example
H0: μ1 ≤ μ2
HA: μ1 > μ2
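One practical consequence of choosing a one-sided or two-sided test is the critical value. The sketch below assumes a Z test at α = 0.05 and uses the standard normal distribution from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Two-sided test: alpha is split between the two tails.
two_sided_cv = std_normal.inv_cdf(1 - alpha / 2)

# One-sided test: all of alpha sits in one tail.
one_sided_cv = std_normal.inv_cdf(1 - alpha)

print(round(two_sided_cv, 3), round(one_sided_cv, 3))
```

The one-sided critical value is smaller, so an effect in the hypothesized direction is easier to detect, at the cost of ignoring effects in the opposite direction.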
Example: It is believed that the mean age of smokers in San Bernardino is 47. Researchers from LLU believe that the average age is different than 47.
Hypothesis
H0: μ = 47
HA: μ ≠ 47
[Figure: distribution centered at μ = 47 with a central fail to reject (FTR) region and a rejection region of α/2 = 0.025 in each tail]
Three Approaches to Reject or Fail to Reject A Null Hypothesis:
1a. Confidence interval
Calculate the confidence interval
Decision Rule:
a. If the confidence interval (CI) includes the null, then the decision must be to fail to reject the H0.
b. If the confidence interval (CI) does not include the null, then the decision must be to reject the H0.
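The confidence-interval decision rule can be written as a one-step check. In this sketch the interval is the 95% CI from Example 1, while the two null values (500 and 450) are hypothetical and chosen only to show both outcomes; the function name ci_decision is illustrative.

```python
def ci_decision(null_value, ci_low, ci_high):
    """Apply the confidence-interval decision rule to a hypothesized value."""
    if ci_low <= null_value <= ci_high:
        return "fail to reject H0"   # the CI includes the null value
    return "reject H0"               # the CI does not include the null value

ci = (470.8, 549.2)  # 95% CI for the mean SAT math score (Example 1)

print(ci_decision(500, *ci))  # hypothetical H0: mu = 500
print(ci_decision(450, *ci))  # hypothetical H0: mu = 450
```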
1b. Confidence interval to compare groups
Calculate the confidence interval for each group
Decision Rule:
a. If the confidence intervals (CIs) overlap, then the decision must be to fail to reject the H0.
b. If the confidence intervals (CIs) do not overlap, then the decision must be to reject the H0.
2. Test Statistic
Calculate the test statistic (TS)
Obtain the critical value (CV) from the reference table
Decision Rule:
a. If the test statistic is in the FTR region, then the decision must be to fail to reject the H0.
b. If the test statistic is in the rejection region, then the decision must be to reject the H0.
[Figure: test statistic (TS) beyond the critical value (CV)] Since the test statistic is in the rejection region, reject the H0.
[Figure: test statistic (TS) between the critical values (CV)] Since the test statistic is in the fail to reject region, fail to reject the H0.
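The test-statistic approach can be sketched for a two-sided Z test of H0: μ = 47. The sample mean (50), σ (10), and sample sizes below are hypothetical, since the slides give no data for the smokers example; the function name z_test_decision is illustrative.

```python
from math import sqrt

def z_test_decision(x_bar, mu0, sigma, n, critical_value=1.96):
    """Two-sided Z test: compare the test statistic with the critical value."""
    ts = (x_bar - mu0) / (sigma / sqrt(n))   # test statistic (TS)
    if abs(ts) > critical_value:             # TS falls in the rejection region
        return ts, "reject H0"
    return ts, "fail to reject H0"           # TS falls in the FTR region

# Hypothetical data: same sample mean, two different sample sizes
ts_small, decision_small = z_test_decision(x_bar=50, mu0=47, sigma=10, n=25)
ts_large, decision_large = z_test_decision(x_bar=50, mu0=47, sigma=10, n=100)

print(ts_small, decision_small)
print(ts_large, decision_large)
```

The same 3-year difference is inside the FTR region with n = 25 and inside the rejection region with n = 100, because the standard error shrinks as n grows.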
3. P-Value
Choose α
Calculate the value of the test statistic from your data
Calculate the p-value from the test statistic
Decision Rule:
a. If the p-value is less than the level of significance, α, then the decision must be to reject H0.
b. If the p-value is greater than or equal to the level of significance, α, then the decision must be to fail to reject H0.
[Figures: two curves, each marking the FTR region, the critical value (CV), the test statistic (TS), and the p-value area beyond the TS]
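A minimal sketch of the p-value approach for a two-sided Z test follows. The test statistic values 1.5 and 3.0 are hypothetical, and both function names are illustrative assumptions.

```python
from statistics import NormalDist

def two_sided_p_value(ts):
    """Area in both tails beyond |ts| under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(ts)))

def p_value_decision(p_value, alpha=0.05):
    """Reject H0 only when the p-value is less than alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

for ts in (1.5, 3.0):  # hypothetical test statistics
    p = two_sided_p_value(ts)
    print(ts, round(p, 4), p_value_decision(p))
```

This gives the same decisions as comparing each test statistic with the critical value 1.96, since both rules describe the same rejection region.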
Types of Errors
Decision Based on a Random Sample | Truth: The Null Hypothesis (H0) is True | Truth: The Null Hypothesis (H0) is False |
Fail to Reject H0 | 1-α (Correct Decision) | Type II error (β) |
Reject H0 | Type I error (α) | 1-β (Power) (Correct Decision) |
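The claim that α is the probability of a Type I error can be illustrated by simulation. The sketch below repeatedly samples from a population where H0 is actually true and counts how often a two-sided Z test rejects it. All settings (μ0 = 47, σ = 10, n = 20, 5000 trials, the seed) are hypothetical choices for illustration.

```python
import random
from math import sqrt

def type_i_error_rate(trials=5000, n=20, mu0=47, sigma=10, cv=1.96, seed=1):
    """Share of samples that wrongly reject a true H0: mu = mu0."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 is true here
        x_bar = sum(sample) / n
        ts = (x_bar - mu0) / (sigma / sqrt(n))
        if abs(ts) > cv:             # rejecting a true H0 is a Type I error
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land close to alpha = 0.05
```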
Correct Decision: H0 is true and the test statistic (ts) falls in the FTR region. Since the H0 is true and we fail to reject it, we have made a correct decision.
Alpha (α) Error: H0 is true and ts falls beyond the critical value (CV). Since the H0 is true and we decide to reject it, we have made an incorrect decision leading to a Type I error.
Power: H0 is false and ts falls beyond the CV. Since the H0 is false and we decide to reject it, we have made a correct decision.
Beta (β) Error: H0 is false and ts falls in the FTR region. Since the H0 is false and we fail to reject it, we have made an incorrect decision leading to a Type II error.
Null Hypothesis is True:
Fail to reject: Correct Decision
Reject: Type I Error
Null Hypothesis is False:
Reject: Correct Decision
Fail to reject: Type II Error
How to Reduce Errors
Alpha error is reduced by increasing the confidence level (that is, choosing a smaller α) or reducing bias
Beta error is reduced by increasing the sample size
Alpha and beta are inversely related: for a fixed sample size, lowering one raises the other
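These relationships can be checked numerically with a power calculation for a two-sided Z test. The true difference (3), σ (10), and sample sizes are hypothetical, and the function name power_two_sided_z is illustrative; beta is 1 minus the power.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided Z test when the true mean differs from H0 by delta."""
    std_normal = NormalDist()
    cv = std_normal.inv_cdf(1 - alpha / 2)    # critical value
    shift = delta * sqrt(n) / sigma           # true difference in standard errors
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(-cv + shift) + std_normal.cdf(-cv - shift)

beta_n25 = 1 - power_two_sided_z(delta=3, sigma=10, n=25)
beta_n100 = 1 - power_two_sided_z(delta=3, sigma=10, n=100)
beta_strict = 1 - power_two_sided_z(delta=3, sigma=10, n=25, alpha=0.01)

print(round(beta_n25, 3), round(beta_n100, 3), round(beta_strict, 3))
```

A larger sample lowers beta, while a stricter α at the same sample size raises it.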
Example
ANOVA
GROUPS
Source | Sum of Squares | df | Mean Square | F | Sig. |
Between Groups | 763.000 | 2 | 381.500 | 31.918 | .000 |
Within Groups | 251.000 | 21 | 11.952 | | |
Total | 1014.000 | 23 | | | |
What type of error was possibly committed in the above example?
How would you reduce the error?
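The Mean Square and F columns of the ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them from the table values; the Sig. column needs the F distribution, which is not in Python's standard library, so it is not recomputed here. Variable names are illustrative.

```python
# Values taken from the ANOVA table
ss_between, df_between = 763.0, 2
ss_within, df_within = 251.0, 21

ms_between = ss_between / df_between   # Mean Square, Between Groups
ms_within = ss_within / df_within      # Mean Square, Within Groups
f_ratio = ms_between / ms_within       # F statistic

# The Total row is the sum of the two components
ss_total = ss_between + ss_within
df_total = df_between + df_within

print(ms_between, round(ms_within, 3), round(f_ratio, 3))
print(ss_total, df_total)
```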