# Stat Quiz

Understanding Inferential Statistics—Estimation

Types of Statistics

• The choice of a type of analysis is based on:
  • Research questions.
  • The type of data collected.
  • The audience who will receive the results.

Descriptive & Inferential Statistics

Statistical Methods


Inference Process

Population

Sample

Sample statistic (x̄, Ps)

Estimation & Hypothesis testing


Point Estimates & Population Parameters

Population Parameters

µ = Population mean

σ = Population standard deviation

σ² = Population variance

π = Population proportion

N = The size of the population you can generalize to

Sample Statistics (Point Estimates)

x̄ = Mean point estimate

S = Standard deviation point estimate

S² = Variance point estimate

P = Proportion point estimate

n = The size of a sample taken from a population

Population Parameter is Unknown

Sample

Statistics

• Population Parameters are usually represented by Greek letters
• Point estimates are usually represented by Roman letters


Point Estimates & Population Parameters

| Characteristic measure | Point estimate (Sample) | Parameter (Population) |
|---|---|---|
| Mean | x̄ | µ |
| Standard deviation | S | σ |
| Variance | S² | σ² |
| Proportion | P | π |


Point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a “best guess” for an unknown population parameter.

Interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.


Example 1:

The College Board reports that the scores on the 2010 SAT mathematics test were normally distributed. A sample of 25 scores had a mean of 510. Assume the population standard deviation is 100. Construct a 95% confidence interval for the population mean score on the 2010 SAT math test.

Interval Estimation of Population Mean

• Interval Estimation of Population Mean (µ) with Known Variance (σ Known)

Solution:

n = 25, x̄ = 510, σ = 100

For α = 0.05 (95% CI), we get $Z_{\alpha/2} = Z_{0.025} = 1.96$.

Interpretation: We are 95% confident that the population mean score on the 2010 SAT mathematics test lies between 470.8 and 549.2.
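The interval in Example 1 can be cross-checked in a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_confidence_interval` is an illustrative choice and does not come from the course material.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a population mean when sigma is known."""
    alpha = 1 - confidence
    # Critical value Z_{alpha/2}; about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


# Example 1: n = 25, sample mean = 510, sigma = 100
low, high = z_confidence_interval(mean=510, sigma=100, n=25)
print(f"95% CI: ({low:.1f}, {high:.1f})")  # prints: 95% CI: (470.8, 549.2)
```

The critical value is computed rather than typed in as 1.96, so the same function works for other confidence levels.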

Example 2:

Estimate with 95% confidence interval the mean cholesterol level for freshman nursing students using a sample of 30 students who have an average cholesterol of 180mg/dl and a standard deviation of 34mg/dl.

• Interval Estimation of Population Mean (µ) with Unknown Variance (σ Unknown)

Solution:

x̄ = 180 mg/dl, σ = unknown, s = 34 mg/dl, n = 30

Note:

• Since σ (the standard deviation of the population) is unknown, we use s (the standard deviation of the sample) in place of σ.
• When s is used instead of σ, an error is introduced because s is only an estimate of σ.
• To account for this additional error, we replace the Z value with another value called Student's t, or just t.

Recall,

If σ is known: $\mu = \bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

If σ is unknown: $\mu = \bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$

Thus:

d.f.* = n − 1 = 30 − 1 = 29, and the t table gives $t_{\alpha/2} = 2.045$ for 29 degrees of freedom.

* Degrees of freedom (d.f.) is the number of values that are free to vary when computing a statistic.

Interpretation: We are 95% confident that the population mean cholesterol level of freshman nursing students is between 167.31 and 192.69 mg/dl.
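The t-based interval for Example 2 can be sketched the same way. Python's standard library has no t distribution, so this sketch assumes the critical value 2.045 (29 degrees of freedom, 95% confidence) has been read from a t table. The names `t_confidence_interval` and `T_CRIT_DF29_95` are illustrative only.

```python
from math import sqrt

# Assumed to come from a t table: t_{0.025} with 29 degrees of freedom.
T_CRIT_DF29_95 = 2.045


def t_confidence_interval(mean, s, n, t_critical):
    """Two-sided confidence interval for a mean when sigma is unknown."""
    margin = t_critical * s / sqrt(n)
    return mean - margin, mean + margin


# Example 2: sample mean = 180 mg/dl, s = 34 mg/dl, n = 30
low, high = t_confidence_interval(mean=180, s=34, n=30, t_critical=T_CRIT_DF29_95)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # prints: 95% CI: (167.31, 192.69)
```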

Effect of Increase in Sample size in Estimating Population Parameters

Example 3 a:

Estimate with 95% confidence interval the mean cholesterol level for freshman nursing students using a sample of 30 students who have an average cholesterol of 180mg/dl. Assume the population standard deviation to be 33 mg/dl.

Solution:

n = 30, x̄ = 180 mg/dl, σ = 33 mg/dl

Interpretation: we are 95% confident that the freshman nursing students population mean cholesterol level is between 168.19 and 191.81

Example 3 b:

Estimate with 95% confidence interval the mean cholesterol level for freshman nursing students using a sample of 60 students who have an average cholesterol of 180 mg/dl. Assume the population standard deviation to be 33 mg/dl.

Solution:

n = 60, x̄ = 180 mg/dl, σ = 33 mg/dl

Interpretation: we are 95% confident that the freshman nursing students population mean cholesterol level is between 171.65 and 188.35

Effect of Increasing Sample Size in Estimating Population Parameters

• Using a sample size of 30, the 95% confidence interval is (168.19, 191.81).
• Using a sample size of 60, the 95% confidence interval is (171.65, 188.35).
• The interval from the larger sample is narrower, so it estimates the population mean more precisely than the interval from the smaller sample.
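The comparison in Examples 3 a and 3 b comes down to the margin of error shrinking as n grows. A small sketch, assuming σ = 33 mg/dl is known and using an illustrative helper named `margin_of_error`:

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of a two-sided z interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


sample_mean = 180  # mg/dl, as in Examples 3 a and 3 b
for n in (30, 60):
    m = margin_of_error(sigma=33, n=n)
    print(f"n = {n}: ({sample_mean - m:.2f}, {sample_mean + m:.2f}), width = {2 * m:.2f}")
```

Because the margin is proportional to 1/√n, doubling the sample size narrows the interval by a factor of about 1.41 rather than 2.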

Sample Size for Estimation

$n = \left(\dfrac{Z\,\sigma}{\bar{x} - \mu}\right)^{2}$

σ = population standard deviation: get it from the literature, OR estimate it by Range/4 if the distribution is normal

(x̄ − µ) = error we are willing to accept (difference between point estimate and parameter)

For 95% CI, Z = 1.96

Example 4:

For freshman nursing students, estimate with 95% confidence the minimum sample size needed to estimate their mean cholesterol to within 10 mg/dl.

A best estimate of σ is 33 mg/dl

Interpretation:

The minimum sample size needed to estimate their mean cholesterol to within 10 mg/dl is 42 subjects.
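The sample size calculation in Example 4 can be sketched as follows. The function name `sample_size_for_mean` is illustrative. The result is rounded up because a fractional subject cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, error, confidence=0.95):
    """Minimum n so that the margin of error for the mean is at most `error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)


# Example 4: sigma estimated as 33 mg/dl, acceptable error 10 mg/dl
print(sample_size_for_mean(sigma=33, error=10))  # prints: 42
```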

Interval Estimation of Population Proportion π

Example 5:

In a sample of n = 400 households, 80 households had participated in the recent elections. Estimate, with 95% confidence, the proportion of all households that will participate in the next election.

Solution:

n = 400, P = 80/400 = 0.2
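A minimal sketch of the proportion interval for Example 5, using the normal approximation. The function name `proportion_confidence_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Two-sided normal-approximation interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Example 5: 80 of 400 sampled households participated
low, high = proportion_confidence_interval(successes=80, n=400)
print(f"95% CI: ({low:.4f}, {high:.4f})")  # prints: 95% CI: (0.1608, 0.2392)
```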

Example 6:

If 50 out of 100 LLU students in a recent survey preferred alcohol-free beverages, and you want to estimate the proportion, π, of LLU students who favor alcohol-free beverages within ±3 percentage points 95% of the time, how large a sample would you need at least?

Sample Size for Estimation

Interpretation:

Therefore, we need a sample of at least 1,068 subjects to estimate the proportion of students who favor alcohol-free beverages to within 3 percentage points 95% of the time.
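The sample size for Example 6 can be checked with a short sketch. The function name `sample_size_for_proportion` is illustrative, and the survey proportion 50/100 = 0.5 is used as the planning estimate.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, error, confidence=0.95):
    """Minimum n so that the margin of error for a proportion is at most `error`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / error ** 2)


# Example 6: p = 0.5 from the survey, acceptable error 0.03
print(sample_size_for_proportion(p=0.5, error=0.03))  # prints: 1068
```

A proportion of 0.5 maximizes p(1 − p), so this is also the most conservative sample size for a ±3 point margin.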

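Solutions:

Example 1 (σ known):

$$\mu = \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 510 \pm 1.96\,\frac{100}{\sqrt{25}} = 510 \pm 39.2 \;\Rightarrow\; (470.8,\ 549.2)$$

Example 2 (σ unknown):

$$\mu = \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} = 180 \pm 2.045\,\frac{34}{\sqrt{30}} = 180 \pm 12.69 \;\Rightarrow\; (167.31,\ 192.69)$$

Example 3 a (n = 30):

$$\mu = \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 180 \pm 1.96\,\frac{33}{\sqrt{30}} = 180 \pm 11.81 \;\Rightarrow\; (168.19,\ 191.81)$$

Example 3 b (n = 60):

$$\mu = \bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 180 \pm 1.96\,\frac{33}{\sqrt{60}} = 180 \pm 8.35 \;\Rightarrow\; (171.65,\ 188.35)$$

Example 4:

$$n = \left(\frac{Z\,\sigma}{\bar{x} - \mu}\right)^{2} = \left[\frac{(1.96)(33)}{10}\right]^{2} = 41.8 \approx 42$$

Example 5:

$$
\begin{aligned}
\pi &= P \pm Z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}} \\
    &= 0.2 \pm 1.96\sqrt{\frac{0.2(0.8)}{400}} \\
    &= 0.2 \pm 1.96\sqrt{\frac{0.16}{400}} \\
    &= 0.2 \pm 1.96\sqrt{0.0004} \\
    &= 0.2 \pm 1.96(0.02) \\
    &= 0.2 \pm 0.0392 \;\Rightarrow\; (0.1608,\ 0.2392)
\end{aligned}
$$

Example 6, where (P − π) is the error we are willing to accept, here 0.03:

$$n = \frac{Z^{2}\,P(1-P)}{(P-\pi)^{2}} = \frac{(1.96)^{2}\,(0.5)(0.5)}{(0.03)^{2}} = 0.25\left(\frac{3.8416}{0.0009}\right) = 0.25\,(4268.44) = 1{,}067.11$$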

Hypothesis Testing (Statistical Significance)


Hypothesis Testing

Goal: Make statement(s) regarding unknown population parameter values based on sample data

Elements of a hypothesis test:

Null hypothesis – Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality)

Alternative hypothesis – Statement contradictory to the null hypothesis (will always contain an inequality)

The level of significance (alpha, α) is the maximum probability of committing a Type I error: P(Type I error) = α

Definitions

Rejection (alpha, α) Region:

Represents area under the curve that is used to reject the null hypothesis

Level of Confidence, 1 – alpha (1 – α):

Also known as fail to reject (FTR) region

Represents area under the curve that is used to fail to reject the null hypothesis

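[Figure: sampling distribution under H0 with a central fail to reject (FTR) region and a rejection region of area α/2 in each tail]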

1 vs. 2 Sided Tests

Two-sided test

No a priori reason 1 group should have stronger effect

Used for most tests

Example

H0: μ1 = μ2

HA: μ1 ≠ μ2

One-sided test

Specific interest in only one direction

Not scientifically relevant/interesting if reverse situation true

Example

H0: μ1 ≤ μ2

HA: μ1 > μ2


Example: It is believed that the mean age of smokers in San Bernardino is 47. Researchers from LLU believe that the average age is different than 47.

Hypothesis

H0: μ = 47

HA: μ ≠ 47

[Figure: sampling distribution centered at μ = 47 with a central Fail to Reject (FTR) region and a rejection region of area α/2 = 0.025 in each tail]

Three Approaches to Reject or Fail to Reject A Null Hypothesis:

1a. Confidence interval

Calculate the confidence interval

Decision Rule:

a. If the confidence interval (CI) includes the null, then the decision must be to fail to reject the H0.

b. If the confidence interval (CI) does not include the null, then the decision must be to reject the H0.
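The decision rule above can be sketched as a small function. The name `decide_from_ci` and the two intervals below are hypothetical illustrations; the slides give no sample data for the smokers example, so only the null value 47 comes from the source.

```python
def decide_from_ci(ci_low, ci_high, null_value):
    """Apply the confidence interval decision rule to a null value."""
    if ci_low <= null_value <= ci_high:
        return "fail to reject H0"
    return "reject H0"


NULL_MEAN_AGE = 47  # H0: mu = 47, from the smokers example

# Hypothetical 95% confidence intervals, for illustration only.
print(decide_from_ci(45.0, 49.0, NULL_MEAN_AGE))  # prints: fail to reject H0
print(decide_from_ci(48.0, 52.0, NULL_MEAN_AGE))  # prints: reject H0
```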


1b. Confidence interval to compare groups

Calculate the confidence interval for each group

Decision Rule:

a. If the confidence intervals (CIs) overlap, then the decision must be to fail to reject the H0.

b. If the confidence intervals (CIs) do not overlap, then the decision must be to reject the H0.


2. Test Statistic

Calculate the test statistic (TS)

Obtain the critical value (CV) from the reference table

Decision Rule:

a. If the test statistic is in the FTR region, then the decision must be to fail to reject the H0.

b. If the test statistic is in the rejection region, then the decision must be to reject the H0.
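A minimal sketch of the test statistic approach for a two-sided z test of H0: μ = 47. The sample values (mean 50, σ = 10, n = 100) are hypothetical, since the slides do not supply data for the smokers example, and the function name `z_test_decision` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_decision(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided z test: compare the test statistic with the critical value."""
    ts = (sample_mean - null_mean) / (sigma / sqrt(n))  # test statistic
    cv = NormalDist().inv_cdf(1 - alpha / 2)            # critical value
    decision = "reject H0" if abs(ts) > cv else "fail to reject H0"
    return ts, cv, decision


# Hypothetical sample, for illustration only.
ts, cv, decision = z_test_decision(sample_mean=50, null_mean=47, sigma=10, n=100)
print(f"TS = {ts:.2f}, CV = ±{cv:.2f}: {decision}")  # prints: TS = 3.00, CV = ±1.96: reject H0
```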

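[Figure: FTR region, critical value (CV), and test statistic (TS)] Since the test statistic is in the rejection region, reject the H0.

[Figure: FTR region, critical values (CV), and test statistic (TS)] Since the test statistic is in the fail to reject region, fail to reject the H0.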

3. P-Value

Choose α

Calculate value of test statistic from your data

Calculate P- value from test statistic

Decision Rule:

a. If the p-value is less than the level of significance, α, then the decision must be to reject H0.

b. If the p-value is greater than or equal to the level of significance, α, then the decision must be to fail to reject H0.
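The p-value rule can be sketched for a two-sided z test. The test statistic 3.0 is a hypothetical value chosen for illustration, and the function names are not from the source.

```python
from statistics import NormalDist


def two_sided_p_value(ts):
    """Probability of a z statistic at least as extreme as `ts` when H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(ts)))


def decide_from_p_value(p_value, alpha=0.05):
    """Reject H0 only when the p-value is below the level of significance."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p = two_sided_p_value(3.0)  # hypothetical test statistic
print(f"p = {p:.4f}: {decide_from_p_value(p)}")  # prints: p = 0.0027: reject H0
```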

[Figure: two panels showing the FTR region, the critical value (CV), the test statistic (TS), and the P-value]

Types of Errors

| Hypothesis testing decision based on a random sample | Truth: the null hypothesis (H0) is true | Truth: the null hypothesis (H0) is false |
|---|---|---|
| Fail to reject H0 | Correct decision (1 − α) | Type II error (β) |
| Reject H0 | Type I error (α) | Correct decision (1 − β, power) |

The level of significance (alpha, α) is the maximum probability of committing a Type I error: P(Type I error) = α

Correct Decision

[Figure: H0 is true; FTR region, critical value (CV), and test statistic (ts)]

Since the H0 is true and we fail to reject it, we have made a correct decision.

Alpha (α) Error

[Figure: H0 is true; FTR region, critical value (CV), and test statistic (ts)]

Since the H0 is true and we decide to reject it, we have made an incorrect decision, leading to a Type I error.

Power

[Figure: H0 is false; FTR region, critical value (CV), and test statistic (ts)]

Since the H0 is false and we decide to reject it, we have made a correct decision.

Beta (β) Error

[Figure: H0 is false; FTR region, critical value (CV), and test statistic (ts)]

Since the H0 is false and we fail to reject it, we have made an incorrect decision, leading to a Type II error.

Null Hypothesis

• True
  • Fail to reject: Correct Decision
  • Reject: Type I Error
• False
  • Reject: Correct Decision
  • Fail to Reject: Type II Error

How to Reduce Errors

Alpha error is reduced by increasing the confidence level (choosing a smaller α) or reducing bias

Beta error is reduced by increasing the sample size

Alpha and beta are inversely related: for a fixed sample size, lowering one raises the other
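These relationships can be illustrated numerically for a two-sided z test. The sketch below assumes a hypothetical true difference of 10 mg/dl from the null mean and σ = 33 mg/dl. The function name `beta_error` and all inputs are illustrative, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def beta_error(delta, sigma, n, alpha=0.05):
    """Type II error probability of a two-sided z test.

    delta is the assumed true difference between the population mean
    and the null value.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma  # mean of the test statistic when H0 is false
    power = (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)
    return 1 - power


# A larger sample lowers beta at a fixed alpha.
for n in (30, 60, 120):
    print(f"alpha = 0.05, n = {n}: beta = {beta_error(10, 33, n):.3f}")

# A smaller alpha raises beta at a fixed sample size.
for alpha in (0.10, 0.05, 0.01):
    print(f"n = 60, alpha = {alpha}: beta = {beta_error(10, 33, 60, alpha):.3f}")
```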

Example

ANOVA

GROUPS

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 763.000 | 2 | 381.500 | 31.918 | .000 |
| Within Groups | 251.000 | 21 | 11.952 | | |
| Total | 1014.000 | 23 | | | |

What type of error was possibly committed in the above example?

How would you reduce the error?
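The arithmetic behind the ANOVA table can be verified from its sums of squares and degrees of freedom. This sketch recomputes only the mean squares and F. The Sig. value requires the F distribution, which is not in Python's standard library, so it is not computed here. Variable names are illustrative.

```python
# Values read from the ANOVA table.
ss_between, df_between = 763.0, 2
ss_within, df_within = 251.0, 21

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F = between-groups mean square / within-groups mean square.
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.3f}")  # prints: MS between = 381.500
print(f"MS within  = {ms_within:.3f}")   # prints: MS within  = 11.952
print(f"F          = {f_stat:.3f}")      # prints: F          = 31.918
print(f"Total SS   = {ss_between + ss_within:.3f}, total df = {df_between + df_within}")
```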
