# SPSS LAB Assignment

LAB 7: Correlation and Linear Regression Analyses


The main purpose of this lab is to be able to use and correctly interpret the results of the following:

· Scattergram (scatter plot)

· Correlation analysis

· Simple linear regression

Scatter plot

Is used to assess visually whether the relationship between two quantitative variables is linear or non-linear.

Is used to explore the direction of the relationship between the two variables (positive, negative, or no relationship).

Can also be used to inspect the data values for outliers.

Introduction

See Chapter 5, Section 5.3, for more details.


Correlation Analysis:

The correlation coefficient (r) computed from the sample data measures the strength and the direction of a linear relationship between two variables.

The null hypothesis:

H0: There is no linear relationship between the two variables (population correlation ρ = 0).

The alternative hypothesis:

Ha: There is a linear relationship between the two variables (ρ ≠ 0).

The correlation coefficient ranges from -1 to +1. When there is no linear relationship between the variables, or only a weak one, the correlation coefficient will be close to 0.

No correlation: as x increases, there is no definite shift in y.

Positive correlation: as x increases, y increases.

Negative correlation: as x increases, y decreases.
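As a minimal sketch of what SPSS computes for r, the Python below (standard library only) implements the usual sample Pearson formula and the t statistic used to test H0: ρ = 0 with n - 2 degrees of freedom. The function names and data are hypothetical; the p-value itself would come from a t table or from the Sig. value SPSS reports, since Python's standard library has no t distribution.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of the same length with n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, compared against a t distribution with n - 2 df."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical data, for illustration only (not from the lab datasets)
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]
r = pearson_r(x, y)
t = t_for_r(r, len(x))
print(f"r = {r:.3f}, t = {t:.2f} with {len(x) - 2} df")
```

The sign of r gives the direction and its size gives the strength; the t statistic grows with both |r| and the sample size, which is why a modest r can still be significant in a large sample.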


Things to remember: correlation coefficient cutoff points

| Correlation coefficient (r) | Interpretation |
|---|---|
| 0 to +0.29 | Little or no association |
| +0.30 to +0.49 | Weak positive association |
| +0.50 to +0.69 | Medium positive association |
| +0.70 to +1.0 | Strong positive association |
| 0 to -0.29 | Little or no association |
| -0.30 to -0.49 | Weak negative association |
| -0.50 to -0.69 | Medium negative association |
| -0.70 to -1.0 | Strong negative association |
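The cutoff points can be encoded as a small lookup, sketched here in Python. It assumes the bands apply symmetrically to positive and negative r and that a value falling between two listed bands (for example 0.295) is assigned to the lower band; the function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Label a correlation coefficient using the lab's cutoff points."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    direction = "positive" if r > 0 else "negative"
    if size < 0.30:          # 0 to 0.29
        return "little or no association"
    if size < 0.50:          # 0.30 to 0.49
        return f"weak {direction} association"
    if size < 0.70:          # 0.50 to 0.69
        return f"medium {direction} association"
    return f"strong {direction} association"  # 0.70 to 1.0


print(describe_r(0.898))
```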

Simple Linear Regression:

Is used to predict a single dependent (response) variable from a single independent (predictor) variable.

The null hypothesis:

H0: The slope is zero; there is no linear relationship between the two variables.

The alternative hypothesis:

Ha: The slope is not zero; there is a linear relationship between the two variables.

For simple linear regression, the coefficient of determination (R²) is the square of the correlation coefficient (R² = r²).

The coefficient of determination (R²) ranges from 0 to +1. When there is no linear relationship between the variables, or only a weak one, R² will be close to 0.


y = a + bx

where:

a = y-intercept (constant)

b = slope (regression coefficient) of the line

y = dependent (predicted) variable

x = independent (predictor) variable
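The least-squares estimates behind y = a + bx are b = Sxy / Sxx and a = ȳ - b·x̄, where Sxy and Sxx are the sums of cross-products and squared deviations about the means. A minimal Python sketch, with hypothetical helper names and data:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y = a + bx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope: change in predicted y per one-unit increase in x
    a = mean_y - b * mean_x    # intercept: the fitted line passes through (mean_x, mean_y)
    return a, b


def predict(a, b, x_new):
    """Predicted y for a given x using y = a + bx."""
    return a + b * x_new


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]
a, b = fit_line(x, y)
print(f"y = {a:.3f} + {b:.3f}x; prediction at x = 4.5: {predict(a, b, 4.5):.3f}")
```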

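Assumptions:

· Normality (of the residuals)

· Equal variances (of the residuals across values of x)

· Independence (of the observations)

· Linear relationship (between x and y)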
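SPSS can report a Durbin-Watson statistic (in the Linear Regression Statistics dialog, Residuals section) as a check on the independence of residuals when the cases have a meaningful order, such as time. A minimal sketch of the residuals and that statistic follows, assuming the residuals are listed in data order; the helper names are hypothetical. Normality and equal variances are usually judged from residual plots (a histogram, or residuals against predicted values) rather than from a single number.

```python
def residuals(x, y, a, b):
    """Residuals e = observed y minus predicted y for the line y = a + bx."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def durbin_watson(e):
    """Durbin-Watson statistic for residuals listed in data (e.g. time) order.

    Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation,
    values toward 0 suggest positive and toward 4 negative autocorrelation.
    """
    numerator = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    denominator = sum(ei ** 2 for ei in e)
    return numerator / denominator


# Hypothetical residuals, for illustration only
print(durbin_watson([0.5, -0.3, 0.2, -0.4, 0.1, -0.1]))
```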

Regression analysis establishes a regression equation for making predictions: for a given value of x, we can predict a value of y.

Correlation

Is used to measure the strength and the direction of a linear association between two quantitative variables

Simple Linear Regression

Is used to predict a single dependent variable (response) based on a single independent (predictor) variable.

Scattergram (Scatter Plot)

Is used to assess visually the relationship between two quantitative variables (linear vs. non-linear).

Hypothetical Example for Simple Regression

Question: Is revision time a good predictor of exam performance?

Since we have two quantitative (not repeated) variables, simple linear regression should be used to answer this research question.

Step 1: Scatter plot

Dependent variable (response)? Exam performance

Independent variable (predictor)? Revision time

Can you see any relationship between revision time and exam performance? Positive or Negative?

To Obtain a Scatter Plot:

Graphs > Legacy Dialogs > Scatter/Dot

· In the Scatterplot dialog, select the icon for Simple Scatter.

· Select Define.

· Select Exam performance as the Y-axis variable and Revision time as the X-axis variable.

· Click OK.


Question 1 (continued): Is revision time a good predictor of exam performance?

To Obtain an Individual 95% Confidence Interval on the Scatter Plot:

After the scattergram is drawn, do the following:

· Double-click the graph (this opens the Chart Editor window).

· Click any one of the data points (this highlights all the data points).

· From the Chart Editor menu, select Elements > Fit Line at Total.

· Then check the Individual 95% confidence interval.

· Click Apply, close the Properties box, and close the Chart Editor.
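Assuming SPSS's Individual option draws prediction limits for a single new case at each x (whereas Mean bands the fitted mean), the sketch below computes that individual interval, ŷ ± t·s·√(1 + 1/n + (x0 - x̄)²/Sxx). The critical value `t_crit` must be supplied (for example from a t table with n - 2 df), and the function name and data are hypothetical.

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Interval for a single new y at x0 in a simple linear regression.

    t_crit is the two-sided critical t value with n - 2 degrees of freedom;
    it is passed in because Python's standard library has no t distribution.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    # Standard error of the estimate (residual standard deviation), n - 2 df
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    y_hat = a + b * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width


# Hypothetical data; 2.776 is the 95% two-sided t value for 6 - 2 = 4 df
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]
low, high = prediction_interval(x, y, 3.5, 2.776)
print(f"95% interval for an individual y at x = 3.5: {low:.2f} to {high:.2f}")
```

The band is narrowest at the mean of x and widens toward the edges of the data, which is the curved shape the Chart Editor draws around the fit line.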


Can you see any relationship between Revision Time and Exam Performance? Yes, a positive relationship.

Step 2: Correlation and simple linear regression analysis

To Obtain Statistics for Correlation and Simple Linear Regression:

Analyze > Regression > Linear

· Select exam performance as the dependent variable.

· Select revision time as the independent variable.

· Click Statistics and select Estimates and 95% confidence intervals; keep Model fit checked (it produces the Model Summary table with R and R Square) and also check Descriptives.

· Click Continue, then OK.


Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .898 (a) | .806 | .805 | 9.676 |

a. Predictors: (Constant), Revision Time
b. Dependent Variable: Exam Performance

Coefficients (a)

| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | -207.810 | 8.365 | | -24.843 | .000 |
| | Revision Time | 5.356 | .128 | .898 | 41.952 | .000 |

a. Dependent Variable: Exam Performance
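The B, Std. Error, t and Beta columns of the Coefficients table can be reproduced from raw data with the standard simple-regression formulas. A sketch on hypothetical data follows; the function name is hypothetical, and the Sig. column is omitted because it needs the t distribution. Dividing the rounded values in the table gives only approximate agreement (5.356 / .128 ≈ 41.8 versus the reported t of 41.952) because SPSS works with unrounded values.

```python
import math


def coefficient_table(x, y):
    """Unstandardized B, Std. Error and t for y = a + bx, plus the standardized Beta."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                 # Std. Error of the Estimate
    se_a = s * math.sqrt(1 / n + mean_x ** 2 / sxx)
    se_b = s / math.sqrt(sxx)
    return {
        "constant": {"B": a, "std_error": se_a, "t": a / se_a},
        "slope": {"B": b, "std_error": se_b, "t": b / se_b,
                  "beta": b * math.sqrt(sxx / syy)},  # equals r in simple regression
    }


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]
table = coefficient_table(x, y)
print(table)
```

In simple regression the standardized Beta equals r, which is why the table above shows .898 both as R and as Beta.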

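Reading the output:

· Correlation coefficient: R = .898 (Model Summary table).

· Coefficient of determination: R Square = 0.806, so 80.6% of the variation in exam performance (dependent variable) is explained by revision time (independent variable).

· Sig. for Revision Time (.000, i.e., p < 0.001): answers the question of whether there is a significant linear relationship.

· Slope: B for Revision Time = 5.356.

· Intercept (constant): B for (Constant) = -207.810.

· The regression equation: Exam Performance = -207.81 + 5.356 × Revision Time.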

Results:

Correlation Analysis

There is a significant, strong, positive linear relationship between revision time and exam performance (evidence: correlation coefficient r = 0.898, p < 0.001).

Regression Analysis

80.6% of the variation in exam performance is explained by revision time.

Regression equation: exam performance = -207.81 + 5.36 × revision time

Conclusion: Revision time is a useful predictor of exam performance.
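Using the equation for prediction is a single line of arithmetic. The sketch below plugs in the unstandardized coefficients from the Coefficients table; the function name and the revision time of 60 are hypothetical inputs. Predictions are only sensible within the range of revision times actually observed: at zero revision time the equation gives -207.81, which is presumably not a meaningful exam score.

```python
# Coefficients copied from the Coefficients table
INTERCEPT = -207.810   # (Constant), unstandardized B
SLOPE = 5.356          # Revision Time, unstandardized B
R = 0.898              # correlation coefficient from the Model Summary


def predict_exam_performance(revision_time):
    """Predicted exam performance from the fitted line y = a + bx."""
    return INTERCEPT + SLOPE * revision_time


print(f"R squared = {R ** 2:.3f}")   # squaring r recovers R Square
print(f"Predicted exam performance at a revision time of 60: {predict_exam_performance(60):.2f}")
```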

[Figure: scatter plot of Exam Performance against Revision Time with the fitted regression line Exam Performance = -207.81 + 5.356 × Revision Time (Y = a + bX)]

| Variable | β* | Std. Error | P-value |
|---|---|---|---|
| Revision time | 5.36 | 0.13 | <0.001 |

Intercept = -207.81

R² = 0.806

Dependent Variable: Exam Performance

## Sheet1

Model Summary (c): Smoking

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 (Basic) | .339 (a) | .115 | .104 | 9.576 | .115 | 10.316 | 3 | 238 | .000 |
| 2 (Smoking) | .365 (b) | .133 | .119 | 9.496 | .018 | 5.018 | 1 | 237 | .026 |

a. Predictors: (Constant), GENDER, BMI, age
b. Predictors: (Constant), GENDER, BMI, age, EVER SMOKE CIGARETTES
c. Dependent Variable: BASELINE DIASTOLIC BLOOD PRESSURE

Model Summary (c): Stress

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .339 (a) | .115 | .104 | 9.595 | .115 | 10.243 | 3 | 237 | .000 |
| 2 (Stress) | .345 (b) | .119 | .100 | 9.613 | .004 | .559 | 2 | 235 | .573 |

a. Predictors: (Constant), GENDER, BMI, age
b. Predictors: (Constant), GENDER, BMI, age, StresAverage, StressLow
c. Dependent Variable: BASELINE DIASTOLIC BLOOD PRESSURE

Model Summary (c): Exercise

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .339 (a) | .115 | .104 | 9.576 | .115 | 10.316 | 3 | 238 | .000 |
| 2 (Exercise) | .381 (b) | .145 | .123 | 9.471 | .030 | 2.769 | 3 | 235 | .042 |

a. Predictors: (Constant), GENDER, BMI, age
b. Predictors: (Constant), GENDER, BMI, age, Exer4, Exer3, Exer2
c. Dependent Variable: BASELINE DIASTOLIC BLOOD PRESSURE

Model Summary (c): Cholesterol

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .339 (a) | .115 | .104 | 9.576 | .115 | 10.316 | 3 | 238 | .000 |
| 2 (Cholesterol) | .354 (b) | .125 | .110 | 9.541 | .010 | 2.745 | 1 | 237 | .099 |

a. Predictors: (Constant), GENDER, BMI, age
b. Predictors: (Constant), GENDER, BMI, age, BASELINE CHOLESTEROL
c. Dependent Variable: BASELINE DIASTOLIC BLOOD PRESSURE
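The F Change column tests whether the added block of predictors improves on the basic model: F = (R² change / df1) / ((1 - R² of the larger model) / df2). A minimal Python check on the rounded table values follows (function name hypothetical); because the inputs are rounded, it reproduces SPSS's figures only approximately, for example about 4.92 versus the reported 5.018 for Smoking.

```python
def f_change(r2_change, r2_full, df1, df2):
    """F Change for adding df1 predictors: (dR2 / df1) / ((1 - R2_full) / df2)."""
    return (r2_change / df1) / ((1 - r2_full) / df2)


# Rounded values from the Smoking table: R Square Change .018, R Square .133, df 1 and 237
print(round(f_change(0.018, 0.133, 1, 237), 2))
# Rounded values from the Exercise table: R Square Change .030, R Square .145, df 3 and 235
print(round(f_change(0.030, 0.145, 3, 235), 2))
```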

## Descriptive

Summary Table: General characteristics of study participants

| Characteristics | Study participants |
|---|---|
| N* | 242 |
| Diastolic blood pressure, mm Hg (a) | 84.2 ± 10.1 |
| Body mass index (a) | 30.2 ± 6.3 |
| Age, years (a) | 55.2 ± 11.2 |
| Cholesterol | 232.1 ± 43.3 |
| Gender [N, %] | |
| Male | 83 (34.3) |
| Female | 159 (65.7) |
| Smoking [N, %] | |
| Never | 157 (64.9) |
| Ever | 85 (35.1) |
| Exercise [N, %] | |
| None | 99 (40.9) |
| Mild | 75 (31.0) |
| Moderate | 58 (24.0) |
| Vigorous | 10 (4.1) |
| Stress [N, %] | |
| High | 128 (53.1) |
| Average | 85 (35.3) |
| Low | 28 (11.6) |

a. Values are means ± SD.
* Based on the total number of subjects in the final model.
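The categorical rows report n (%), with each percentage taken over the non-missing total; for Stress the counts sum to 241 rather than 242. A small sketch that reproduces those entries (function name hypothetical):

```python
def count_percent(counts):
    """Format each category as 'n (percent)', using the total of the given counts."""
    total = sum(counts.values())
    return {k: f"{n} ({100 * n / total:.1f})" for k, n in counts.items()}


gender = count_percent({"Male": 83, "Female": 159})
stress = count_percent({"High": 128, "Average": 85, "Low": 28})
print(gender)
print(stress)
```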

## Multiple Linear Regression

| Variable | β* | Std. Error | P-value |
|---|---|---|---|
| Revision time | 5.36 | 0.13 | <0.001 |

Dependent Variable: Exam Performance

Intercept = -207.81

R² = 0.806

# LAB 7 SIMPLE LINEAR REGRESSION / CORRELATION ANALYSIS

Name __________________

Objectives: Be able to correctly use and interpret:

1. A scattergram (scatterplot)

2. Correlation analysis

3. Regression analysis

A SCATTERGRAM is a good way to get a feel for any relationship that may exist between two quantitative variables. OUTLIERS can also often be spotted with a scattergram.

1. In the dataset Assignment_7_SP15.sav (part of the CORN1 dataset), examine a scattergram of weight at baseline against height.

## To Obtain Scatterplots

Graphs > Legacy Dialogs > Scatter/Dot

· In the Scatterplot dialog, select the icon for simple scatter.

· Select Define.

· Select weight at baseline as the Y-axis variable and height as the X-axis variable.

· Click OK.

After the scattergram is drawn, do the following:

· Double-click the graph (this opens the Chart Editor window).

· Click any one of the data points (this highlights all the data points).

· Select Elements > Fit Line at Total.

· Then check the Individual 95% confidence interval.

· Click Apply and close the Properties box.

· Close the Chart Editor.

a) Dependent variable? _________________

b) Independent variable? ________________

c) Can you see any relationship between height and weight? Positive or Negative?

Note: The correlation and regression analyses are run at the same time.

2. CORRELATION ANALYSIS can give us a more complete picture of the LINEAR relationship between these two variables.

a) H0: ________________________________________________________________________________________________________________________________________________________________________________________________________________________

## To Obtain Statistics for a Linear Regression

Analyze > Regression > Linear…

· Select weight as the dependent variable and height as the independent variable.

b) What is the CORRELATION COEFFICIENT (r) = _______? This is a measure of the strength and direction of the linear relationship between the dependent variable and the independent variable.

c) Is there a strong relationship between weight and height? _____ Evidence: _____

3. REGRESSION ANALYSIS is used to predict the dependent variable based on the independent variable.

a) REGRESSION COEFFICIENT (unstandardized B)= ________? This is the slope of the regression line.

b) CONSTANT = __________? This is the point at which the regression line crosses the Y axis (also called “intercept”).

c) Write the equation of the REGRESSION LINE which best relates weight and height.

Recall: y = a + bx

d) Use the above equation to predict your own weight.

e) What percent of the variation in weight is explained by height? Explain possible reasons for its accuracy (or inaccuracy) at predicting your weight.

| Variable | β* | Std. Error | P-value |
|---|---|---|---|
| Height | | | |

Dependent variable: Weight at Baseline

Intercept =

R² =

Conclusion:
