If a consistent and systematic relationship is not present between two variables, then

Correlation

Last time we talked about two goals of doing survey research. One goal is simply to describe the people we measured.

Today we are going to talk about how we determine whether there is a relationship among variables. That is, how do we determine whether two variables are related to each other?

Exploring relationships among variables

Up until now, every test we've looked at tested whether there were differences between groups. We were comparing several groups to determine if the differences between groups were due to chance or not.

Now suppose that instead of dividing people up into groups, we just measure a sample of people on several variables (e.g., Family Values, Political Affiliation) using a survey. Now we are asking a different question: How are these variables related to (or associated with) each other?

Correlation analysis

Statistical analysis used to explore the relationship between two variables.

1. correlation coefficient (symbolized by "r")

Pearson Product-Moment Correlation

tells us

a. what kind of relationship we have between two variables

scatterplot of two variables.

1. Positive (direct)

As one variable increases, so does another

2. Negative (inverse)

As one variable increases, the other decreases

3. Independent (no relationship)

No systematic relationship among the variables

Note: The sign (+ or -) of r tells us the direction of the relationship (positive or negative), but not how strong the relationship is.

b. how strong that relationship is, which is determined by the magnitude of the coefficient.

Range of r is between -1.0 and +1.0.

1. The further away from zero (in a positive or negative direction), the stronger the relationship.

2. A perfect relationship would be +1.0 or -1.0

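+1 means that every time X increases, Y increases by a perfectly consistent amount (the points fall exactly on an upward-sloping straight line)

or

-1 means that every time X increases, Y decreases by a perfectly consistent amount (the points fall exactly on a downward-sloping straight line)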

In sum, each correlation has two components:

Direction (look at sign + or -)

Magnitude (look at absolute value of r for strength of association).
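To see the two components side by side, here is a small sketch in Python. The function name and the two coefficients are made up for illustration; they are not from any real data set.

```python
def direction_and_magnitude(r):
    """Split a correlation coefficient into direction (sign) and magnitude (|r|)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


# Hypothetical coefficients, chosen only for illustration.
weak_positive = direction_and_magnitude(0.30)
strong_negative = direction_and_magnitude(-0.80)

print(weak_positive)
print(strong_negative)

# The sign says nothing about strength: the negative correlation is the
# stronger of the two because its absolute value is larger.
print(strong_negative[1] > weak_positive[1])
```

The point of the comparison at the end is that a correlation of -.80 describes a stronger relationship than one of +.30, even though it is "smaller" as a signed number.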

2. Is the relationship significant (not due to chance)?

We could just look at our correlation coefficient and talk about its direction and strength, but how do we know that the relationship didn't just occur by chance? Just like all the other stats we've been doing, we have to test whether or not the relationship, no matter how strong it appears, is due to chance.

Steps involved in a correlation analysis

a. State hypothesis

Null hypothesis: Ho: ρ = 0 (the population correlation, rho, is zero)

Research hypothesis: Ha: ρ ≠ 0

b. Calculate the correlation coefficient.

You don't need to know how to do this. But if you are interested I can show you.

Conceptually, what it means is

r = actual amount of change shared between two variables/maximum amount of shared change

c. Calculate the d.f.

d.f. = N - 2

Where N represents the number of paired items.

d. Look up the critical value in the table.

If the absolute value of r is greater than the critical value at p < .05 or p < .01, then we can reject the null hypothesis. In other words, we conclude that the relationship between the two variables is unlikely to be due to chance: it is a significant relationship.

e. If relationship is significant, then look at the direction (positive or negative) and the magnitude.
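If you are curious, steps b through d can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the data are made up, the function names are my own, and instead of a printed table of critical r values the sketch converts a critical t value (2.306 for d.f. = 8, two-tailed, p < .05, taken from a standard t table) into the matching critical r.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired observations."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Shared change: how X and Y vary together.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Maximum possible shared change, from each variable's own variation.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_critical, df):
    """Convert a critical t value into the equivalent critical r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Made-up survey data for 10 people (illustration only).
education_years = [10, 12, 12, 14, 16, 16, 18, 18, 20, 21]
income_thousands = [25, 30, 28, 40, 45, 50, 48, 60, 65, 70]

# Step b: calculate the correlation coefficient.
r = pearson_r(education_years, income_thousands)

# Step c: d.f. = N - 2, where N is the number of pairs.
df = len(education_years) - 2

# Step d: compare |r| with the critical value.
T_CRITICAL_DF8_05 = 2.306  # two-tailed, p < .05, d.f. = 8 (standard t table)
r_crit = critical_r(T_CRITICAL_DF8_05, df)
significant = abs(r) > r_crit

print(f"r = {r:.3f}, d.f. = {df}, critical r = {r_crit:.3f}")
print("reject Ho" if significant else "fail to reject Ho")
```

Note that the comparison uses the absolute value of r, so a strong negative correlation can be significant too. With a different sample size you would need the critical value for that d.f., not the one hard-coded here.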

Some guidelines for judging how strong (substantial) a relationship is.

Guilford suggests the following:

< .20 Slight, negligible relationship
.20-.40 Low: Definite but small relationship
.40-.70 Moderate relationship
.70-.90 High: Marked relationship
> .90 Very high, very dependable relationship
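Guilford's guidelines can be written as a small lookup in Python. The function name is my own, and because the table does not say which category the boundary values (.20, .40, .70, .90) belong to, the sketch assumes each boundary goes to the higher category, except .90, which stays in "High" since the top category is listed as greater than .90.

```python
def guilford_label(r):
    """Describe the strength of r using Guilford's rough guidelines."""
    size = abs(r)  # strength depends on magnitude, not on sign
    if size > 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    # Boundary handling below is an assumption; the table leaves it open.
    if size < 0.20:
        return "Slight, negligible relationship"
    if size < 0.40:
        return "Low: definite but small relationship"
    if size < 0.70:
        return "Moderate relationship"
    if size <= 0.90:
        return "High: marked relationship"
    return "Very high, very dependable relationship"


# Hypothetical coefficients for illustration.
for r in (0.10, -0.55, 0.95):
    print(r, "->", guilford_label(r))
```

Remember that these labels only make sense after the significance test: a "moderate" r that is not significant may still be due to chance.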

Variation on correlational research

Sometimes we want to know how two variables together are related to a third.

For instance, I might want to know how your education and work experience, combined, are related to your income.

1. Multiple correlation (symbolized by R)

When you want to assess the relationship between a combination of two (or more) variables and another variable.

In other words, we can assess how two (or more) variables together relate to a given variable.

Examples:

R income * education experience

The combined relationship between education and work experience with income.

Imagine that the circle for income represents the amount of variation or change in income that is shared with change in education and work experience.

What area does this relationship represent?

D, E, & F

Example:

R experience * income education

The combined relationship between income and education with work experience.

What area does this relationship represent?

B, E, & F
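For the two-predictor case, R can be computed from the three ordinary (pairwise) correlations with a standard formula. The sketch below uses that formula in Python; the function name and the three r values are hypothetical and only meant to show the arithmetic.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with predictors 1 and 2 taken together.

    r_y1, r_y2: correlations of each predictor with y
    r_12:       correlation between the two predictors
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical pairwise correlations (not real data).
r_income_education = 0.50
r_income_experience = 0.40
r_education_experience = 0.30

big_r = multiple_r(r_income_education, r_income_experience, r_education_experience)
print(f"R income * education experience = {big_r:.3f}")
```

With these made-up numbers, R comes out larger than either pairwise correlation with income, which is what "combined relationship" means: the two predictors together share more variation with income than either one does alone.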

2. Partial Correlation (symbolized by R)

When you want to see what the relationship would be between just two of the variables, if you took away the influence of a third variable.

In other words, you simply want to know the unique relationship between two variables - removing the relationship of a third variable.

For example, I have two friends that talk to each other all of the time when I get them together, but I want to know how much they talk when I am not around.

Say, for example, I want to know how education is related to income, taking away the influence of work experience. That is, what is the unique relationship between education and income, removing the influence of work experience?

In other words, how strongly is education really associated with income when you take out other influences?

Example:

R studying GPA * intelligence

The relationship between studying and getting good grades (GPA), removing the influence of being intelligent.

In other words, to what degree is studying really related to grades (GPA) when you remove or wipe out the influence of being intelligent?

What area does this represent?

D

Example:

R intelligence GPA * studying

The relationship between intelligence and getting good grades (GPA), removing the influence of studying.

In other words, what is the unique relationship between being intelligent and getting good grades (GPA), taking out the influence of studying?

What area does this represent?

F

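Hint for not confusing multiple and partial correlation symbols: If the lone variable comes after the *, it is being partialled out. If the lone variable comes before the *, it is the one being related to the other two combined (multiple correlation).

R intelligence GPA * studying (studying is being partialled out)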
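A partial correlation between two variables, with a third removed, can also be computed from the three pairwise correlations using a standard formula. Here is a hedged sketch in Python; the function name and the r values are invented for illustration only.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with the influence of z removed."""
    if abs(r_xz) >= 1.0 or abs(r_yz) >= 1.0:
        raise ValueError("x and y must not be perfectly correlated with z")
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical pairwise correlations (not real data).
r_studying_gpa = 0.50
r_studying_intelligence = 0.20
r_gpa_intelligence = 0.60

unique_r = partial_r(r_studying_gpa, r_studying_intelligence, r_gpa_intelligence)
print(f"R studying GPA * intelligence = {unique_r:.3f}")
```

With these made-up values the partial correlation is a little smaller than the original .50, meaning a small part of the studying and GPA relationship was shared with intelligence.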

What is the relationship between the correlation coefficient and the coefficient of determination?

The coefficient of correlation is the "R" value given in the summary table of the regression output. R squared is also called the coefficient of determination. Multiply R by itself to get the R squared value. In other words, the coefficient of determination is the square of the coefficient of correlation.

What is the statistical technique that uses information about the relationship between an independent or predictor variable and a dependent variable to make predictions?

A regression is a statistical technique that relates a dependent variable to one or more independent (explanatory) variables. A regression model is able to show whether changes observed in the dependent variable are associated with changes in one or more of the explanatory variables.
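As a rough illustration, here is a simple (one-predictor) least-squares regression in Python, with a check that the coefficient of determination equals the square of the correlation coefficient. The data and variable names are made up, and this is only a sketch of the simplest case, not a full regression routine.

```python
import math

# Made-up paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

# Least-squares line: predicted y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Coefficient of determination: share of variation in y the line accounts for.
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_residual / syy

# Correlation coefficient computed directly, for comparison.
r = sxy / math.sqrt(sxx * syy)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"R squared = {r_squared:.2f}, r * r = {r * r:.2f}")
```

The last line shows the two quantities agreeing, which is the relationship described in the answer about the coefficient of determination.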

Which of the following are assumptions made by researchers when they calculate the Pearson correlation coefficient?

The assumptions are as follows: level of measurement, related pairs, absence of outliers, and linearity. Level of measurement refers to each variable. For a Pearson correlation, each variable should be continuous.

Which of the following inferential statistics tests can be used to determine if a relationship exists between two nominal categorical variables?

The chi-square test of independence is used with categorical data to determine whether two variables are related.
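A minimal sketch of the chi-square statistic for a 2 x 2 table, in Python. The counts and names are invented; the critical value 3.841 (d.f. = 1, p < .05) comes from a standard chi-square table and only applies to a 2 x 2 table.

```python
# Made-up counts: rows are one nominal variable, columns are the other.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI_CRITICAL_DF1_05 = 3.841  # d.f. = 1, p < .05 (standard chi-square table)
related = chi_square > CHI_CRITICAL_DF1_05

print(f"chi-square = {chi_square:.2f}, d.f. = {df}")
print("variables appear related" if related else "no evidence of a relationship")
```

The logic parallels the correlation test: compute a statistic, find the d.f., and compare against a critical value to decide whether the relationship could be due to chance.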