Pearson's r

The Pearson's correlation coefficient varies between -1 and +1 where:

r = 1 means the data is perfectly linear with a positive slope (i.e., both variables tend to change in the same direction)
r = -1 means the data is perfectly linear with a negative slope (i.e., the variables tend to change in opposite directions)
r = 0 means there is no linear association
0 < r < 0.5 means there is a weak association
0.5 < r < 0.8 means there is a moderate association
r > 0.8 means there is a strong association

The figure below shows some data sets and their correlation coefficients. The first data set has r = 0.996, the second has r = -0.999, and the third has r = -0.233.

The formula for Pearson's r is:

$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables.

Scatterplots! Scatterplots! Scatterplots!

Pearson's r is a numerical summary of the strength of the linear association between the variables. If the variables tend to go up and down together, the correlation coefficient will be positive. If the variables tend to go up and down in opposition, with low values of one variable associated with high values of the other, the correlation coefficient will be negative.

Trouble!

1. The correlation is 0 within the bulk of the data in the lower left-hand corner. The outlier in the upper right-hand corner increases both means and makes the data lie predominantly in quadrants I and III. Check with the source of the data to see if the outlier might be in error. Errors like these often occur when a decimal point in both measurements is accidentally shifted to the right. Even if there is no explanation for the outlier, it should be set aside and the correlation coefficient for the remaining data should be calculated. The report must include a statement of the outlier's existence. It would be misleading to report the correlation based on all of the data because it wouldn't represent the behavior of the bulk of the data.
As discussed below, correlation coefficients are appropriate only when data are obtained by drawing a random sample from a larger population. However, sometimes correlation coefficients are mistakenly calculated when the values of one of the variables (X, say) are determined or constrained in advance by the investigator. In such cases, the message of the outlier may be real, namely, that over the full range of values, the two variables tend to increase and decrease together. It's poor study design to have the answer determined by a single observation, and it places the analyst in an uncomfortable position. It demands that we assume the association is roughly linear over the entire range and that the variability in Y will be no different for large X from what it is for small X. Unfortunately, once the study has been conducted, there isn't much that can be done about it. The outcome hinges on a single observation.

2. Similar to 1. Check the outlier to see if it is in error. If not, report the correlation coefficient for all points except the outlier, along with the warning that the outlier occurred. Unlike case 1, where the outlier is an outlier in both dimensions, here the outlier has a reasonable Y value and only a slightly unreasonable X value. It often happens that observations are two-dimensional outliers. They are unremarkable when each response is viewed individually in its histogram and do not show any aberrant behavior until they are viewed in two dimensions. Also, unlike case 1, where the outlier increases the magnitude of the correlation coefficient, here the magnitude is decreased.

3. This sort of picture results when one variable is a component of the other, as in the case of (total energy intake, energy from fat). The correlation coefficient almost always has to be positive since increasing the total will tend to increase each component. In such cases, correlation coefficients are probably the wrong summaries to be using.
The underlying research question should be reviewed.

4. The two nearly straight lines in the display may be the result of plotting the combined data from two identifiable groups. It might be as simple as one line corresponding to men, the other to women. It would be misleading to report the single correlation coefficient without comment, even if no explanation manifests itself.

5. The correlation is zero within the two groups; the overall correlation of 0.7 is due to the differences between groups. Report that there are two groups and that the within-group correlation is zero. In cases where the separation between the groups is greater, the comments from case 1 apply as well. It may be that the data are not a simple random sample from a larger population and the division between the two groups may be due to a conscious decision to exclude values in the middle of the range of X or Y. The correlation coefficient is an inappropriate summary of such data because its value is affected by the choice of X or Y values.

6. What most researchers think of when a correlation of 0.7 is reported.

7. A problem mentioned earlier. The correlation is not 1, yet the observations lie on a smooth curve. The correlation coefficient is 0.70 rather than 0 because here the curve is not symmetric. Higher values of Y tend to go with higher values of X. A correlation coefficient is an inappropriate numerical summary of these data. Either (i) derive an expression for the curve, (ii) transform the data so that the new variables have a linear relationship, or (iii) rethink the problem.

8. This is similar to case 5, but with a twist. Again, there are two groups, and the separation between them produces the positive overall correlation. But, here, the within-group correlation is negative! I would do my best to find out why there are two groups and report the within-group correlations.

The moral of these displays is clear: ALWAYS LOOK AT THE SCATTERPLOTS!
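
The behavior described in cases 1 and 8 can be reproduced with a handful of made-up points. The sketch below is only an illustration: the `pearson_r` helper and every data value in it are invented for this example and are not the data behind the displays. It computes Pearson's r from deviations about the means, then shows a single outlier manufacturing a large correlation and two groups with negative within-group correlation producing a positive overall value.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: summed cross-deviations divided by the root of the
    product of the summed squared deviations (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Case 1 (made-up data): an uncorrelated cluster plus one far outlier.
bulk_x = [1, 1, 2, 2, 3, 3]
bulk_y = [1, 3, 1, 3, 1, 3]
r_bulk = pearson_r(bulk_x, bulk_y)
r_with_outlier = pearson_r(bulk_x + [20], bulk_y + [20])

# Case 8 (made-up data): two groups, each with a negative trend,
# separated so that the pooled correlation comes out positive.
group_a_x, group_a_y = [1, 2, 3], [3, 2, 1]
group_b_x, group_b_y = [11, 12, 13], [13, 12, 11]
r_group_a = pearson_r(group_a_x, group_a_y)
r_group_b = pearson_r(group_b_x, group_b_y)
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print("bulk only:", r_bulk, "with outlier:", r_with_outlier)
print("within groups:", r_group_a, r_group_b, "pooled:", r_pooled)
```

In both situations the single pooled number says little about the bulk of the data, which is why the scatterplot has to be inspected before the coefficient is reported.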
What is the weakest correlation coefficient? The strongest linear relationship is indicated by a correlation coefficient of -1 or 1. The weakest linear relationship is indicated by a correlation coefficient equal to 0.
How do you find the weakest correlation coefficient? Remember this handy rule: the closer the correlation is to 0, the weaker it is. The closer it is to +/-1, the stronger it is.
Is 1.0 a weak correlation? No. The sign of the linear correlation coefficient indicates the direction of the linear relationship between x and y. When r (the correlation coefficient) is near 1 or -1, the linear relationship is strong; when it is near 0, the linear relationship is weak. A value of 1.0 is therefore the strongest possible positive linear relationship.
Is 0.01 a weak correlation? One common rule of thumb grades positive correlations as weak in the range of 0.1 to 0.3, moderate from 0.3 to 0.5, and strong from 0.5 to 1.0. On that scale, 0.01 falls below even the weak range, which means there is essentially no linear association.
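
The 0.1 / 0.3 / 0.5 rule of thumb quoted above can be written as a small lookup. This is a minimal sketch, not a standard: the function name `describe_strength`, the "negligible" label for values under 0.1, the use of the absolute value for negative correlations, and the choice to put each boundary value in the higher band are all assumptions made for the example. Cut-offs also vary between sources.

```python
def describe_strength(r):
    """Label a correlation with the 0.1 / 0.3 / 0.5 rule of thumb.

    Assumptions (not from the source): negative values are graded by
    their absolute value, values below 0.1 are called "negligible",
    and a boundary value such as 0.3 goes to the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    if size < 0.1:
        return "negligible"
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "strong"

for value in (0.01, 0.2, 0.4, -0.7, 1.0):
    print(value, describe_strength(value))
```

A label like this is only a shorthand; it says nothing about outliers, subgroups, or curvature, so it does not replace looking at the scatterplot.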