As mentioned, a correlation shows whether two variables are related (or not), how strongly, and in what way.
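In statistical terms, the relationship between variables is denoted by the correlation coefficient. Its size is a number between 0 and 1.0, and its sign shows the direction of the relationship, so the coefficient itself can range from -1.0 to +1.0. Pearson’s r is the most common correlation coefficient; the main ideas discussed here are similar for all correlation coefficients.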
- If there is no relationship between the variables under investigation (or between the predicted values and the actual values), then the correlation coefficient is 0; in other words, the correlation is non-existent.
- As the strength of the relationship between the variables increases, so does the size of the correlation coefficient, with a value of 1 showing a perfect relationship. (As mentioned, for variables studied in educational research, or in the social sciences generally, it is highly unlikely that such perfect correlations will be found.)
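The two cases above can be checked with a minimal sketch of Pearson's r computed from its usual definition (the covariation of the two variables divided by the square root of the product of their individual variations). The function name `pearson_r` and the small data sets are made up for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How much the two variables vary together, and how much each varies alone.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y rises by exactly 2 for every step in x (a perfect relationship).
x = [1, 2, 3, 4, 5]
y_perfect = [52, 54, 56, 58, 60]
print(pearson_r(x, y_perfect))    # 1.0

# Made-up data with no linear relationship to x: r is approximately 0.
y_unrelated = [2, 1, 3, 1, 2]
print(pearson_r(x, y_unrelated))
```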
In general, the higher the correlation coefficient, the stronger the relationship. The following tables present some rules of thumb.
Dancey and Reidy's (2004) categorisation:
|Value of the Correlation Coefficient||Strength of Correlation|
|0.7 - 0.9||Strong|
|0.4 - 0.6||Moderate|
|0.1 - 0.3||Weak|
|Value of the Correlation Coefficient||Strength of the Correlation|
|0.51 - 0.75||Medium|
|0.25 - 0.50||Low|
|Value of the Correlation Coefficient||Strength of the Correlation|
|0.8 - 0.9||Very Strong|
|0.5 - 0.8||Strong|
|0.3 - 0.5||Moderate|
|0.1 - 0.3||Modest|
Note: As mentioned, these categorisations should be used only as rules of thumb. Interpretation also depends on the size of the sample and on the level of statistical significance used.
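As an illustration only, the first table (Dancey and Reidy's bands) could be turned into a small helper like the sketch below. The function name `describe_strength` is hypothetical. Two assumptions go beyond the table: values that fall in the gaps between bands (for example 0.35) are placed in the lower band, and values below 0.1 are given the label "None or negligible".

```python
def describe_strength(r):
    """Label the strength of a correlation coefficient r.

    Assumptions (not from the table itself): values in the gaps between
    bands go to the lower band, and anything below 0.1 is treated as
    "None or negligible".
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # strength ignores the direction (sign) of the correlation
    if size == 1.0:
        return "Perfect"
    if size >= 0.7:
        return "Strong"
    if size >= 0.4:
        return "Moderate"
    if size >= 0.1:
        return "Weak"
    return "None or negligible"

for r in (0.85, -0.5, 0.2, 0.03):
    print(r, describe_strength(r))
```

Like the tables themselves, such labels are a convenience for reading results, not a substitute for considering sample size and significance.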
Variables can be correlated either positively or negatively. In order to indicate the direction of the correlation, we use a positive sign (+) in front of the correlation coefficient for positive correlations or a negative sign (-) for negative correlations. Some example scattergrams are presented below (Figures 5-8).
Figures 5 and 6: Perfect Positive and Negative Linear Correlations
Figures 7 and 8: Imperfect Positive and Negative Linear Correlations
Can you think of possible negative correlations? Some suggestions are age and eyesight or age and agility; the number of hours children spend watching TV and their exam results; and weight and hours spent exercising!
Correlation and Causation
Causation or causality in statistical terms means that variable A isn’t just correlated with variable B, but that it actually produces a change in B.
When conducting a correlation analysis, it is important to remember that we cannot claim that a relationship between variables is a “cause and effect” one. All we can say is that the two variables occur together, and that changes in one are accompanied by systematic changes in the other. Causal inferences are made based on underlying theories and knowledge.
An interesting discussion is presented in this paper.
Criteria For Using Correlations and Common Pitfalls
Finally, for a short discussion of the requirements for using correlations, and of some pitfalls that arise when these criteria are not satisfied, follow the link (under Assumptions).
You will notice some points about how a correlation coefficient can be misleading when the association is curvilinear or subject to ceiling effects. As an example of a relationship subject to ceiling effects, think of the relationship between height and age in a population where age ranges from 1 year to 90 years. Height rises with age from 1 to about 16 or so, and stays more or less constant thereafter. Correlations can also be affected by outliers (extreme scores), so it is useful to plot the data on a scatterplot first and inspect it.