What Good Is A Correlation Coefficient?

Now I know exactly what you are thinking; I always know what you are thinking. “All that freaking work and all those calculations just to end up with a single number: 0.89. How absurd is that?” Seems kind of like a waste of time, huh? Well, guess again! It is actually very cool! “Yeah, right!” you say. Now, just be quiet and let me explain.

First of all, a correlation, that single number, can tell you the direction of a relationship. If your correlation coefficient is a negative number, you can tell, just by looking at it, that there is a negative relationship between the two variables. As you may recall, a negative relationship means that as values on one variable increase (go up), the values on the other variable tend to decrease (go down) in a predictable manner. If your correlation coefficient is a positive number, then you know that you have a positive relationship. This means that as one variable increases (or decreases), the values of the other variable tend to go in the same direction in a predictable manner. If one increases, so does the other. If one decreases, so does the other.

Let me make this crystal clear for you. All correlation coefficients range from -1.00 to +1.00. A correlation coefficient of -1.00 tells you that there is a perfect negative relationship between the two variables. This means that as values on one variable increase, there is a perfectly predictable decrease in values on the other variable. In other words, as one variable goes up, the other goes in the opposite direction (it goes down). A correlation coefficient of +1.00 tells you that there is a perfect positive relationship between the two variables. This means that as values on one variable increase, there is a perfectly predictable increase in values on the other variable. In other words, as one variable goes up, so does the other.
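If you want to see the two perfect cases with your own eyes, here is a minimal sketch. It assumes the coefficient being discussed is the usual Pearson r, and the scores are made up purely for illustration. The function name pearson_r is just a label chosen for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the cross-products of each pair's deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up scores, for illustration only
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # goes up every time x goes up
falls_with_x = [10, 8, 6, 4, 2]   # goes down every time x goes up

r_pos = pearson_r(x, rises_with_x)
r_neg = pearson_r(x, falls_with_x)
print(round(r_pos, 2), round(r_neg, 2))  # 1.0 -1.0
```

The sign of the result gives you the direction of the relationship, and a result of exactly plus or minus 1.00 only shows up when one variable is perfectly predictable from the other.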
A correlation coefficient of 0.00 tells you that there is a zero correlation, or no relationship, between the two variables. In other words, as one variable changes (goes up or down), you can’t really say anything about what happens to the other variable. Sometimes the other variable goes up and sometimes it goes down. However, these changes are not predictable.

Most correlation coefficients (assuming there really is a relationship between the two variables you are examining) fall somewhere between 0.00 and plus or minus 1.00, meaning that the relationships are real but not perfect. Remember that a correlation coefficient of 0.00 means that there is no relationship between your two variables based on the data you are looking at. The closer a correlation coefficient is to 0.00, the weaker the relationship is and the less able you are to tell exactly what happens to one variable based on knowledge of the other variable. The closer a correlation coefficient approaches plus or minus 1.00, the stronger the relationship is and the more accurately you are able to predict what happens to one variable based on the knowledge you have of the other variable. See the following guidelines for a summary of what I am talking about.

+.80 to +1.00   or   -.80 to -1.00   -   high
+.60 to +.80    or   -.60 to -.80    -   moderately high
+.40 to +.60    or   -.40 to -.60    -   moderate
+.20 to +.40    or   -.20 to -.40    -   low
0 to +.20       or   0 to -.20       -   little or none

The reader should be wary in using any guidelines, since they cannot reflect the factors important for an individual situation. Guidelines do not consider the nature of the variables, the number of subjects included, or the use to which the correlation coefficient will be put.
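The guideline bands above can be written as a small lookup. This is only a sketch: the bands share their edges (for example, .80 appears in two of them), so assigning an edge value to the stronger label is an assumption made here, and the function name describe_strength is made up for the example.

```python
def describe_strength(r):
    """Label the size of a correlation using the rough guideline bands above."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie between -1.00 and +1.00")
    # Assumption: a value exactly on a shared edge gets the stronger label
    if size >= 0.80:
        return "high"
    if size >= 0.60:
        return "moderately high"
    if size >= 0.40:
        return "moderate"
    if size >= 0.20:
        return "low"
    return "little or none"

print(describe_strength(0.89))   # high
print(describe_strength(-0.35))  # low
print(describe_strength(0.05))   # little or none
```

Keep the warning above in mind: a label like this is a starting point, not a verdict.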

Let’s look at this another way. Suppose that you read a research study that reported a correlation of r = .75 between strength and speed. What could you say about the relationship between these two variables, and what could you do with the information? You could tell, just by looking at the correlation coefficient, that there is a positive relationship between strength and speed. This means that people with more strength tend to have greater speed. Similarly, people with low levels of strength tend to be slower. The relationship between strength and speed seems to be pretty strong. You can tell this because the correlation coefficient is much closer to 1.00 than it is to 0.00. Notice, however, that the correlation coefficient is not 1.00. Therefore, it is not a perfect relationship. Even so, if you were examining a coach’s training practices, you could use the knowledge that strength is related to speed to “model” those practices. This means that you can generate predictions of how fast an individual should be based on his or her level of strength.

Let me give you another example. Let’s say that you are interested in finding out if body fat is related to cardiovascular fitness. You compute a correlation coefficient to evaluate the relationship between body fat and cardiovascular fitness and get an r = -.35. What could you say based on this correlation coefficient? You could tell, just by looking at the correlation coefficient, that there is a negative relationship between body fat and cardiovascular fitness. This means that people with more body fat tend to have less cardiovascular fitness. Similarly, people with less body fat tend to have greater cardiovascular fitness. In this example, the relationship between body fat and cardiovascular fitness appears to be low to moderate. You can tell this because the size of the correlation coefficient (.35, ignoring the sign) is only about a third of the way from 0.00 to 1.00. Okay, so how would you use this information?
Since you have established that a relationship exists between these two variables, you can use the knowledge that body fat and cardiovascular fitness have an inverse relationship to investigate further. What you would do is look at those individuals who had high body fat ratings to see if there is some valid reason for their poor cardiovascular performance (e.g., higher workload, weaker heart, etc.). Finally, you could use this information to explore how much weight loss it might take to see an improvement in cardiovascular fitness.

In addition to telling you (A) whether two variables are related to one another, (B) whether the relationship is positive or negative, and (C) how large the relationship is, the correlation coefficient tells you one more important bit of information: it tells you how much of the variation in one variable is related to changes in the other variable.

Many students who are new to the concept of correlation coefficients make the mistake of thinking that a correlation coefficient is a percentage. They tend to think that when r = .90 it means that 90% of the changes in one variable are accounted for or related to the other variable. Even worse, some think that this means that any predictions you make will be 90% accurate. This is not correct! A correlation coefficient is a “ratio”, not a percent. However, it is very easy to translate the correlation coefficient into a percentage. All you have to do is “square the correlation coefficient”, which means that you multiply it by itself. So, if the symbol for a correlation coefficient is “r”, then the symbol for this new statistic is simply “r²”, which can be called “r squared”. For example, r = .90 gives r² = .81, or 81%, not 90%. There is a name for this new statistic: the Coefficient of Determination.

Coefficient of Determination

Again, the coefficient of determination is simply r², the square of the correlation coefficient. The coefficient of determination gives the proportion of variance in one variable that is associated with the variance of the other variable. For example, suppose the right boomerang and shuttle run yield an r = .85. The coefficient of determination (r²) then is .72 (.85 × .85 = .7225, which rounds to .72). We interpret this by stating that 72% of the variance in the shuttle run scores is explained by the variance in the right boomerang scores. In other words, some common factor, perhaps agility, accounts for the 72% of the variance that the two tests share. Now suppose the correlation between the right boomerang and the quadrant jump is .40. In this case, r² = .16 (.40 × .40 = .16), so only 16% of the variance of the two tests is in common and 84% of the variance must be accounted for by other factors.
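The squaring step is simple enough to check by hand, but a short sketch makes the gap between r and r squared hard to miss. The three values of r are the ones used in the text; the function name is made up for this example.

```python
def coefficient_of_determination(r):
    """Square r to get the proportion of variance the two variables share."""
    return r * r

for r in (0.85, 0.40, 0.90):
    r_squared = coefficient_of_determination(r)
    shared = f"{r_squared:.0%}"        # variance the two variables have in common
    other = f"{1 - r_squared:.0%}"     # variance left to other factors
    print(f"r = {r:.2f}  r squared = {r_squared:.2f}  shared = {shared}  other factors = {other}")

# r = 0.85  r squared = 0.72  shared = 72%  other factors = 28%
# r = 0.40  r squared = 0.16  shared = 16%  other factors = 84%
# r = 0.90  r squared = 0.81  shared = 81%  other factors = 19%
```

Notice that the shared variance drops off much faster than r itself: cutting r roughly in half, from .85 to .40, cuts the shared variance from 72% to 16%.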

The Correlation Coefficient

Critical Values of the Correlation Coefficient (Two-Tailed Test)

                        Type 1 Error Rate
Degrees of Freedom      .05        .01
  1                     0.997      0.999
  2                     0.950      0.990
  3                     0.878      0.959
  4                     0.811      0.917
  5                     0.754      0.874
  6                     0.707      0.834
  7                     0.666      0.798
  8                     0.632      0.765
  9                     0.602      0.735
 10                     0.576      0.708
 11                     0.553      0.684
 Etc…                   Etc…       Etc…
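Looking up a critical value and comparing it with your own coefficient can be sketched in a few lines. The dictionary below copies only the rows shown in the table above, and the names CRITICAL_R and is_significant are made up for this illustration. Because the table is for a two-tailed test, the sketch compares the size of r and ignores its sign.

```python
# Two-tailed critical values copied from the table above.
# Key: degrees of freedom. Value: (critical r at .05, critical r at .01).
CRITICAL_R = {
    1: (0.997, 0.999),
    2: (0.950, 0.990),
    3: (0.878, 0.959),
    4: (0.811, 0.917),
    5: (0.754, 0.874),
    6: (0.707, 0.834),
    7: (0.666, 0.798),
    8: (0.632, 0.765),
    9: (0.602, 0.735),
    10: (0.576, 0.708),
    11: (0.553, 0.684),
}

def is_significant(r, df, alpha):
    """Return True if the size of r meets or exceeds the critical value."""
    if df not in CRITICAL_R:
        raise ValueError("degrees of freedom not covered by this excerpt of the table")
    if alpha not in (0.05, 0.01):
        raise ValueError("this table only covers the .05 and .01 levels")
    critical_05, critical_01 = CRITICAL_R[df]
    critical = critical_05 if alpha == 0.05 else critical_01
    # Two-tailed test: compare the size of r, ignoring its sign
    return abs(r) >= critical

print(is_significant(0.89, df=8, alpha=0.05))  # True: 0.89 >= 0.632
print(is_significant(0.89, df=8, alpha=0.01))  # True: 0.89 >= 0.765
print(is_significant(0.70, df=8, alpha=0.01))  # False: 0.70 < 0.765
```

With 8 degrees of freedom, a correlation of .89 clears both critical values, while .70 would clear only the .05 value.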

.01 with 8 degrees of freedom. That value is 0.765; consequently, a correlation coefficient of .89 is significant at both the .05 level and the .01 level.

2. When the size of your correlation coefficient (ignoring its sign) is equal to or larger than the critical value, you can say it is “statistically significant”. Whenever you hear that a research finding is statistically significant, it tells you how much confidence you can have in those findings, and you can tell just how much chance there is that they may be wrong! At the .05 level, for example, a correlation that large would turn up by chance fewer than 5 times in 100 if the two variables were really unrelated.

Several fallacies and limitations should be considered in interpreting the meaning of a coefficient of correlation. A correlation does not imply a cause-and-effect relationship between the variables. Many people confuse correlation and causation. A high correlation does not necessarily mean that a change in one variable causes a change in the other. For example, suppose you found a high correlation (r = .80) between pushup scores and 100-yard dash scores. Do you think that doing well in pushups caused an individual to do well on the dash, or vice versa? On the other hand, suppose there is a high positive correlation (r = .80) between the number of errors committed by baseball teams and the number of losses per team. Do you think there might be a causal relationship there? In other words, the criterion for determining causality should be the nature of the two variables involved rather than the coefficient itself.

Let me give you a better example. I bet if you did a research study to determine if there is a relationship between carrying a lighter and lung cancer, you would most likely get a pretty high correlation. If you don’t believe me, go ahead and do the study. You will see that I am right…I am always right. Now, do you think that carrying a lighter causes cancer? NO! It is probably because people who carry lighters also smoke, and smoking causes lung cancer. These are examples of the post-hoc fallacy.
They are symptoms of a different cause, and do not represent a cause-effect relationship. Although these coefficients are statistically genuine, and may be explained by other factors, it would be spurious analysis to interpret one as the cause of the other. Nor could any useful conclusions be drawn by correlating the decline in corporal punishment in schools and the increase of juvenile delinquency. Technological and social change may be underlying factors in this relationship. Before a causal relationship between variables can be accepted, it must be established by logical and empirical analysis. The process of computing the coefficient of correlation only quantifies the relationship that has been previously established. A relationship of a particular type may exist only within certain limits, and must be tested to determine whether it is linear or curvilinear. Logical analysis must also be applied to determine what other factors may be a significant part of the pattern.
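The warning about linear versus curvilinear relationships is easy to see with numbers. The sketch below assumes the coefficient is the usual Pearson r and uses made-up values in which y is perfectly predictable from x, but along a curve rather than a straight line.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up values: y is exactly x squared, a U-shaped (curvilinear) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_curved = pearson_r(x, y)
print(round(abs(r_curved), 2))  # 0.0
```

Even though y can be predicted perfectly from x, the coefficient comes out at zero, because it only measures how well a straight line describes the data. This is one reason to look at a plot of the scores before trusting the number.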

THE END!...not quite …THERE IS MORE!

Ascertaining Reliability, Validity and Objectivity

Three additional uses of the correlation coefficient are common and should be mentioned here.

1. Reliability: The reliability of a test or a test item indicates how likely a student is to make about the same score if the test is repeated. It could be called the “repeatability” or “consistency” of the test. The most common way of determining the reliability of a test is the “test-retest” method. For this, a test is given to a group of subjects and then the same test, under the same conditions, is repeated at a later date, usually the next day. The scores from the first test are correlated with those of the re-test, and the correlation coefficient obtained is the reliability of the test for those conditions. For example, a teacher wanted to know the reliability of a softball throw for distance. The teacher administered the test to the class and repeated the test on the following day. The correlation between the two days’ scores is the reliability of the softball throw for that class. In general, a reliability of .90 or greater is considered high and from .80 to .90, moderate. Reliability below .80 leaves some question as to whether the test will give consistent results when it is repeated.

One point which should be noted is that reliability is computed for a test under a specific set of circumstances. A test which has been reported to be highly reliable can be an extremely unreliable measure if the test administration is not carefully done. On the other hand, through careful administration, the reliability of a test which has been reported to have only moderate reliability can often be improved. Remember, a reported reliability is for a particular type of group under specific conditions. For example, knowing that a test has been found reliable for high school boys does not imply that it will be reliable for girls, boys of another age level, or even another group of high school boys under different conditions.

2. Objectivity: A test is considered to be objective if different administrators can give it to the same set of students and get about the same results. The directions for administering and scoring the test are important factors in its objectivity. To determine the objectivity of a test, you have to have two different administrators or instructors give the test to the same group under similar conditions and compute a correlation coefficient between the two sets of test scores. Objectivity coefficients can be interpreted in about the same way as reliability coefficients.

3. Validity: The validity of a test is an indication of how well it measures what it is supposed to measure. It is sometimes called the “honesty” of the test. There are several ways commonly used to determine the validity of a test. We will talk about these methods in detail later on in the text.
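To tie the test-retest idea back to the arithmetic, here is a minimal sketch of a reliability check. The softball throw distances are hypothetical numbers invented for this example, the coefficient is assumed to be the usual Pearson r, and the function names are made up. The cutoffs in describe_reliability are the rough ones given in the reliability discussion above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe_reliability(r):
    """Label a reliability coefficient using the rough cutoffs from the text."""
    if r >= 0.90:
        return "high"
    if r >= 0.80:
        return "moderate"
    return "questionable"

# Hypothetical softball throw distances (in feet) for the same six students,
# made up for illustration only
day_1 = [120, 135, 150, 110, 160, 145]
day_2 = [124, 131, 154, 108, 158, 149]

reliability = pearson_r(day_1, day_2)
print(round(reliability, 2), describe_reliability(reliability))
```

An objectivity coefficient could be computed the same way by replacing the two days with the scores recorded by two different administrators for the same group.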