Correlation

[|Statistics Help Online] provides a simple set of examples of correlation:

With two completely uncorrelated variables, you get a correlation coefficient of 0. If they're perfectly correlated, then you get a correlation coefficient of 1. If they're perfectly anti-correlated, you get a correlation coefficient of -1.
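The page doesn't spell out the formula behind these numbers. Assuming the coefficient meant is the usual Pearson correlation coefficient (the covariance of x and y divided by the product of their standard deviations), here is a minimal Python sketch of it. The function name pearson and the sample lists are made up for illustration. On Python 3.10+ the standard library's statistics.correlation computes the same quantity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient R of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (n times the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (n times each std. dev.)
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson(x, [2, 4, 6, 8, 10]))  # perfectly correlated: 1
print(pearson(x, [10, 8, 6, 4, 2]))  # perfectly anti-correlated: -1
print(pearson(x, [1, 3, 5, 3, 1]))   # uncorrelated: 0
```

The third pair is a reminder that R only measures a straight-line trend: y clearly depends on x there (it rises, then falls), yet R comes out as 0, up to floating-point rounding.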

Let's take an example. Take //**x**// to be the weight of Swedish men, and //**y**// to be their height. You'd expect the two variables to be correlated, because there aren't going to be many 7-foot-tall men weighing 110 lbs, though there are probably some 5-foot-2-inch men who weigh that much. This can be represented graphically: take 20 Swedish men and plot weight versus height for each one (this isn't real data).



In this case there is a correlation between the two, but it's far from perfect, so you expect the correlation coefficient R to be somewhat less than 1. But let's now take //**x**// to be the time since a lawn was mowed, and //**y**// to be the height of the grass.



In this case the two are very highly correlated, and you expect a correlation coefficient close to 1.

=Negative Correlation=

[|Answers.com provides some examples of negative correlation:]

In a negative correlation, as the values of one of the variables increase, the values of the second variable decrease. Likewise, as the value of one of the variables decreases, the value of the other variable increases. This is still a correlation. It is like an "inverse" correlation. The word "negative" is a label that shows the direction of the correlation.

1. There is probably a negative correlation between TV viewing and class grades. We would probably find that students who spend more time watching TV tend to have lower grades (or the same correlation can be phrased as students with higher grades tend to spend less time watching TV).

2. Education and years in jail: people who have more years of education tend to have fewer years in jail (or phrased as people with more years in jail tend to have fewer years of education).

3. Crying and being held: among babies, those who are held more tend to cry less (or phrased as babies who are held less tend to cry more).

=Correlation vs. Causation=

[|Statistics Help Online] has some great examples to remind us that a correlation between two variables does not mean that one causes the other.

Smoking causes lung cancer. That's hardly a controversial statement anymore. But how do you know that people who get addicted to smoking don't have a genetic difference that predisposes them to lung cancer? You'd have to do an experiment to control for that, or come up with a clear medical explanation of the carcinogenic effects of smoking. Of course that can be done, and no one is going to take the genetic argument seriously.

But it does raise a serious question about how to use correlation measurements to draw inferences about things affecting each other. In the example of smoking and cancer, you can define a variable //**S**// equal to the number of cigarettes an individual smokes a day, and a variable //**L**// recording whether they contracted lung cancer or not (say 1 for yes, 0 for no), then compute the correlation //**R(S, L)**//. I'm sure you'll find it's quite positive. But then you've shown only that these two quantities are correlated, not that one causes the other.

Let's take some other ludicrous examples to explain the problem of correlation vs. causation. Define //**T**// as the temperature on a given day in Manhattan, and //**V**// as the number of ice cream vendors out that day. The correlation coefficient between these two is almost certainly quite positive (how many vendors are out there in January?). Does this prove that ice cream vendors cause it to be hot? Obviously causation goes the other way; common sense tells you that, unless of course you believe in conspiracy theories.

How about anti-baldness lotion? Define //**L**// as the amount of anti-baldness lotion applied to a scalp, and //**B**// as the degree of baldness (1 for completely bald, 0 for a full head of hair). You'd expect the correlation coefficient between these two to be highly positive as well. But does that imply that this anti-baldness lotion causes baldness?

Same with diet food. Diet food is defined here not as lettuce but as the premade meals you find in the frozen section with "diet" or "low fat" written all over them. I bet you'll find that people who eat diet food tend to be fatter than those who don't. Define //**F**// as the number of pounds of diet food consumed in a week, and //**W**// as the weight of the person. I'm not sure about this, but it makes sense that these would be positively correlated; that is, //**R(F, W)**// is quite positive. Most people I know who eat diet food (aside from Diet Coke) are not skinny.

The upshot of all this is that causation and correlation are very different. Diet food and anti-baldness balms are doing what they're supposed to do, not the opposite. Causation causes correlation, but not necessarily the other way around. There is a lot more to proving causation than this simple correlation formula, and that's why you've got to be very careful reading news stories, or even medical journals, that purport to show that A causes B.