# Correlation does not imply causation

## Information about correlation does not imply causation

"Correlation does not imply causation" is a phrase used in the sciences and statistics to emphasize that a correlation between two variables does not, by itself, establish a cause-and-effect relationship between them. The opposite claim, that correlation proves causation, is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and as false cause. It differs subtly from the fallacy post hoc ergo propter hoc, which requires a chronological component and may therefore be considered a subtype of cum hoc.

## Usage

In the strictest sense, it is always correct to say "Correlation does not imply causation". In casual use, the idea that correlation implies a causal connection is true in a weaker sense, because the word "imply" can loosely mean "suggest" rather than "require". Correlation is also certainly needed for causation to be proved. In logic, however, the technical meaning of "implies" is

* to be a sufficient circumstance.

This is the meaning statisticians intend when they say that correlation does not make causation certain. Indeed, "p implies q" has the technical meaning of logical implication: "if p then q", symbolized as p ⇒ q. That is, "if circumstance p is true, then q necessarily follows."

In contrast, the everyday English meaning of "imply" is

* To indicate or suggest.

To say that "correlation does not suggest causation" is false: a demonstrably consistent correlation often suggests some causal relationship (or implies it, in the casual sense of the word).

What correlation does not do is prove causation, which is what arguments that follow the cum hoc ergo propter hoc pattern of reasoning assert.[1]

Edward Tufte, in a criticism of the brevity of Microsoft PowerPoint presentations, deprecates the use of "is" to relate correlation and causation (as in "Correlation is not causation"), arguing that the statement is inaccurate because it is incomplete.[2] While it is not the case that correlation is causation, simply stating their nonequivalence omits information about their relationship. Tufte suggests that the shortest true statement that can be made about causality and correlation must be expanded to at least one of the following:

* "Empirically observed covariation is a necessary but not sufficient condition for causality."
* "Correlation is not causation but it sure is a hint."

## General pattern

The cum hoc ergo propter hoc logical fallacy can be expressed as follows:
• A occurs in correlation with B.
• Therefore, A causes B.
In this type of logical fallacy, one draws a premature conclusion about causality after observing only a correlation between two or more factors. When one factor (A) is observed only to be correlated with another factor (B), it is sometimes taken for granted that A causes B even though no evidence supports this. This is a logical fallacy because there are at least four other possibilities:
1. B may be the cause of A, or
2. some unknown third factor is actually the cause of the relationship between A and B, or
3. the "relationship" is so complex that it can be labelled coincidental (i.e., two events occur at the same time and have no simple relationship to each other besides that fact), or
4. B may be the cause of A at the same time as A is the cause of B (contradicting the claim that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.

In other words, no conclusion about the existence or the direction of a cause-and-effect relationship can be drawn solely from the fact that A is correlated with B. Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
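Possibility (2) can be illustrated with a small simulation. The sketch below uses synthetic data and hypothetical variable names (`z`, `a`, `b`), none of which come from a real study: a hidden factor `z` drives both `a` and `b`, which have no direct influence on each other. The two are strongly correlated, but the correlation largely disappears once the linear contribution of `z` is removed from each (a partial correlation).

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """Residuals of ys after a least-squares linear fit on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic data: z is the hidden third factor; a and b each depend on z
# plus independent noise, and do not depend on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
a = [zi + rng.gauss(0, 0.5) for zi in z]
b = [zi + rng.gauss(0, 0.5) for zi in z]

r_ab = pearson(a, b)                                  # raw correlation
r_partial = pearson(residuals(a, z), residuals(b, z)) # controlling for z

print(f"correlation of a and b:          {r_ab:.2f}")
print(f"partial correlation given z:     {r_partial:.2f}")
```

In practice the third factor is often unknown or unmeasured, which is why this kind of adjustment cannot by itself establish or rule out causation.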

## Examples

Sleeping with one's shoes on is strongly correlated with waking up with a headache.
Therefore, sleeping with one's shoes on causes headache.

The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that both are caused by a third factor, in this case alcohol intoxication, which thereby gives rise to a correlation. Thus, this is a case of possibility (2) above.

Ice cream sales correlate with the number of people who drown at sea.
Therefore, ice cream causes people to drown.

The fallacy lies in concluding that, because the number of ice creams sold rises at the same time as the number of people who drown, there is a causal relationship between the two. In fact, both are caused by a common third factor: summer.

A recent scientific example:
Young children who sleep with the light on are much more likely to develop myopia in later life.

This result of a study at the University of Pennsylvania Medical Center was published in the May 13, 1999 issue of Nature and received much coverage at the time in the popular press.[3] However, a later study at Ohio State University did not find any link between infants sleeping with the light on and developing myopia. It did find a strong link between parental myopia and the development of child myopia, and it noted that myopic parents were more likely to leave a light on in their children's bedroom.[4] This is a case of possibility (2) above.

Another example:
Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply.
Hence, atmospheric CO2 causes crime.

The above example arguably makes the mistake of prematurely concluding a causal relationship where the relationship between the variables, if any, is so complex that it may be labelled coincidental. The two events have no simple relationship to each other besides the fact that they are occurring at the same time. This is a case of possibility (3) above; another such example is the hoax Mierscheid Law.
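Two quantities that merely trend in the same direction over time will correlate strongly even when they are generated independently. The sketch below uses invented numbers (the series named `co2` and `crime` are synthetic, not real measurements) to show this: the levels are highly correlated, while the period-to-period changes, which remove the shared trend, are not.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def differences(series):
    """Change from each time step to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable
steps = range(500)      # hypothetical time steps

# Synthetic, independently generated series that both trend upward.
co2 = [310 + 0.2 * t + rng.gauss(0, 2) for t in steps]
crime = [100 + 0.5 * t + rng.gauss(0, 10) for t in steps]

r_levels = pearson(co2, crime)                              # shared trend dominates
r_changes = pearson(differences(co2), differences(crime))   # trend removed

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

Differencing is only one simple way to remove a trend; it is used here as an illustration, not as a general test for coincidence.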

A more complex example:
Scientific research finds that people who use cannabis (A) have a higher prevalence of psychiatric disorders (B) than those who do not.

This correlation is sometimes used to support the theory that the use of cannabis causes psychiatric disorders (A is the cause of B). Although this may be possible, a cause-and-effect relationship cannot automatically be inferred from research that has determined only that people who use cannabis are more likely to develop a psychiatric disorder. From the same research, it could equally be the case that (1) a predisposition to a psychiatric disorder causes these individuals to use cannabis (B causes A), or (2) some unknown third factor (e.g., poverty) is the actual cause of the higher number of people, compared to the general public, who both use cannabis and have been diagnosed with a psychiatric disorder. Alternatively, it may be that persons with certain psychiatric disorders find the effects of cannabis more pleasurable. To assume that A causes B is tempting, but when research has determined only a statistical correlation, further scientific investigation of the kind that can isolate extraneous variables is needed.

Examples are abundant in political debates surrounding legal issues. For example, there is a correlation between the use of pornography and sex crimes: individuals who frequently view pornography are more likely to commit sexual offences than those who do not. Some people point to this as evidence that pornography causes individuals to commit sex crimes, and hence argue that pornography should be made illegal. Although such arguments rest on a logical fallacy, they can be politically compelling, particularly in highly emotional situations. For example, the correlation between possession of child pornography and paedophilia may be seen as a legitimate rationale for banning child pornography. In such a case, it may be deemed appropriate to err on the side of caution: if there is even a chance that child pornography leads to paedophilia, then it may be in the social interest to make its possession illegal.

Pastafarianism, a parody religion founded in 2005, satirically states that there is a correlation between the number of pirates and many natural disasters. Bobby Henderson, the creator of this religion, put forth the argument that:
Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.[5]
This helps to show that variables with statistically significant correlations are not necessarily causally related, and parodies the prevalence of logical fallacies in many religions.

## Determining causation

David Hume argued that causality cannot be perceived (and therefore cannot be known or proven), and that we can instead perceive only correlation. However, he argued that we can use the scientific method to rule out false causes.[6]

Intuitively, causation seems to require not just a correlation but a counterfactual dependence. Suppose that a student performed poorly on a test and guesses that the cause was not studying. To prove this, we think of the counterfactual: the same student writing the same test under the same circumstances, but having studied the night before. If we could rewind history and change only one small thing (making the student study for the exam), then causation could be observed by comparing version 1 to version 2. Because we cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference: it is impossible to directly observe causal effects.[7]
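The counterfactual idea above is often written in potential-outcomes notation. The symbols below are a notational sketch introduced here for illustration: let $Y_i(1)$ be the test score student $i$ would obtain after studying, $Y_i(0)$ the score the same student would obtain without studying, and $T_i$ an indicator of whether the student actually studied.

$$
\tau_i = Y_i(1) - Y_i(0), \qquad Y_i^{\text{obs}} = T_i\,Y_i(1) + (1 - T_i)\,Y_i(0), \qquad T_i \in \{0, 1\}
$$

The individual causal effect $\tau_i$ needs both potential outcomes, but the observed score $Y_i^{\text{obs}}$ reveals only one of them, so $\tau_i$ cannot be computed directly for any individual.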

A major goal of scientific experiments and statistical methods is to approximate the counterfactual state of the world as closely as possible.[8] For example, one could run an experiment on identical twins who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.

Well-designed statistical studies replace the equality of individuals in the previous example with the equality of groups. This is achieved by randomly assigning the subjects to two or more groups. Although not a perfect system, placing the subjects randomly in the treatment and placebo groups makes it highly likely that the groups are reasonably equal in all relevant aspects. If the treatment has a significantly different effect than the placebo, one can conclude that the treatment is likely to have a causal effect on the disease. The strength of the evidence can be quantified in statistical terms by the P-value, the probability of seeing a difference at least as large as the one observed if the treatment had no effect.
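The effect of randomization can be sketched with a simulation. Everything below is synthetic and the names (`health`, `outcome`, `self_selected`, `randomized`) are hypothetical: each subject has a hidden baseline health that drives the outcome, and the treatment is assumed to have no effect at all. When healthier subjects choose the treatment themselves, the treated group looks much better; when a coin flip decides, the groups are balanced and the spurious difference largely vanishes.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10000

# Hidden baseline health drives the outcome; the treatment itself is
# assumed to do nothing, so any group difference is not causal.
health = [rng.gauss(0, 1) for _ in range(n)]
outcome = [h + rng.gauss(0, 1) for h in health]

def difference_in_means(treated_flags):
    """Mean outcome of treated subjects minus mean outcome of controls."""
    treated = [o for o, flag in zip(outcome, treated_flags) if flag]
    control = [o for o, flag in zip(outcome, treated_flags) if not flag]
    return mean(treated) - mean(control)

# Observational setting: healthier subjects opt in to the treatment.
self_selected = [h > 0 for h in health]

# Randomized setting: a fair coin decides, independent of health.
randomized = [rng.random() < 0.5 for _ in range(n)]

observational_diff = difference_in_means(self_selected)
randomized_diff = difference_in_means(randomized)

print(f"self-selected groups: {observational_diff:.2f}")
print(f"randomized groups:    {randomized_diff:.2f}")
```

The sketch compares group means only; a real study would also report a measure of uncertainty such as a P-value or confidence interval.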

## References and notes

1. Karl L. Wuensch, Department of Psychology, East Carolina University. When does correlation imply causation?
2. Tufte, Edward R. (2006). The Cognitive Style of PowerPoint: Pitching Out Corrupts Within. Cheshire, Connecticut: Graphics Press, 5. ISBN 0-9613921-5-0.
3. CNN, May 13, 1999. Night-light may lead to nearsightedness.
4. Ohio State University Research News, March 9, 2000. Night lights don't lead to nearsightedness, study suggests.
5. Henderson, Bobby (2005). Church of the Flying Spaghetti Monster. Retrieved on 2006-06-11.
6. [1]
7. Paul W. Holland. 1986. "Statistics and Causal Inference". Journal of the American Statistical Association, Vol. 81, No. 396 (Dec., 1986), pp. 945-960.
8. Judea Pearl. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press.
