Friday, August 31, 2012

Correlation... Is It All?

A recent article by Prof. Krishnamurthy Subramanian of ISB ('Why research that establishes causality is better than just correlation?', The Economic Times, 10 August 2012) caught my attention. The article deals with some issues in research that I have been discussing with my co-researchers during our weekend meetings. Correlation is a statistical tool that is widely used in social science research. It measures the strength of the linear relationship between the movements of two variables. Unfortunately, it measures only that and nothing more. Many times I have seen researchers treat correlation as if it were evidence of causality. But correlation only indicates possible causality; it does not confirm it.

Gary Koop ('Analysis of Financial Data', John Wiley & Sons, 2006) explains this with a beautiful example. In a study conducted on a group of smokers, there was a very high positive correlation between smoking and the incidence of lung cancer. Incidentally, a large proportion of these smokers also consumed alcohol, and hence smoking and alcohol consumption were also highly correlated. As a result, there was a positive correlation between alcohol consumption and the incidence of lung cancer. Now let us examine each of these cases. In the first case, the correlation between smoking and lung cancer, there is causality, and it runs from smoking to cancer. That is, smoking causes lung cancer and not the other way round. In the second case, the correlation between smoking and alcohol consumption, there is no causality, as neither leads to the other. Here, the correlation arises from social behaviour. In the third case, the correlation between alcohol consumption and lung cancer is merely a by-product of the other two: in this example the two move together only because both are correlated with smoking, not because either causes the other. So, if one were to conclude from this third correlation that alcohol consumption leads to lung cancer, it would be a Himalayan blunder!
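The three cases can be reproduced with a small simulation. What follows is only a minimal sketch in Python: the probabilities, the sample size and the names `pearson` and `simulate` are assumptions made for illustration and are not taken from Koop's book. Unlike the study described above, the simulated population also includes non-smokers, so that smoking itself varies. Lung cancer is made to depend on smoking alone, and drinking is simply made more common among smokers.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=20000, seed=0):
    """Generate 0/1 indicators for smoking, drinking and lung cancer.

    All probabilities below are illustrative assumptions.
    Cancer depends only on smoking, never on drinking.
    """
    rng = random.Random(seed)
    smoke, drink, cancer = [], [], []
    for _ in range(n):
        s = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < (0.8 if s else 0.2) else 0   # social habit
        c = 1 if rng.random() < (0.3 if s else 0.05) else 0  # causal link
        smoke.append(s)
        drink.append(d)
        cancer.append(c)
    return smoke, drink, cancer


smoke, drink, cancer = simulate()

r_smoke_cancer = pearson(smoke, cancer)   # case 1: causal
r_smoke_drink = pearson(smoke, drink)     # case 2: social behaviour
r_drink_cancer = pearson(drink, cancer)   # case 3: by-product of 1 and 2

# Look at smokers only: smoking no longer varies, so it cannot
# induce a correlation between drinking and cancer.
drink_s = [d for s, d in zip(smoke, drink) if s == 1]
cancer_s = [c for s, c in zip(smoke, cancer) if s == 1]
r_drink_cancer_smokers = pearson(drink_s, cancer_s)

print(f"smoking vs cancer:            {r_smoke_cancer:.3f}")
print(f"smoking vs drinking:          {r_smoke_drink:.3f}")
print(f"drinking vs cancer:           {r_drink_cancer:.3f}")
print(f"drinking vs cancer (smokers): {r_drink_cancer_smokers:.3f}")
```

Under these assumptions, drinking and cancer come out clearly positively correlated in the full sample, even though the code never lets drinking influence cancer. Once we look at smokers alone, that correlation should all but vanish, which is a hint that it was never more than a reflection of smoking.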

Correlation, being a mathematical measure, can be computed between any two variables! But whether there exists a relationship between these variables that merits inferences based on correlation is something the researcher has to decide using their own intellectual and intuitive capabilities. Hence, when using correlation as a measure to draw inferences, the following questions become important:
a) Are the variables related to each other?
b) Is there causality between the variables?
c) If there is causality, in which direction does it flow?
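Question (c) in particular cannot be answered by the correlation coefficient itself, because the measure is symmetric in its two variables. A minimal sketch in Python, where the data-generating rule (y is built from x plus noise) and the helper name `pearson` are assumptions made for illustration:

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(1)

# By construction, x drives y: y is computed from x plus random noise.
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]

r_xy = pearson(x, y)
r_yx = pearson(y, x)

# Both numbers are the same, so the coefficient says nothing about
# which variable is the cause and which is the effect.
print(f"corr(x, y) = {r_xy:.4f}")
print(f"corr(y, x) = {r_yx:.4f}")
```

Here we know the direction only because we wrote the rule that generates the data; the two correlation values alone give no clue about it.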

As Prof. Subramanian argues, one should not be satisfied with correlation when drawing conclusions about causality. Correlation can only be the first step towards establishing causality. To confirm causality, one has to move beyond correlation and use more advanced econometric tools specially designed for this purpose.