Correlation pitfalls – Happier times with mutual information?

I’ve become increasingly anxious about properties of correlation I never knew existed. I collect resources and stuff on the topic in this post, so that have everything in one place. Some resources for beginners in the end of the post.

Correlation isn’t causation, and causation doesn’t require correlation. Ok. But have you heard that correlation is not correlation? In other words, things can be dependent without being correlated, and independent though correlated. Ain’t that fun. As Shay Allen Hill describes visually in his excellent, short blog (HIGHLY RECOMMENDED):

[C]ovariance doesn’t actually measure “Does y increase when x increases?” it only measures “Is y above average when x is above average (and by how much)?” And when covariance is broken [i.e. mean doesn’t coincide with median], our correlation function is broken.

So there may well be situations, where only 20% of people in the sample show dependence between two variables, and this shows up as a correlation of 37% at minimum. Or when a correlation of 0.5 carries ~4.5 times (and a correlation of 0.75 carries ~12.8 times) more information than a correlation of 0.25. As you may know, in psychology, it’s quite rare to see a correlation of 0.5. But even a correlation of 0.5 only gives 13% more information than random. This prompted the following conversation:

How can we interpret a result without in-depth knowledge of the field as well as the data in question? A partial remedy apparently is using mutual information instead (see this paper draft for more information). I know nothing about it, so like always, I just started playing around with things I don’t understand. Here’s what came out:

correlation-septet

The first four panels are the Anscombe’s Quartet. Fifth illustrates Taleb’s point about intelligence. Data for the last two panels are from this project. First four and last two panels have the same mean and standard deviation. Code for creating the pic is here.

MIC and BCMI were new to me, but I thought they were easy to implement, which doesn’t of course mean they make sense. But see how they catch the dinosaur?

  • MIC is the Maximal Information Coefficient, from maximal information-based nonparametric exploration (documentation)
  • BCMI stands for Jackknife Bias Corrected MI estimates (documentation)
  • DCOR is distance correlation (see comments)

I’d be happy to hear thoughts and caveats regarding the use of entropy-based dependency measures in general, and these in particular, from people who actually know these methods. Here’s a related Twitter thread, or just email me!


ps. If this is your first brush with uncertainties related to correlations, and/or have little or no statistics background, you may not know how correlation can vary spectacularly in small samples. Taleb’s stuff (mini-moocs [1, 2]) can sometimes be difficult to grasp without math background, so perhaps get started with this visualisation, or these Excel sheets. A while ago I animated some elementary simulations of p-value distributions for statistical significance of correlations; selective reporting makes things a lot worse than what’s depicted there. If you’re a psychology student, also be sure to check out the p-hacker app. If you haven’t thought about distributions much lately, check this out for a fun read by a math student.

⊂This post has been a formal sacrifice to Rexthor.⊃

3 thoughts on “Correlation pitfalls – Happier times with mutual information?

Leave a Reply to Vithor Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s