Correlation!!!

Before I introduce the game I’ll be focusing on, I need to first describe the concept of correlation. You probably already know what correlation means. But let’s review anyway!


Let’s say you and I both have a quarter, and we agree to toss these coins at the same time. Furthermore, we agree that if they both land the same (both heads, or both tails) I win the game, and if they land opposite (one heads, one tails), you win the game.

OK! We start playing. Will you or I win more games?

The answer, as long as there’s no funny business, is that on average I’ll win half the time and you’ll win half the time. Why? Because what your coin reads and what mine reads are independent of each other. They are uncorrelated. The correlation between which sides the two coins land on is zero. In math, if we label my coin 0 and yours 1, we can write the correlation between coin 0 and coin 1 as C₀₁ = 0. If mine is heads, and no magical force does anything to your coin, yours will be heads half the time and tails half the time. Uncorrelated. Zero correlation. Make sense?
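If you don’t have a friend handy, you can check the “half the time” claim with a minimal Python sketch like this one. It assumes ideal fair coins, with Python’s `random` module standing in for real quarters, and the function name `play_independent` is just an illustrative choice.

```python
import random

def play_independent(n_games: int, seed: int = 0) -> float:
    """Toss two independent fair coins n_games times and return the
    fraction of games I win (I win when both coins show the same face)."""
    rng = random.Random(seed)  # seeded so every run gives the same numbers
    my_wins = 0
    for _ in range(n_games):
        mine = rng.choice(["heads", "tails"])
        yours = rng.choice(["heads", "tails"])  # drawn with no reference to my coin
        if mine == yours:
            my_wins += 1
    return my_wins / n_games

print(play_independent(100_000))  # close to 0.5
```

Neither of us pulls ahead: over many games my share of wins settles near one half.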

Give it a try! Get a friend, split your poker chips with them, break out your Canadian and United States coins, and start tossing them! See how long it takes before one of you wins all the money!

Now let’s change the game. We’ll still have two coins, but we are savages and agree to glue them both to a stick so that they together have to land either both heads up, or both tails up. We now start throwing this coin-stick atrocity. Now who wins? Me! Every time we throw the thing both coins read the same, either both heads or both tails. I win every time! In this case, the two coins are perfectly correlated. They always agree. We can define perfect correlation between coin 0 and coin 1 as a correlation of C₀₁ = +1.

Throw this AI-generated coiny-stick business and you either get both heads or both tails on every throw. The result of our coin toss is now perfectly correlated! This ridiculous thing produces a correlation between coin 0 and coin 1 of C₀₁ = +1.

Let’s change the game one more time. We take one of the two coins and flip it over on the stick so that they together have to land with one heads up and one tails up. We now start throwing them. Now who wins? You win every time! They are now always different. In this case, the two coins are perfectly anti-correlated. They can land one of two ways (heads/tails or tails/heads) but they always disagree. We can define perfect anti-correlation between coin 0 and coin 1 as a correlation of C₀₁ = -1.

Throw this coiny-stick business and you always get one heads and one tails. The result of our coin toss is now perfectly anti-correlated. This thing produces a correlation between coin 0 and coin 1 of C₀₁ = -1 (perfect anti-correlation).
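Both stick games can be sketched the same way. The point the code makes is that each throw of the stick has only one random outcome: coin 1 is completely determined by coin 0. The helper `my_win_fraction` and its `flipped` switch are hypothetical names chosen for this illustration.

```python
import random

def my_win_fraction(n_throws: int, flipped: bool, seed: int = 0) -> float:
    """Throw the glued coin-stick n_throws times and return the fraction
    of throws I win (I win when both coins show the same face)."""
    rng = random.Random(seed)
    opposite = {"heads": "tails", "tails": "heads"}
    my_wins = 0
    for _ in range(n_throws):
        coin0 = rng.choice(["heads", "tails"])  # the only random outcome per throw
        # Coin 1 is glued to coin 0: same face, or the opposite face if flipped on the stick.
        coin1 = opposite[coin0] if flipped else coin0
        if coin0 == coin1:
            my_wins += 1
    return my_wins / n_throws

print(my_win_fraction(1000, flipped=False))  # 1.0: perfectly correlated, I always win
print(my_win_fraction(1000, flipped=True))   # 0.0: perfectly anti-correlated, you always win
```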

Two binary variables (such as what the coins read after a coin toss) can in general have any correlation between two limits. Perfect anti-correlation, C₀₁ = -1, is one limit. Perfect correlation, C₀₁ = +1, is the other. Zero correlation, C₀₁ = 0, is an important special case. In general, -1 ≤ C₀₁ ≤ +1.
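Here is one simple way to build a pair of coins with any correlation c between those limits. It is an illustrative construction, not the only one. Label the faces +1 and -1 (the same labels used in the ten-game measurement that follows), toss coin 0 fairly, and have coin 1 copy it with probability (1 + c)/2 and show the opposite face otherwise. The expected product is then (1 + c)/2 - (1 - c)/2 = c. The names `correlated_pair`, `estimate`, and `c` are assumptions for this sketch.

```python
import random

def correlated_pair(c: float, rng: random.Random) -> tuple[int, int]:
    """Return one toss (x, y) of ±1 coins whose expected product is c.

    Assumes -1 <= c <= 1. Coin 0 is fair; coin 1 copies it with
    probability (1 + c) / 2 and shows the opposite face otherwise.
    """
    x = rng.choice([-1, 1])
    y = x if rng.random() < (1 + c) / 2 else -x
    return x, y

def estimate(c: float, n: int = 100_000, seed: int = 0) -> float:
    """Average x * y over n tosses: a measured estimate of the correlation."""
    rng = random.Random(seed)
    return sum(x * y for x, y in (correlated_pair(c, rng) for _ in range(n))) / n

for target in (-1.0, -0.5, 0.0, 0.3, 1.0):
    print(target, round(estimate(target), 3))
```

At c = +1 this is the glued stick, at c = -1 the flipped stick, and at c = 0 two ordinary independent coins.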

In the above cases it was kind of obvious by construction what the correlations were going to be. But what if we don’t know how correlated two things are? Easy! We generate a lot of examples, and from this we can extract a measurement of their correlation.

There are several different ways to do this. The one I will use is really simple.


Let’s stick to our quarters example, and let’s say we play 10 games. We write down the results of each game by creating a list of what we saw. If a coin reads heads, we write down -1, and if it reads tails we write down +1. I’m actually going to do this for real now… stand by… OK. Here’s my list: [(1,1), (-1,1), (1,-1), (-1,1), (-1,1), (1,1), (-1,-1), (-1,-1), (-1,-1), (1,1)].

Yeah, I actually flipped both of these coins 10 times and got the results in the above list.

Now to compute the correlation, we just multiply each pair of numbers together, add them all up, and divide by the total number of trials. In math, for N trials the correlation between the left coin, labeled x (with measured values xⱼ), and the right coin, labeled y (with measured values yⱼ), is

$$C = \frac{1}{N} \sum_{j=1}^{N} x_j \, y_j$$

For my measured data that works out to C = 1/10 * (1-1-1-1-1+1+1+1+1+1) = 2/10 = 0.2, a positive correlation! For binary variables that take the values +1 or -1, the product of a pair is +1 if the two are the same and -1 if they are different. So if every pair matches, C = +1; if every pair differs, C = -1; and if there are equal numbers of matching and differing pairs, C = 0.
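The recipe (multiply each pair, add them up, divide by N) is a one-liner in Python. This sketch reproduces the 0.2 from my ten measured tosses; `correlation` is a hypothetical helper name.

```python
def correlation(pairs: list[tuple[int, int]]) -> float:
    """C = (1/N) * sum_j x_j * y_j for ±1-valued measurements."""
    return sum(x * y for x, y in pairs) / len(pairs)

# The ten measured tosses from the text (heads = -1, tails = +1).
data = [(1, 1), (-1, 1), (1, -1), (-1, 1), (-1, 1),
        (1, 1), (-1, -1), (-1, -1), (-1, -1), (1, 1)]

print(correlation(data))                # 0.2
print(correlation([(1, 1), (-1, -1)]))  # 1.0: every pair matches
print(correlation([(1, -1), (-1, 1)]))  # -1.0: every pair differs
```

Which face gets -1 doesn’t matter: swapping the labels flips the sign of both x and y, so every product, and therefore C, is unchanged.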

Getting C = 0.2 means I saw more heads/heads and tails/tails pairs in the data than heads/tails and tails/heads pairs, which is true! Of course, if I kept doing this, C would tend toward zero as N gets bigger, because as far as I know I don’t have magic coins.
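A quick simulation backs up the claim that C drifts toward zero for ordinary coins. Assuming ideal independent fair coins, the typical size of the measured C shrinks roughly like 1/√N, which for N = 10 is about 0.32. So a value like 0.2 from ten games is entirely unremarkable: C ≥ 0.2 needs at least 6 matches out of 10, which happens in 386 of the 1024 equally likely match patterns, about 38% of the time. The function name `measured_correlation` is illustrative.

```python
import random
import statistics

def measured_correlation(n: int, rng: random.Random) -> float:
    """Toss two independent fair ±1 coins n times and return C = (1/n) * sum(x * y)."""
    total = 0
    for _ in range(n):
        x = rng.choice([-1, 1])
        y = rng.choice([-1, 1])  # independent of x: no magic
        total += x * y
    return total / n

rng = random.Random(42)  # fixed seed so the run is reproducible

# One measurement at each size: the values wander less and less around 0.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"N = {n:>6}: C = {measured_correlation(n, rng):+.4f}")

# Repeat the 10-game experiment many times to see how much C varies at N = 10.
ten_game_results = [measured_correlation(10, rng) for _ in range(10_000)]
print("spread of C at N = 10:", statistics.pstdev(ten_game_results))  # about 0.32
print("fraction with C >= 0.2:",
      sum(c >= 0.2 for c in ten_game_results) / len(ten_game_results))  # about 0.38
```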

OK, that was a bit tedious! But because correlation is such a big part of understanding how to play the game I’m about to introduce, I think it was worth going over.
