Can someone help explain what the formula
This does look a little complicated if you
don't know where the terms come from. I'll use Ex or E(x) to denote
the average of x (read, expectation of x). The first interesting
thing to think about if you have a set of data is what Ex is. This
allows us to compare data sets to see whether one is, on average,
bigger than the other. But the mean on its own isn't particularly
informative, because I could just add 100 to each point and obtain a
data set with a larger mean but exactly the same spread as the
original one.
Generally, we want to know how far the points are from the mean, at
least on average. Quite how to interpret this statement takes some
care, but there are various reasons why the mean square is the most
important measure. By definition, the sum of the "errors" obtained by
saying "every point is actually the mean" is zero, because the
positive and negative errors cancel exactly. However, if you
square each of these errors before summation you'll get something
non-negative, and in all interesting situations (at least one point
is not the mean) it'll be strictly positive. The average of these
squared errors is called the variance,
Var(X) = E((X-EX)^2) = E(X^2) - (EX)^2,
where the first equality is a definition, variance being the
expected value of the square of the difference between a point and
its mean, and the second equality follows by expanding the
quadratic and cancelling.
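To make the two forms of the variance concrete, here is a minimal Python sketch. The function names and the sample numbers are made up for illustration, and it uses the "divide by n" (population) convention that matches the E(...) notation above.

```python
def mean(values):
    """E(x): the arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Var(X) = E((X-EX)^2), dividing by n (population convention)."""
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])


def variance_expanded(values):
    """Var(X) = E(X^2) - (EX)^2, the form obtained by expanding the square."""
    return mean([v ** 2 for v in values]) - mean(values) ** 2


# Made-up sample data, chosen so the arithmetic is easy to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
shifted = [v + 100 for v in data]  # the "add 100 to each point" example

# The raw errors around the mean cancel out...
errors = [v - mean(data) for v in data]
print(sum(errors))  # 0.0

# ...but the mean of the squared errors does not.
print(variance(data), variance_expanded(data))  # 4.0 4.0

# Shifting every point changes the mean but leaves the variance alone.
print(mean(shifted), variance(shifted))  # 105.0 4.0
```

Both functions agree here; with real floating point data the expanded form can lose a little precision, so the definition is usually the safer one to compute with.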
Variance of a set of data points is a useful concept but sometimes
we have two separate sets which clearly should be regarded as
distinct, for example the x and y values in a scatter graph like
the one you mention (although this explanation works in full
generality, even on abstract distributions, if you define Ex
properly). We need something called covariance to deal with more
than one set of data points.
Suppose I give you data on the number of umbrellas bought each day
and the number of sandwiches bought in a particular shop. They're
unlikely to be related. However, if I give you two data sets which
are identical, they are intimately related to each other. This
leads to the concept of correlation, but we need an essential step
in between.
It should be reasonably intuitive that the generalisation of
E((X-EX)^2) to two data sets should be E((X-EX)(Y-EY)),
and again there are mathematical reasons why this is a particularly
good way to measure the error in assuming that the x points are
equal to their mean at the same time as the y points are equal to
their mean. This quantity is
called the covariance and a little rearranging gives
Cov(X,Y)=E((X-EX)(Y-EY))=E(XY)-(EX)(EY)
which should remind you of variance above. Indeed, we could say
Var(X)=Cov(X,X).
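As a rough illustration, the covariance can be computed directly from either form. This is only a sketch: the function names and the paired sample values are invented for the example, and the data are assumed to come as two lists of equal length.

```python
def mean(values):
    """E(x): the arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X,Y) = E((X-EX)(Y-EY)), dividing by n (population convention)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of points")
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def covariance_expanded(xs, ys):
    """Cov(X,Y) = E(XY) - (EX)(EY), the rearranged form."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


# Made-up paired observations (think of them as points on a scatter graph).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

print(covariance(xs, ys))           # from the definition
print(covariance_expanded(xs, ys))  # agrees up to rounding error
print(covariance(xs, xs))           # Cov(X,X), which is just Var(X)
```

For these numbers the covariance works out to 1.6 by hand, and Cov(X,X) is 2, the variance of the x values.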
There's no direct generalisation to three or more quantities.
Instead, we specify a covariance matrix, which tells us the
covariance of each pair of quantities.
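A covariance matrix for three data sets could be built along these lines. Again this is just a sketch with made-up names and numbers, using plain lists of lists rather than any particular matrix library.

```python
def mean(values):
    """E(x): the arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X,Y) = E((X-EX)(Y-EY)), dividing by n (population convention)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def covariance_matrix(columns):
    """Entry [i][j] is the covariance of data set i with data set j."""
    return [[covariance(a, b) for b in columns] for a in columns]


# Three made-up data sets of equal length.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # y = 2x, so it moves with x
z = [4.0, 3.0, 2.0, 1.0]  # z = 5 - x, so it moves against x

matrix = covariance_matrix([x, y, z])
for row in matrix:
    print(row)
```

The diagonal entries are the variances of x, y and z, and the matrix is symmetric because Cov(X,Y) = Cov(Y,X).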
The Cauchy-Schwarz inequality tells us that
|E((X-EX)(Y-EY))| <=
sqrt(E((X-EX)^2)E((Y-EY)^2))
ie |Cov(X,Y)| <= sqrt(Var(X)Var(Y))
so if we consider
r = Cov(X,Y)/sqrt(Var(X)Var(Y)),
this is a number between -1 and 1 which tells us how closely
(linearly) related the distributions are. With a little more theory,
it's simple to prove that independent data sets (or distributions)
have zero correlation (although the converse is not in general
true), and it's even easier to see that if X and Y are identical
then their correlation is 1.
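The behaviour of r described above can be checked numerically. The sketch below uses made-up data and function names, and it assumes neither data set is constant (otherwise a variance is zero and the division fails).

```python
import math


def mean(values):
    """E(x): the arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Cov(X,Y) = E((X-EX)(Y-EY)), dividing by n (population convention)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def correlation(xs, ys):
    """r = Cov(X,Y) / sqrt(Var(X)Var(Y)), using Var(X) = Cov(X,X).

    Raises ZeroDivisionError if either data set has zero variance.
    """
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Identical data sets: correlation 1 (up to rounding).
print(correlation(xs, xs))

# Exactly opposite data sets: correlation -1 (up to rounding).
print(correlation(xs, [-x for x in xs]))

# y = x^2 is completely determined by x, yet the correlation is zero.
# This is an instance of "the converse is not in general true".
print(correlation(xs, [x ** 2 for x in xs]))
```

The last case is the one worth remembering: r only measures how well a straight line describes the relationship, so a perfect curved relationship can still give zero.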
If you rewrite the above equation using the expansions I've given
earlier, you'll find it's identical to your expression but in a
much more easily understood form.
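As a check on that rewriting, here is a sketch comparing the expectation form with the version written out in sums. I'm assuming the expression in question is the common textbook one, r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2)), where Sx is the sum of the x values, Sxy the sum of the products and so on; the names and data below are made up for illustration.

```python
import math


def mean(values):
    """E(x): the arithmetic average of a list of numbers."""
    return sum(values) / len(values)


def correlation_expectation_form(xs, ys):
    """r built from E(XY) - (EX)(EY) and E(X^2) - (EX)^2."""
    cov = mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)
    var_x = mean([x * x for x in xs]) - mean(xs) ** 2
    var_y = mean([y * y for y in ys]) - mean(ys) ** 2
    return cov / math.sqrt(var_x * var_y)


def correlation_sum_form(xs, ys):
    """The same quantity after multiplying top and bottom by n^2.

    This sum form is an assumed textbook layout, not taken from the question.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Made-up paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

print(correlation_expectation_form(xs, ys))
print(correlation_sum_form(xs, ys))
```

The two agree because E(XY) is Sxy/n and EX is Sx/n, so every factor of n cancels between the numerator and the square root.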
Hope that helps - I'm happy to explain anything in more depth if
you like.
-Dave