Welcome to NRICH.

 
Operations on probability distributions


By Victoria Griffiths (T3139) on Thursday, August 31, 2000 - 08:58 am:

How do I add, subtract, multiply, divide two or more probability distributions? For example, if I measure the mass and the acceleration of an object several times and find appropriate distributions for them, how do I find the distribution of their product (ie Force?). Does the order of operating matter?


By Dave Sheridan (Dms22) on Thursday, August 31, 2000 - 10:48 am:

Hi Victoria,

I think there's a slight confusion about exactly what we're talking about here. We don't speak about operating on distributions, but rather we talk of the random variables themselves. I'll give you an example.

Suppose I roll two dice, one red and one blue. They have the same distribution of course. Let's look at the difference between the number on the red die and that of the blue one. I won't tell you what the distribution of this is. But now look at the difference between the number on the red die and that of the red die. Same distributions, but suddenly the difference is always zero. It definitely wasn't before. So there's more information in probability than the distribution of an object.

What you do need to know is the joint distribution of two random variables. Let's say you're looking at X and Y. You may know P(X=x) and P(Y=y) for any x,y. But that's not enough to tell you about X+Y. You now need to know P(X=x,Y=y) ie what happens when they're together in the same expression. For the dice, you should be able to see that if X is red and Y is red then this is zero unless x=y, yet when X is red and Y is blue this is not always the case. That's the extra information we needed to distinguish the two cases.
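To make the dice example concrete, here is a minimal Python sketch. It assumes fair six-sided dice, and the helper names (`marginal`, `difference`) are made up for illustration. It builds the two joint distributions, checks that the individual dice have identical distributions in both cases, and then works out the distribution of the difference.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

# Joint distribution of (red, blue): two separate fair dice,
# so every ordered pair has probability 1/36.
joint_red_blue = {(r, b): Fraction(1, 36) for r, b in product(FACES, FACES)}

# Joint distribution of (red, red): the same die looked at twice,
# so only pairs with equal entries carry any probability.
joint_red_red = {(r, r): Fraction(1, 6) for r in FACES}


def marginal(joint, index):
    """Distribution of one coordinate of a joint distribution."""
    out = Counter()
    for pair, p in joint.items():
        out[pair[index]] += p
    return dict(out)


def difference(joint):
    """Distribution of (first coordinate - second coordinate)."""
    out = Counter()
    for (x, y), p in joint.items():
        out[x - y] += p
    return dict(out)


# The second coordinate has the same distribution in both cases...
print(marginal(joint_red_blue, 1) == marginal(joint_red_red, 1))
# ...but the differences are distributed very differently.
print(difference(joint_red_blue))
print(difference(joint_red_red))
```

The marginals agree, yet the red-minus-red difference puts all its probability on zero, which is exactly the extra information carried by the joint distribution.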

Once you have a joint probability distribution, working out functions of X and Y is simple. For example,
P(X+Y=z)=P(X=0,Y=z)+P(X=1,Y=z-1)+...+P(X=z,Y=0)
at least when X and Y are both nonnegative integers. Can you generalise this? Any idea how to work out P(X×Y=z) using a similar method?
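One way to see the pattern, sketched in Python: for any function f, P(f(X,Y)=z) is the sum of P(X=x,Y=y) over all pairs (x,y) with f(x,y)=z. The function name `distribution_of` and the choice of two independent fair dice as the joint distribution are assumptions made purely for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def distribution_of(func, joint):
    """Distribution of func(X, Y), given joint[(x, y)] = P(X=x, Y=y)."""
    out = defaultdict(Fraction)          # missing keys start at 0
    for (x, y), p in joint.items():
        out[func(x, y)] += p             # collect every pair that lands on this value
    return dict(out)


# Hypothetical joint distribution: two independent fair dice.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

sum_dist = distribution_of(lambda x, y: x + y, joint)
prod_dist = distribution_of(lambda x, y: x * y, joint)

print(sum_dist[7])    # P(X+Y=7)  -> 1/6
print(prod_dist[12])  # P(XY=12)  -> 1/9
```

The same function handles the sum, the product, or anything else, because all the work is done by the joint distribution rather than by the two separate distributions.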

Let me know if you've digested this and I'll tell you some more.

-Dave


By Victoria Griffiths (T3139) on Thursday, August 31, 2000 - 01:45 pm:

Dave

Thanks for the quick reply - I agree that there might be some confusion, so let's take another example. Suppose each person in a class cuts a rectangle out of card, to the "same" dimensions. Due to the variability of this process, there will be variations in the length of each side. We could assume that these differences are normally distributed and could represent them through the mean and standard deviation. I am interested in what the probability distribution of the area of the rectangles would be. I seem to recall that the mean of the distribution E(xy) is

u(xy)=u(x).u(y)

and the standard deviation s(xy) is

s(xy)=sqrt( s(x)^2.s(y)^2 + u(x)^2.s(y)^2 + u(y)^2.s(x)^2 )

Thus, we have analytical expressions for the mean and standard deviation of the joint distribution. However, what if the lengths have some other distribution (eg. log-normal or triangular)? Presumably we could use a more generalised distribution for the joint distribution, which has more parameters than the normal distribution (mean and std. dev.), and use other moments of the two input distributions to calculate these parameters.

Victoria


By Dave Sheridan (Dms22) on Thursday, August 31, 2000 - 02:54 pm:

Ok, there are two different issues here: firstly, taking two random variables and combining them, and secondly, taking one random variable and finding the distribution of a function of it.

What you recall about the mean of XY is not true in general. For example, taking the two dice above you'll see that if X and Y are both the red die, then XY=X^2 and the mean of X^2 is definitely not (mean of X)^2 - think of variance. There is no way to say what the mean of XY is in general, unless we know the joint distribution of X and Y.
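For a single fair die the gap is easy to compute exactly. This is only a small check, assuming a fair six-sided die:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                               # each face equally likely

mean_x = sum(p * x for x in faces)               # E(X)
mean_x_squared = sum(p * x * x for x in faces)   # E(X^2)

print(mean_x ** 2)                    # (mean of X)^2      -> 49/4
print(mean_x_squared)                 # mean of X^2        -> 91/6
print(mean_x_squared - mean_x ** 2)   # the gap (variance) -> 35/12
```

The difference between the two is precisely the variance of X, which is zero only when X is constant.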

I'll answer your question though. We have to state our assumptions since I may interpret your question differently to what you imagined it would mean. I'll assume that the students cut two lines and use the edge of the paper for the other two, so that opposite ends of the rectangles are the same length. Furthermore, I'll assume that the lengths are independent of each other. This is important. If you haven't heard of this concept, it's really useful in probability theory - it means that even if we know the exact length of one side, it doesn't give us any further information about how long the other side is. This is reasonable for our situation but not for others (for example, if you know the size of a person's left foot, this gives you a lot of information about how big the right foot will be. You won't know for sure, but you'll be more sure of its rough measurements than before you'd measured the left foot).

A consequence of being independent is that expectations factorise - that is, E(XY)=E(X)E(Y) and the same is true of any function of X and Y, so that E(f(X)g(Y))=E(f(X))E(g(Y)). The same is true of probability distributions, so we know the joint distribution function.
P(X<x,Y<y)=P(X<x)P(Y<y)
Now, we need to calculate the area. For a rectangle, this is simply XY. How do we specify the distribution? Well, one way is to ask what is P(XY<z) for any z.

How would you go about this? There's no real restriction on X. But if we know what X is and we want XY<z, that means that Y<z/X (X is a length, so it is positive and the inequality keeps its direction). Since X is continuous, we must integrate over all possible values.

P(XY < z) = \int_0^\infty P(Y < z/x) \, p(x) \, dx

where p(x) is the density function for X. Please let me know if you don't understand why I've done this; otherwise I'll assume you can follow it.
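The integral rarely has a closed form, but it can be approximated numerically. The sketch below assumes independent, normally distributed sides whose standard deviations are small compared with their means, so that negative lengths can be ignored. The function names, the midpoint rule and the example figures (sides of about 10 and 5 with standard deviation 0.1) are all illustrative choices.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """P(N < x) for a normal variable N, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def prob_product_below(z, mu_x, sigma_x, mu_y, sigma_y, steps=2000):
    """Approximate P(XY < z) = integral of P(Y < z/x) p(x) dx by the midpoint rule.

    Assumes X and Y are independent normals and X is effectively positive.
    """
    lo = max(mu_x - 8 * sigma_x, 1e-9)   # keep x > 0 so z/x is well defined
    hi = mu_x + 8 * sigma_x              # beyond 8 sigma the density is negligible
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h           # midpoint of the i-th slice
        total += normal_cdf(z / x, mu_y, sigma_y) * normal_pdf(x, mu_x, sigma_x) * h
    return total


# Hypothetical rectangle: sides about 10 and 5, each with standard deviation 0.1.
for z in (48.0, 50.0, 52.0):
    print(z, prob_product_below(z, 10.0, 0.1, 5.0, 0.1))
```

Evaluating this for a range of z traces out the distribution function of the area, even though there is no tidy formula for it.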

You'll quickly see that this is not a nice distribution. There isn't a nice answer I'm afraid - even the normal distribution becomes messy if you do simple things to it. We can easily work out its mean and variance, but not much else. It's worse if X and Y are not independent.
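For the independent case the mean and variance do come out cleanly. Here is a short derivation, writing \mu_X, \sigma_X for the mean and standard deviation of X (the u(x), s(x) used earlier in the thread) and likewise for Y. It uses only the factorisation of expectations and the identity E(X^2) = \sigma_X^2 + \mu_X^2.

```latex
\begin{align*}
E(XY) &= E(X)\,E(Y) = \mu_X \mu_Y, \\
\operatorname{Var}(XY) &= E(X^2 Y^2) - \bigl(E(XY)\bigr)^2 \\
  &= E(X^2)\,E(Y^2) - \mu_X^2 \mu_Y^2 \\
  &= (\sigma_X^2 + \mu_X^2)(\sigma_Y^2 + \mu_Y^2) - \mu_X^2 \mu_Y^2 \\
  &= \sigma_X^2 \sigma_Y^2 + \mu_X^2 \sigma_Y^2 + \mu_Y^2 \sigma_X^2 .
\end{align*}
```

Taking the square root gives the expression for s(xy) quoted earlier in the thread. Nothing here uses normality, only independence, so the same two formulas hold for log-normal or triangular sides as well.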

Occasionally you'll find combinations which make sense, for example if X and Y are normal with mean 0 and variance 1, then X/Y has a Cauchy distribution - quite a horrible distribution itself, but at least something we know a lot about and can write down a density for. In general, the distributions of functions of random variables are very difficult to fathom though.
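The Cauchy claim can be checked roughly by simulation. A standard Cauchy variable C satisfies P(C < 1) = 1/2 + arctan(1)/pi = 3/4, so about three quarters of simulated ratios should fall below 1. The sample size and seed below are arbitrary choices for this sketch.

```python
import random

random.seed(0)   # fixed seed so the run is repeatable
n = 200_000      # number of simulated ratios

# Ratio of two independent standard normal draws.
ratios = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

share_below_one = sum(r < 1 for r in ratios) / n
share_below_zero = sum(r < 0 for r in ratios) / n

print(share_below_one)    # should be close to 0.75
print(share_below_zero)   # should be close to 0.5
```

Trying to estimate the mean of the ratios in the same way is instructive: the sample average never settles down, because the Cauchy distribution has no mean.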

-Dave

By Victoria Griffiths (T3139) on Thursday, August 31, 2000 - 05:28 pm:

Dave

so far so good (I think) - sounds as if the problem is not that simple to solve ... but how about this for an approach (assuming that the variables are independent). ASSUME that the resulting distribution P(XY) is of a given analytical form, characterised by some number (N) of parameters (eg. triangular distribution N=3, normal distribution N=2) etc. For each of the distributions X and Y calculate the first N moments, then use these results to calculate the first N moments of the joint distribution. Finally, use some sort of regression to fit the parameters of the distribution P(XY) to these N moments.

Assuming that this is a feasible scheme, how do we combine the moments of the individual distributions to give the moments of the joint distribution? Presumably this is a standard statistical technique?

If this scheme does work, and we extend it to cuboids, does the order of combining the distributions matter? In other words is this true

P(XYZ) = P(P(XY).P(Z)) = P(P(X).P(YZ))

Best regards

Victoria

PS: Can you recommend an introductory text for this type of problem?


By Dave Sheridan (Dms22) on Friday, September 1, 2000 - 11:20 am:

Right, we are talking about statistics here then. Mostly, if you're doing something like this you'd assume that all parameters are jointly normal, which means they each have a normal distribution but their joint distribution is a generalisation of the normal too; they don't have to be independent but we can say a lot about them even if they're not. This distribution is characterised solely by its mean and covariance matrix (which is a generalisation of variance) as you might expect.

Yes, there is a standard technique of fitting parameters to particular distributions by estimating them from the sample you have obtained, although not necessarily in the way you may imagine. You might have heard that a good estimate of variance is not the variance of the sample, but n/(n-1) times the variance of the sample.
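As a small illustration of the n/(n-1) factor, Python's standard `statistics` module offers both versions: `pvariance` divides by n and `variance` divides by n-1. The numbers in `sample` are invented for the example.

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up measurements
n = len(sample)

variance_of_sample = statistics.pvariance(sample)    # divides by n
estimate = statistics.variance(sample)               # divides by n - 1

print(variance_of_sample)                # 4.0
print(estimate)                          # the usual estimate of the variance
print(n / (n - 1) * variance_of_sample)  # agrees with the line above
```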

Generally in this situation you must make some initial assumptions on the data, but after that fitting does not get affected by order; more specifically, you fit everything at the same time.
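On the narrower question of combining moments: if the variables are independent, the factorisation E(f(X)g(Y))=E(f(X))E(g(Y)) with f and g both k-th powers gives E((XY)^k)=E(X^k)E(Y^k), so moments about zero simply multiply and the order of combining cannot matter. The sketch below checks this exactly for three small, invented discrete distributions standing in for the sides of a cuboid. The helper names are hypothetical, and everything relies on the sides being mutually independent.

```python
from fractions import Fraction


def raw_moment(dist, k):
    """k-th moment about zero, E(V^k), for dist[v] = P(V=v)."""
    return sum(p * Fraction(v) ** k for v, p in dist.items())


def product_dist(dist_a, dist_b):
    """Distribution of AB, assuming A and B are independent."""
    out = {}
    for a, pa in dist_a.items():
        for b, pb in dist_b.items():
            out[a * b] = out.get(a * b, 0) + pa * pb
    return out


# Invented side-length distributions, purely for illustration.
X = {9: Fraction(1, 4), 10: Fraction(1, 2), 11: Fraction(1, 4)}
Y = {4: Fraction(1, 3), 5: Fraction(1, 3), 6: Fraction(1, 3)}
Z = {2: Fraction(1, 2), 3: Fraction(1, 2)}

xy_then_z = product_dist(product_dist(X, Y), Z)
x_then_yz = product_dist(X, product_dist(Y, Z))

print(xy_then_z == x_then_yz)   # same distribution either way
for k in (1, 2, 3):
    lhs = raw_moment(xy_then_z, k)
    rhs = raw_moment(X, k) * raw_moment(Y, k) * raw_moment(Z, k)
    print(k, lhs == rhs)        # moments multiply
```

None of this carries over to dependent variables, where the joint distribution is needed again.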

Would you like a recommended text on statistical techniques (what you might assume and then fit data to) or on the underlying probability (how you'd go about working out the strange distributions involved, and generally how to think of this sort of thing)? For the former, pretty much anything with the word "Statistics" in the title will do; there's nothing which specifically concentrates on what we've been talking about, but most books cover it in some way. For a really good introduction to probability which also deals with some more advanced concepts (you can ignore these the first time round and then when you're more advanced yourself, you might find them very interesting), I'd recommend in particular "Probability and Random Processes" by Grimmett and Stirzaker, published by OUP.

-Dave