D
Originally posted by: Deleted member 139972
I'm getting an infinite stream of integers for an algorithm and I want to create a running mean without having to keep track of the number of samples or a running sum..
IE: I don't want to have SUM_OF_SAMPLES / NUM_SAMPLES as the solution becuase I'm going to have infinitely many integers. (which will quickly exhaust the resources of my PC and any datatypes used to store the running sum and number of samples)
Is there another way to calculate a running mean for an infinitly long distribution??
Originally posted by: Deleted member 139972
I found the answer. It's exactly what TCP Tahoe does to estimate the Average RTT using a low pass filter and a Alpha value. For the low pass filter, I'm setting an alpha = 0.9 to ensure that large variances in the local sample do not cause major fluctuations in the result.
I tested this in excel and it fluctuates around the average and is very close, but not entirely. For such a cheap calculation, it's remarkably accurate. (Standard deviation is around 5% to 10%) Here is how it works in pseudo code:
int lastValue = GetSample();
int presValue = GetSample();
int filterAvg = presValue; // Seed the average, it will converge to the real average over time.
while(true) {
int localAvg = (presValue + lastValue) / 2;
filterAvg = (0.9 * filterAvg) + (1-0.9) * (localAvg);
lastValue = presValue;
presValue = GetSample();
}
Originally posted by: Deleted member 139972
Originally posted by: TuxDave
Originally posted by: Deleted member 139972
I found the answer. It's exactly what TCP Tahoe does to estimate the Average RTT using a low pass filter and a Alpha value. For the low pass filter, I'm setting an alpha = 0.9 to ensure that large variances in the local sample do not cause major fluctuations in the result.
I tested this in excel and it fluctuates around the average and is very close, but not entirely. For such a cheap calculation, it's remarkably accurate. (Standard deviation is around 5% to 10%) Here is how it works in pseudo code:
int lastValue = GetSample();
int presValue = GetSample();
int filterAvg = presValue; // Seed the average, it will converge to the real average over time.
while(true) {
int localAvg = (presValue + lastValue) / 2;
filterAvg = (0.9 * filterAvg) + (1-0.9) * (localAvg);
lastValue = presValue;
presValue = GetSample();
}
Fixed.
Psh... I thought you wanted a methodology which gave you the accurate mean. This looks like it may work given that the sampled data is random and has an expected value. I don't think it'll work for other things. Can you try data that looks like 1,2,3,4,5,6,7..... and see how the running average and this formula compares. I'm pretty sure it'll diverge.
Yep, it doesn't converge very well at all. I did a test using 1,100 samples and got a actual mean of 553 and an estimated mean of 1095. Pretty large difference. But the nice thing is that my data is not going to be growing linearly, but hovering around a real mean. This should be a quick and easy way to get it I think. You agree?
Originally posted by: Deleted member 139972
Yah, I plugged it into my application today are worked fantastically. It was very close most of the time (verified with a 3rd party app) and included only the most recent activity on the wire. Thanks for you help guys!
This is about what I was going to suggest. A moving average will be more responsive to changes in the system while still allowing you to know exactly what the mean of the last n samples is without wasting too many resources. It's also a lagging indicator of the mean, so you need to be careful when interpreting it. However, if the mean doesn't change much, then the lag becomes unimportant.Originally posted by: Matthias99
If you want to more explicitly calculate the 'recent' mean without storing an (arbitrarily large) running total, you could store the last 10/100/1000 (or whatever fixed number you want) samples and then calculate a true mean on those. This would be more accurate, although it would require more storage. What you have done will estimate the mean of the entire sequence so far, biased towards more recent samples. That will be pretty close to the true mean if the variance of the distribution is not too high.