.

TuxDave · Apr 5, 2006

Getting rid of the sum_of_samples is easy but if you can't even keep track of how many samples you already have... man. That's a toughie.

Deleted member 139972 · Apr 6, 2006

imported_Seer · Apr 6, 2006

I can see how you can do it without keeping track of the sum (but knowing the #) quite easily, but not the other way around. Tell me if you want to know what I do know, I will work it out and post it.

Deleted member 139972 · Apr 6, 2006

silverpig · Apr 6, 2006

Originally posted by: Deleted member 139972
I'm getting an infinite stream of integers for an algorithm and I want to create a running mean without having to keep track of the number of samples or a running sum..

IE: I don't want to have SUM_OF_SAMPLES / NUM_SAMPLES as the solution becuase I'm going to have infinitely many integers. (which will quickly exhaust the resources of my PC and any datatypes used to store the running sum and number of samples)

Is there another way to calculate a running mean for an infinitly long distribution??

You'll have to know how many. It doesn't matter if you keep summing them though. I think the following will work:

m = mean at next step
M = mean at last step
x = new number
n = running count of numbers

m = M - (M - x)/n

I just whipped that out of my head and didn't check my algebra, so it might not be right...

BrownTown · Apr 6, 2006

i cant think of any way to get the correct answer without having the number of samples. A high # point running average filter will of course behave as a low pass filter and get rid of most of the small fluctuations in the local sample, but it is nto averaging the whole series, jsut smoothing it out.

also, obviously nothing is infinite, so maybe you can just store the value anyways, remember, even 1 kilobyte of data can store a number up to 2^8192, thats a monstorously huge number that you will never exceed.

TuxDave · Apr 6, 2006

Originally posted by: Deleted member 139972
I found the answer. It's exactly what TCP Tahoe does to estimate the Average RTT using a low pass filter and a Alpha value. For the low pass filter, I'm setting an alpha = 0.9 to ensure that large variances in the local sample do not cause major fluctuations in the result.

I tested this in excel and it fluctuates around the average and is very close, but not entirely. For such a cheap calculation, it's remarkably accurate. (Standard deviation is around 5% to 10%) Here is how it works in pseudo code:

int lastValue = GetSample();
int presValue = GetSample();
int filterAvg = presValue; // Seed the average, it will converge to the real average over time.

while(true) {

int localAvg = (presValue + lastValue) / 2;
filterAvg = (0.9 * filterAvg) + (1-0.9) * (localAvg);

lastValue = presValue;
presValue = GetSample();

}

Fixed.

Psh... I thought you wanted a methodology which gave you the accurate mean. This looks like it may work given that the sampled data is random and has an expected value. I don't think it'll work for other things. Can you try data that looks like 1,2,3,4,5,6,7..... and see how the running average and this formula compares. I'm pretty sure it'll diverge.

Deleted member 139972 · Apr 6, 2006

djhuber82 · Apr 6, 2006

A low-pass filter may work for your application, but it is NOT giving you a true mean, just a local mean. That said, I don't know why you would want a true mean, because after a sufficiently long time the mean will basically not respond at all to new input. With a LPF you can track the input.

TuxDave · Apr 6, 2006

Originally posted by: Deleted member 139972

Originally posted by: TuxDave

Originally posted by: Deleted member 139972
I found the answer. It's exactly what TCP Tahoe does to estimate the Average RTT using a low pass filter and a Alpha value. For the low pass filter, I'm setting an alpha = 0.9 to ensure that large variances in the local sample do not cause major fluctuations in the result.

I tested this in excel and it fluctuates around the average and is very close, but not entirely. For such a cheap calculation, it's remarkably accurate. (Standard deviation is around 5% to 10%) Here is how it works in pseudo code:

int lastValue = GetSample();
int presValue = GetSample();
int filterAvg = presValue; // Seed the average, it will converge to the real average over time.

while(true) {

int localAvg = (presValue + lastValue) / 2;
filterAvg = (0.9 * filterAvg) + (1-0.9) * (localAvg);

lastValue = presValue;
presValue = GetSample();

}

Click to expand...

Fixed.

Psh... I thought you wanted a methodology which gave you the accurate mean. This looks like it may work given that the sampled data is random and has an expected value. I don't think it'll work for other things. Can you try data that looks like 1,2,3,4,5,6,7..... and see how the running average and this formula compares. I'm pretty sure it'll diverge.

Click to expand...

Yep, it doesn't converge very well at all. I did a test using 1,100 samples and got a actual mean of 553 and an estimated mean of 1095. Pretty large difference. But the nice thing is that my data is not going to be growing linearly, but hovering around a real mean. This should be a quick and easy way to get it I think. You agree?

Yeah. If you pick up any digital signal processing book, I'm pretty sure you can pick up a couple more filter equations. But I guess it's good enough.

Deleted member 139972 · Apr 6, 2006

Matthias99 · Apr 6, 2006

Originally posted by: Deleted member 139972
Yah, I plugged it into my application today are worked fantastically. It was very close most of the time (verified with a 3rd party app) and included only the most recent activity on the wire. Thanks for you help guys!

If you want to more explicitly calculate the 'recent' mean without storing an (arbitrarily large) running total, you could store the last 10/100/1000 (or whatever fixed number you want) samples and then calculate a true mean on those. This would be more accurate, although it would require more storage. What you have done will estimate the mean of the entire sequence so far, biased towards more recent samples. That will be pretty close to the true mean if the variance of the distribution is not too high.

Mathematically, you can't calculate the exact mean of the entire sequence without either storing a running total or tracking the number of items (both of which may get arbitrarily large). However, unless you are really getting into a LONG simulation (or the samples you get are very large, like 512-bit numbers), you could probably keep a running total without too much difficulty. If you process 1000 32-bit samples per second for a week straight, the sum will still fit in an unsigned 64-bit number with plenty of space left over.

CycloWizard · Apr 7, 2006

Originally posted by: Matthias99
If you want to more explicitly calculate the 'recent' mean without storing an (arbitrarily large) running total, you could store the last 10/100/1000 (or whatever fixed number you want) samples and then calculate a true mean on those. This would be more accurate, although it would require more storage. What you have done will estimate the mean of the entire sequence so far, biased towards more recent samples. That will be pretty close to the true mean if the variance of the distribution is not too high.

This is about what I was going to suggest. A moving average will be more responsive to changes in the system while still allowing you to know exactly what the mean of the last n samples is without wasting too many resources. It's also a lagging indicator of the mean, so you need to be careful when interpreting it. However, if the mean doesn't change much, then the lag becomes unimportant.

djhuber82 · Apr 7, 2006

There's no reason an iterative filter shouldn't work just fine. I do not agree that a moving average (FIR filter) will be more accurate than an iterave one (IIR filter). You can design a good low pass filter with either one, but an IIR will use fastly fewer resources and introduce much less delay for the same performance. Introducing too much delay in the loop of a feedback system (and what you've described sounds like a feedback system) can lead to instability.

.

Deleted member 139972

TuxDave

Lifer

Deleted member 139972

imported_Seer

Senior member

Deleted member 139972

silverpig

Lifer

BrownTown

Diamond Member

TuxDave

Lifer

Deleted member 139972

djhuber82

Member

TuxDave

Lifer

Deleted member 139972

Matthias99

Diamond Member

CycloWizard

Lifer

djhuber82

Member

TRENDING THREADS