
  • Thread starter Deleted member 139972

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Getting rid of the sum_of_samples is easy but if you can't even keep track of how many samples you already have... man. That's a toughie.
 

imported_Seer

Senior member
Jan 4, 2006
309
0
0
I can see how you can do it without keeping track of the sum (but knowing the number of samples) quite easily, but not the other way around. Tell me if you want to know what I've got, and I'll work it out and post it.
 

silverpig

Lifer
Jul 29, 2001
27,709
11
81
Originally posted by: Deleted member 139972
I'm getting an infinite stream of integers for an algorithm and I want to compute a running mean without having to keep track of the number of samples or a running sum.

I.e., I don't want SUM_OF_SAMPLES / NUM_SAMPLES as the solution, because I'm going to have infinitely many integers (which would quickly exhaust the resources of my PC and any data types used to store the running sum and number of samples).

Is there another way to calculate a running mean for an infinitely long distribution?

You'll have to know how many there have been, but it doesn't matter whether you keep a running sum. I think the following will work:

m = mean at next step
M = mean at last step
x = new number
n = running count of numbers

m = M - (M - x)/n

I just whipped that out of my head and didn't check my algebra, so it might not be right...
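For reference, a minimal C sketch of that update rule; the get_sample() below is just a hypothetical stand-in for the real stream, not something from this thread:

#include <stdio.h>

/* Toy stand-in for the sample stream; replace with the real source. */
static double get_sample(void) {
    static int i = 0;
    return (double)(i++ % 10);   /* repeating 0..9 pattern, true mean 4.5 */
}

int main(void) {
    double mean = 0.0;              /* M: mean after the previous sample */
    unsigned long long n = 0;       /* running count of numbers */

    for (int k = 0; k < 1000; k++) {
        double x = get_sample();    /* x: new number */
        n++;
        mean = mean - (mean - x) / n;   /* m = M - (M - x)/n, i.e. M + (x - M)/n */
    }

    printf("running mean after %llu samples: %f\n", n, mean);   /* prints ~4.5 */
    return 0;
}

Only the mean and the count are stored; no running sum is needed.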
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
I can't think of any way to get the correct answer without having the number of samples. A running-average filter over a large number of points will of course behave as a low-pass filter and get rid of most of the small fluctuations in the local samples, but it is not averaging the whole series, just smoothing it out.

Also, obviously nothing is infinite, so maybe you can just store the value anyway. Remember, even 1 kilobyte of data can store a number up to about 2^8192; that's a monstrously huge number that you will never exceed.
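A minimal C sketch of that approach, keeping the exact sum and count in 64-bit integers; GetSample() here is a hypothetical placeholder returning toy data:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical placeholder for the real sample stream. */
static int32_t GetSample(void) {
    static int32_t i = 0;
    return i++ % 100;   /* toy data in 0..99 */
}

int main(void) {
    uint64_t count = 0;
    int64_t  sum   = 0;   /* 64 bits: room for billions of worst-case 32-bit samples */

    for (long k = 0; k < 1000000; k++) {
        sum += GetSample();
        count++;
    }

    printf("exact mean so far: %f\n", (double)sum / (double)count);   /* 49.5 here */
    return 0;
}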
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Originally posted by: Deleted member 139972
I found the answer. It's exactly what TCP Tahoe does to estimate the average RTT, using a low-pass filter and an alpha value. For the low-pass filter, I'm setting alpha = 0.9 so that large variances in the local sample do not cause major fluctuations in the result.

I tested this in Excel; it fluctuates around the average and is very close, but not exact. For such a cheap calculation, it's remarkably accurate (standard deviation is around 5% to 10%). Here is how it works in pseudocode:

int lastValue = GetSample();
int presValue = GetSample();
double filterAvg = presValue; // Seed the average; it converges toward the real average over time.

while (true) {

    // Average the two most recent samples, then low-pass filter with alpha = 0.9.
    // filterAvg must be a floating-point type or the 0.9 multiply truncates.
    double localAvg = (presValue + lastValue) / 2.0;
    filterAvg = (0.9 * filterAvg) + (1.0 - 0.9) * localAvg;

    lastValue = presValue;
    presValue = GetSample();

}

Fixed.

Psh... I thought you wanted a method that gave you the exact mean. This looks like it may work given that the sampled data is random and has a stable expected value; I don't think it'll work for other things. Can you try data that looks like 1, 2, 3, 4, 5, 6, 7... and see how the running average and this formula compare? I'm pretty sure it'll diverge.
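If anyone wants to run that check, a minimal C sketch on the stream 1, 2, 3, ..., 1100, using the same alpha = 0.9 filter but fed the single latest sample rather than the two-sample local average (which shouldn't change the conclusion much):

#include <stdio.h>

/* Compare the alpha = 0.9 low-pass estimate against the exact running mean
   on a linearly growing stream. The filter mostly tracks recent samples,
   so it drifts away from the mean of the whole sequence. */
int main(void) {
    double filterAvg = 1.0;   /* seeded with the first sample */
    double sum = 0.0;
    double trueMean = 0.0;

    for (int n = 1; n <= 1100; n++) {
        double x = (double)n;                     /* stream: 1, 2, 3, ... */
        sum += x;
        trueMean = sum / n;                       /* exact running mean */
        filterAvg = 0.9 * filterAvg + 0.1 * x;    /* low-pass estimate */
    }

    /* Prints roughly: true mean 550.5, filtered estimate 1091.0 */
    printf("true mean: %.1f   filtered estimate: %.1f\n", trueMean, filterAvg);
    return 0;
}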
 

djhuber82

Member
May 22, 2004
51
0
0
A low-pass filter may work for your application, but it is NOT giving you a true mean, just a local mean. That said, I don't know why you would want a true mean, because after a sufficiently long time the mean will basically not respond at all to new input. With a LPF you can track the input.
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
Originally posted by: Deleted member 139972
Originally posted by: TuxDave

Psh... I thought you wanted a method that gave you the exact mean. This looks like it may work given that the sampled data is random and has a stable expected value; I don't think it'll work for other things. Can you try data that looks like 1, 2, 3, 4, 5, 6, 7... and see how the running average and this formula compare? I'm pretty sure it'll diverge.

Yep, it doesn't converge very well at all. I did a test using 1,100 samples and got an actual mean of 553 and an estimated mean of 1095. Pretty large difference. But the nice thing is that my data is not going to be growing linearly; it will hover around a real mean. This should be a quick and easy way to get it, I think. Do you agree?

Yeah. If you pick up any digital signal processing book, I'm pretty sure you can find a couple more filter equations. But I guess this one is good enough.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: Deleted member 139972
Yah, I plugged it into my application today and it worked fantastically. It was very close most of the time (verified with a 3rd-party app) and included only the most recent activity on the wire. Thanks for your help, guys!

If you want to more explicitly calculate the 'recent' mean without storing an (arbitrarily large) running total, you could store the last 10/100/1000 (or whatever fixed number you want) samples and then calculate a true mean on those. This would be more accurate, although it would require more storage. What you have done will estimate the mean of the entire sequence so far, biased towards more recent samples. That will be pretty close to the true mean if the variance of the distribution is not too high.

Mathematically, you can't calculate the exact mean of the entire sequence without either storing a running total or tracking the number of items (both of which may get arbitrarily large). However, unless you are really getting into a LONG simulation (or the samples you get are very large, like 512-bit numbers), you could probably keep a running total without too much difficulty. If you process 1000 32-bit samples per second for a week straight, the sum will still fit in an unsigned 64-bit number with plenty of space left over.
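A minimal C sketch of the fixed-window idea above, using a ring buffer of the last 100 samples; the window size and the toy GetSample() are just illustrative:

#include <stdio.h>

#define WINDOW 100   /* how many recent samples to average over */

/* Hypothetical placeholder for the real sample stream. */
static int GetSample(void) {
    static int i = 0;
    return (i++ % 7) + 10;   /* toy data cycling between 10 and 16 */
}

int main(void) {
    int  buf[WINDOW] = {0};
    long windowSum   = 0;    /* sum of the samples currently in the window */
    long count       = 0;    /* total samples seen so far */

    for (int step = 0; step < 1000; step++) {
        int x   = GetSample();
        int idx = (int)(count % WINDOW);

        windowSum += x - buf[idx];   /* drop the oldest sample, add the newest */
        buf[idx]   = x;
        count++;

        long n = (count < WINDOW) ? count : WINDOW;   /* window still filling at first */
        if (step % 250 == 249)
            printf("mean of last %ld samples: %f\n", n, (double)windowSum / n);
    }
    return 0;
}

The storage cost is fixed at WINDOW samples, and the result is the exact mean of those samples rather than an estimate.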
 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Originally posted by: Matthias99
If you want to more explicitly calculate the 'recent' mean without storing an (arbitrarily large) running total, you could store the last 10/100/1000 (or whatever fixed number you want) samples and then calculate a true mean on those. This would be more accurate, although it would require more storage. What you have done will estimate the mean of the entire sequence so far, biased towards more recent samples. That will be pretty close to the true mean if the variance of the distribution is not too high.
This is about what I was going to suggest. A moving average will be more responsive to changes in the system while still allowing you to know exactly what the mean of the last n samples is without wasting too many resources. It's also a lagging indicator of the mean, so you need to be careful when interpreting it. However, if the mean doesn't change much, then the lag becomes unimportant.
 

djhuber82

Member
May 22, 2004
51
0
0
There's no reason an iterative filter shouldn't work just fine. I do not agree that a moving average (FIR filter) will be more accurate than an iterative one (IIR filter). You can design a good low-pass filter with either one, but an IIR will use vastly fewer resources and introduce much less delay for the same performance. Introducing too much delay in the loop of a feedback system (and what you've described sounds like a feedback system) can lead to instability.
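To make the resource point concrete, a small C sketch (alpha = 0.9 and a 100-tap window are just illustrative values): the one-pole IIR carries a single double of state per update, while the FIR moving average has to remember all of its taps.

#include <stdio.h>

#define FIR_TAPS 100   /* illustrative moving-average length */

/* Hypothetical placeholder for the real sample stream. */
static double GetSample(void) {
    static int i = 0;
    return (double)((i++ % 5) + 1);   /* toy data cycling 1..5 */
}

int main(void) {
    /* IIR (one-pole low-pass): a single stored value, however much smoothing you want. */
    double iir = 0.0;

    /* FIR (moving average): must store the last FIR_TAPS samples. */
    double firBuf[FIR_TAPS] = {0};
    double firSum = 0.0;

    for (long n = 0; n < 10000; n++) {
        double x = GetSample();

        iir = 0.9 * iir + 0.1 * x;        /* O(1) state, O(1) work per sample */

        int idx = (int)(n % FIR_TAPS);
        firSum += x - firBuf[idx];        /* O(FIR_TAPS) state, O(1) work per sample */
        firBuf[idx] = x;
    }

    printf("IIR output: %f   FIR output: %f\n", iir, firSum / FIR_TAPS);
    return 0;
}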
 