Hi,
Goeff, I would be interested to know how long until you will be able to do real-time stats updates (every 6 hours).
I'd like to see a valid node count. This is the number of simultaneous instances of FaD running and may be loosely equated with the number of CPU equivalents. Of course, P4 HT CPUs may show as 2 nodes, which is entirely valid.
I've worked out an algorithm that should yield a substantially more accurate number of CPUs per member number:
Let us define some variables:
Job Start Timestamp = the timestamp for each job start
Job End Timestamp = the timestamp for each job end
Member = the member number for the job
#IntervalStart = timestamp of midnight UTC
#IntervalEnd = timestamp of midnight UTC + N minutes
#member = the member number being analyzed
Select count(*) from [dbname] where [dbname].Member = #member
and ( [dbname].JobStart < #IntervalEnd )
and ( [dbname].JobEnd > #IntervalStart )
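As a minimal sketch of the overlap query, here it is in Python with an in-memory SQLite table. The table and column names (jobs, member, job_start, job_end) are assumptions for illustration, not the project's actual schema:

```python
import sqlite3

# Hypothetical schema and sample data; timestamps are seconds relative
# to midnight UTC (midnight = 0) for simplicity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (member INTEGER, job_start INTEGER, job_end INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (7, -3600, 120),   # job spans midnight -> counted
        (7, -7200, -60),   # ends just before midnight -> not counted
        (7, 60, 900),      # starts inside the interval -> counted
        (8, -100, 100),    # different member -> not counted
    ],
)

def node_count(conn, member, interval_start, interval_end):
    """Count jobs for a member that overlap the interval."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM jobs "
        "WHERE member = ? AND job_start < ? AND job_end > ?",
        (member, interval_end, interval_start),
    ).fetchone()
    return max(count, 1)   # "If count < 1 then set count = 1"

print(node_count(conn, 7, 0, 5 * 60))  # -> 2
```

A job overlaps the interval exactly when it starts before IntervalEnd and ends after IntervalStart, which is the same pair of conditions as the SQL above.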
This should yield the number of simultaneous jobs being processed around midnight, which should roughly equal the number of "nodes" for any member. If count < 1, then set count = 1.

My initial thought is that the interval, N, should be 5 minutes, since QUEUE could delay job-to-job transitions by as much as 4 minutes. The potential error here is that if the interval, N, is too small, then any CPU that completes a job very near Interval Start may fail to be counted, because that CPU may be between jobs for the entire duration of the interval, which would understate the "node" count. The other potential error is that if the interval, N, is too large, then we may select jobs that start after IntervalStart and complete before IntervalEnd, which would cause the "node" count to be overstated.

In recent history, there have been very few queries that routinely process in 5 minutes or less. In the entire history of the project since Nov '02, there have been several queries that have processed in very short times (some in <20 seconds/job). If such queries occur in the future, the calculation would probably be totally borked. Thankfully, such queries have lasted less than a week and in some cases less than a day.
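Both error modes can be shown with a toy overlap counter (all numbers hypothetical, times in minutes relative to midnight):

```python
def overlap_count(jobs, start, end):
    """Number of (job_start, job_end) pairs overlapping (start, end)."""
    return sum(1 for s, e in jobs if s < end and e > start)

# Undercount: one CPU sits between jobs for 4 minutes around midnight,
# so a 2-minute interval overlaps neither of its jobs.
between_jobs = [(-120, -1), (3, 90)]
print(overlap_count(between_jobs, 0, 2))   # -> 0: node missed

# Overcount: one CPU runs very short jobs; an oversized 60-minute
# interval counts each job as if it were a separate node.
short_jobs = [(0, 10), (10, 20), (20, 30)]
print(overlap_count(short_jobs, 0, 60))    # -> 3: one node counted as three
```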
Now ... adjust the period to reflect a date in the past when most or all of the current queries are completed (IntervalEnd timestamp = current timestamp - maximum duration). For example, for a specific node, use the maximum duration over the last week or two to determine where the interval end timestamp should fall for the most accurate assessment. Using this approach, the number of nodes will always lag reality but should prove much more accurate than what you currently use.
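The lagged IntervalEnd could be derived from the same table, again sketched with assumed names (jobs, member, job_start, job_end) and timestamps in seconds:

```python
import sqlite3

# Push IntervalEnd back by the longest job duration seen in the lookback
# window, so every job overlapping the interval has already been reported.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (member INTEGER, job_start INTEGER, job_end INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(7, 1000, 4600), (7, 4600, 9000), (7, 9100, 9700)],
)

def lagged_interval_end(conn, member, now, lookback):
    """IntervalEnd = now - (max job duration over the lookback window)."""
    (max_dur,) = conn.execute(
        "SELECT MAX(job_end - job_start) FROM jobs "
        "WHERE member = ? AND job_end > ?",
        (member, now - lookback),
    ).fetchone()
    return now - (max_dur or 0)

# Longest recent job here is 4400 s, so the interval ends 4400 s ago.
print(lagged_interval_end(conn, 7, 10000, 7 * 24 * 3600))  # -> 5600
```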
Steven