F@H January TeAm record run?


StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
It's a huge downside of F@H that you can't buffer work.

In BOINC, I usually maintain a work buffer roughly 0.6 days deep, because during workdays that is about how long it takes me to reset my cable modem whenever it does not re-establish the Internet link by itself after an occasional interruption.
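For reference, one way to set such a buffer on a Linux host is to write BOINC's global_prefs_override.xml and have the client re-read it. A rough sketch; the override path assumes a stock boinc-client package and may differ on other installs:

Code:
# Rough sketch: set a ~0.6 day work buffer by writing BOINC's
# global_prefs_override.xml and asking the running client to re-read it.
# The path assumes a stock Linux boinc-client package; adjust as needed.
import subprocess
from pathlib import Path

OVERRIDE = Path("/var/lib/boinc-client/global_prefs_override.xml")

OVERRIDE.write_text(
    "<global_preferences>\n"
    "  <work_buf_min_days>0.6</work_buf_min_days>\n"
    "  <work_buf_additional_days>0.1</work_buf_additional_days>\n"
    "</global_preferences>\n"
)

# Tell the client to pick up the override without a restart.
subprocess.run(["boinccmd", "--read_global_prefs_override"], check=True)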

Just now I had to give the modem such a kick, as the connection gave out about eight hours ago. I also had to nudge the Folding slots to restart hanging uploads and downloads. Luckily the modem ran well during the PrimeGrid PPSE challenge, for which I was lazy and set a much smaller BOINC work buffer, since those tasks are small. Before that, I think the last time I had to reset the modem was a couple of weeks ago.

But for the upcoming F@H contest, this means I will likely power-limit my Folding computers so that I don't need to keep windows open. I don't like having the windows open in kind-of-winter weather while the computers sit idle for almost half a day because the garbage cable modem didn't reconnect automatically. Alas, the weather forecast predicts temperatures several degrees higher next week, which will greatly reduce my apartment's heating demand compared to the nicely chilly weather during the PrimeGrid challenge.
 
Reactions: Assimilator1

In2Photos

Platinum Member
Mar 21, 2007
2,304
2,388
136
StefanR5R said:
It's a huge downside of F@H that you can't buffer work. [...] during workdays that is about how long it takes me to reset my cable modem whenever it does not re-establish the Internet link by itself. [...]
What about getting a smart switch that would cycle the power on your modem at a certain time every day?
 
Reactions: Ken g6

cellarnoise

Senior member
Mar 22, 2017
783
427
136
StefanR5R said:
It's a huge downside of F@H that you can't buffer work. [...]
I have what I think is a good connection to F@H and the internet. I had F@H crash recently on a few systems at the same time.

Maybe F@H told us to F@O for a bit? Sorry... but it might be true, as I have told F@H to FO a few times over the past few years, just with verbal communication, not with the interweb feedback links anymore. A.I. and A the FO I anymore? Who knows?

Just payback?

Sorry Puters ?
 
Reactions: StefanR5R

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
Re workqueue depth:
In Folding@Home, whose client doesn't buffer (it loads new work when presently running work has reached a certain high completion percentage), my recent average task run times were 1.3 hours, according to my stats page at folding.extremeoverclocking.com.

What about getting a smart switch that would cycle the power on your modem at a certain time everyday?
Hmm, if I go for a "smart" switch (or a USB relay), then I might as well toggle it on demand rather than periodically.

On a side note, the modem's administrative HTTP interface is unfortunately a hot JavaScript mess, which makes it too hard (impossible for me, at least) to reboot the modem with a plain HTTP request.
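If I ever do set that up, it would probably look something like this. A rough sketch assuming a smart plug with a simple HTTP toggle; the Tasmota-style URL is only an illustration, and a USB relay would need its own command instead:

Code:
#!/usr/bin/env python3
# Sketch of the on-demand idea: if the Internet stays unreachable for a few
# minutes, power-cycle the modem through a smart plug. The plug URL is a
# Tasmota-style example and purely illustrative; a real plug (or a USB relay)
# would need its own command here.
import subprocess
import time
import urllib.request

PLUG_URL = "http://192.168.0.50/cm?cmnd=Power%20{}"   # hypothetical smart plug
PING_TARGET = "1.1.1.1"                                # host outside the LAN
FAILS_BEFORE_RESET = 3                                 # ~3 minutes of downtime

def online() -> bool:
    """Single ping with a short timeout; True if the Internet answers."""
    return subprocess.run(["ping", "-c", "1", "-W", "5", PING_TARGET],
                          stdout=subprocess.DEVNULL).returncode == 0

def cycle_modem() -> None:
    urllib.request.urlopen(PLUG_URL.format("Off"), timeout=10)
    time.sleep(15)                                     # let the modem power down fully
    urllib.request.urlopen(PLUG_URL.format("On"), timeout=10)

fails = 0
while True:
    fails = 0 if online() else fails + 1
    if fails >= FAILS_BEFORE_RESET:
        cycle_modem()
        fails = 0
        time.sleep(300)                                # give the modem time to re-sync
    time.sleep(60)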
 
Reactions: In2Photos

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
Wow, I am sloppy. And before January, I had been absent from F@H for a long time.

At the end of December, I noticed that the most recent computer I had equipped with a dGPU (I don't remember how long ago that was) did not even have fahclient installed yet. That was swiftly corrected by copying the config from another computer which did have F@H.

And just now I noticed that my second most recent computer with a dGPU did not have fahclient on it either; it got dropped during a system version upgrade, which was also long enough ago that I don't recall when I did it. What's more, this computer did not boot properly because the battery cell on its mainboard was empty. Luckily I had one fresh CR2032 left; I shall buy a few more tomorrow as a reserve for future incidents of this kind. I was also able to copy all my BIOS preferences from another computer with the same hardware via a USB stick, which saved me from going through all the BIOS screens to check everything I had tweaked there ages ago and have long since forgotten.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,505
5,662
136
I've temporarily paused Windows updates through February on all my rigs for the duration of the race.

I should be adding an Intel Arc B580 tomorrow (and maybe stopping CPU folding on the Epyc, as the coil whine on the ASRock board is terrible).
 
Reactions: Endgame124

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
01.26.25, 12am CST stats: 400,924,921,734 points total; 19,613,222,160 points in January; 788,300,787 points 24hr average

Six days left until 02.02.25 12am CST. If we keep going at the exact same rate, that will be 24.3 Billion points for January 2025, a little short of the 26.9 Billion of January 2024. To beat the latter, we would have to turn the dials to >1,214,000,000 PPD.
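Spelled out, the arithmetic is just this (a quick sanity check, using only the numbers quoted above):

Code:
# The projection above, spelled out; the constants are the numbers quoted in this post.
jan_so_far = 19_613_222_160      # January points as of 01.26.25, 12am CST
daily_rate = 788_300_787         # current 24hr average
days_left  = 6                   # until 02.02.25, 12am CST
jan_2024   = 26_898_347_599      # last January's total

projected  = jan_so_far + days_left * daily_rate
needed_ppd = (jan_2024 - jan_so_far) / days_left

print(f"projected January 2025 total: {projected:,}")      # ~24.3 billion
print(f"PPD needed to beat Jan 2024:  {needed_ppd:,.0f}")   # ~1,214,000,000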
 

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
I am trying out CPU-folding on EPYC 9554P (Zen 4 "Genoa" 64c/128t, cTDP = 400 W).
It is running "64bit-avx2-256/a8-0.0.12/Core_a8.fah/FahCore_a8", that is, Gromacs core 0xa8 version 0.0.12 from Jan 16 2021. 64 threads/slot is the maximum supported by FAHCore 0xa8.
  • one 64t slot, project 12465:
    estimated ≈1.3 M PPD for ≈280 W at the wall,
    CPU runs at or slightly above top allcore boost of 3.75 GHz, CPU load is ≈6400%
  • two 64t slots, projects 12465 and 18496:
    estimated ≈0.9 + ≈0.7 = 1.6 M PPD for ≈330 W at the wall,
    CPU runs at allcore boost of 3.75 GHz, CPU load is ≈6400% + ≈6400%
  • three 42t slots, projects 12421, 12421, 12422:
    estimated 3× ≈0.8+ = 2.5 M PPD for ≈390 W at the wall,
    CPU runs at allcore boost of 3.75 GHz, CPU load is 3× ≈4200%
Tomorrow I'll try four 32t slots.

The CPU projects 12465 and 18496 took only a few hours @64t. The projects 12421 and 12422 take half a day @42t.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,505
5,662
136
Just added an Intel Arc B580 to the mix.

Epyc 4564P (16c/32t, 7950X equivalent) + Intel Arc B580 12GB GPU. It ain't much compared to my 4080 Super, but it's something.
 

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
Tomorrow I'll try four 32t slots.
four 32t slots, projects 12464, 12465, 18495, 18497:
estimated 0.45…0.65 M each = 2.3 M PPD for ≈385 W at the wall,
CPU runs at allcore boost of 3.75 GHz, CPU load is 4× ≈3200%

after setting CPU affinities, each task bound to 2 exclusive CCXs:
estimated 0.5…0.8 M each = 2.6 M PPD for ≈390 W at the wall,
CPU runs at 3.7 GHz on average, CPU load is 4× ≈3200%

Conclusions:
– The change from 3×42 without affinity to 4×32 without affinity resulted in somewhat lower total PPD, certainly because of F@H's annoying overemphasis on the quick return bonus.
– Setting affinity makes the CPU caches work smarter and the CPU cores work harder, which improves PPD even though core clocks went down slightly. I don't know whether this effect comes from better cache usage (only one task's data in each L3 cache instead of data from up to four tasks), from less cross-CCX data sharing, or both.
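For reference, such pinning can be scripted on Linux roughly like this (a sketch using pgrep and taskset; it assumes the common CPU numbering where logical CPUs 0-63 are the first SMT threads and 64-127 their siblings, which should be verified with lscpu -e first):

Code:
#!/usr/bin/env python3
# Sketch: bind each running FahCore process to its own pair of CCXs on an
# EPYC 9554P (Zen 4 "Genoa": 8 CCXs with 8 cores each, SMT enabled).
# Assumes the common Linux numbering where logical CPUs 0-63 are the first
# SMT thread of each core and 64-127 their siblings; check `lscpu -e` first.
import subprocess

CORES_PER_CCX = 8       # one 8-core CCX per CCD on Genoa
CCXS_PER_SLOT = 2       # 2 CCXs -> 16 cores -> 32 threads per folding slot
TOTAL_CORES   = 64

def cpu_list(first_ccx: int, ccx_count: int) -> str:
    """taskset-style CPU list covering both SMT threads of the given CCXs."""
    lo = first_ccx * CORES_PER_CCX
    hi = (first_ccx + ccx_count) * CORES_PER_CCX - 1
    return f"{lo}-{hi},{lo + TOTAL_CORES}-{hi + TOTAL_CORES}"

# One entry per FahCore worker process currently running.
pids = subprocess.run(["pgrep", "FahCore"], capture_output=True,
                      text=True, check=True).stdout.split()

for slot, pid in enumerate(pids):
    cpus = cpu_list(slot * CCXS_PER_SLOT, CCXS_PER_SLOT)
    # -a applies the mask to all threads of the process, -c takes a CPU list
    subprocess.run(["taskset", "-a", "-c", "-p", cpus, pid], check=True)
    print(f"slot {slot}: PID {pid} -> CPUs {cpus}")

Setting CCXS_PER_SLOT to 1 would instead pin eight 16t slots to one CCX each.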

An older core (was it A7 or A6?) used MPI for parallelization (instead of OpenMP, as the A8 core does), had no difficulty scaling to 128 threads if not more, and also scaled very well on dual-socket machines. MPI is designed to minimize data sharing between workers and is even used across InfiniBand and Ethernet clusters in high-performance computing.

PS,
I am guessing that eight ≤16t slots with CPU affinities set to 1 task : 1 CCX would result in higher computing throughput, but the degradation of the quick return bonus would more than counteract this. So I am not going to try it.
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
A small update. After I came back from the dayjob, the 9554P presents itself in a somewhat better light:
still with CPU affinities, still with four 32t slots,
now with projects 12420 (3×) and 12421 (1×):
estimated 0.8…0.9 M each = 3.4 M PPD for ≈440 W at the wall,
CPU runs at 3.7 GHz on average, CPU load is 4× ≈3200%

These jobs take about half a day each. That's about the same as these projects took @3×42t without affinity, but now there are four at once.
 

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
Another test on the 9554P:
eight 16t slots,
CPU affinities: 1 FAHCore = 1 core complex
estimated ≈4.0 M PPD for ≈465 W at the wall,
CPU runs at 3.3 GHz on average, CPU load is 8× ≈1600%

the currently running projects:
slot 00 unit 00: RUNNING project 19228, 19.30% done, ETA 03:08:00, 469 k ppd
slot 01 unit 01: RUNNING project 19229, 16.75% done, ETA 03:44:00, 484 k ppd
slot 02 unit 02: RUNNING project 18497, 13.90% done, ETA 04:41:00, 515 k ppd
slot 03 unit 03: RUNNING project 12420, 4.78% done, ETA 14:59:00, 548 k ppd
slot 04 unit 04: RUNNING project 12423, 4.82% done, ETA 14:49:00, 557 k ppd
slot 05 unit 05: RUNNING project 12465, 15.18% done, ETA 04:13:00, 501 k ppd
slot 06 unit 06: RUNNING project 19227, 18.61% done, ETA 03:15:00, 460 k ppd
slot 07 unit 07: RUNNING project 19227, 18.57% done, ETA 03:16:00, 456 k ppd

Conclusion: FahCore_a8 doesn't like high thread counts per task, nor does it like to be spread across AMD's last-level cache boundaries.¹ That's quite typical for OpenMP auto-parallelized computing cores.

________
¹) The 4×32t config of post #86 had affinities of 1 FahCore = 2 CCXs.
 
Last edited:

cellarnoise

Senior member
Mar 22, 2017
783
427
136
StefanR5R said:
Another test on the 9554P: eight 16t slots, CPU affinities 1 FAHCore = 1 core complex, estimated ≈4.0 M PPD for ≈465 W at the wall. [...] Conclusion: FahCore_a8 doesn't like high thread counts per task, nor does it like to be spread across AMD's last-level cache boundaries.
Thanks for this crazy amount of CPU testing! I think this also shows how CPUs are not appreciated at F@H!
 

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
For posterity: TeAm points for the month.
January 2024 — 26,898,347,599
January 2025 — 26,116,935,661 — 3% short of a new record

Total TeAm points during our annual Holiday Races:
1/1 2023 - 1/21 2023 — 8,645,799,517 ¹
December 2021 — 9,812,355,444
December 2020 — 8,022,546,283
December 2019 — 3,062,473,110
December 2018 — 1,864,317,276
________
¹) reconstructed from weekly DC stats and the bigadv EOL challenge:
1/2 2023 - 2/1 2023 — 11,892,724,498
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
6,275
9,591
136
And one more thing about F@H on CPUs:

I performed the above tests with FAHClient v7. This client gives full control over how many "folding slots" there are, and how many logical CPUs or which GPU each "folding slot" is to use. On top of this fine-grained control, I "lassoed" folding slots to particular logical CPUs on my EPYC host, which reduced (or, in the final test, eliminated) cross-CCX memory accesses.

Now I just came across the following note about FAHClient changes from v7 to v8 via https://foldingathome.org/guides/:
the concept of folding slots in v7 has changed in v8. Instead of configuring slots, you only have to tell Folding@home which compute resources (e.g. CPUs and GPUs) you would like it to use. It will then automatically allocate those resources in the most efficient way. This change both simplifies the setup of Folding@home and makes it possible for Folding@home to allocate multiple CPUs and GPUs to the same Work Unit. By allocating more resources to a single WU, Folding@home can decrease simulation times and achieve scientific results more quickly.
I haven't tested the v8 client yet. This is what I saw from browsing the source code:
  • The number of logical CPUs which the client will utilize for F@H cores is the minimum of
    • the total number of logical CPUs of the computer minus 1,
    • the number of "performance CPUs" of the computer if there are any,
    • the number of logical CPUs configured by the user.
  • When an assignment server assigns a workunit to the client, the client seems to get told how many CPUs the workunit needs at least and how many CPUs the workunit can use at most.
  • The client then seems to launch this WU with as many threads as are not yet utilized (if there are enough as yet unused logical CPUs left on the host), but with no more threads than the workunit claims to be able to use. If there are not enough unused CPUs, the workunit is apparently kept waiting until there are.
  • I wonder if this is all safe from CPU WUs possibly halting GPU WUs.
  • As for CPU WUs on a large host, you will obviously end up with weird mixes of threadcounts per WU, and likely higher threadcounts than is good for throughput.
I don't know if it is possible with the v8 client (without source code modifications) to run multiple instances of it on a large host, if one wanted to optimize throughput on that host. But a better way than multiple instances might be to modify the client source code to add a configuration option which limits the maximum thread count per workunit, overriding what the assignment server sets as maximum for the WU.
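To make the point about thread counts concrete, here is the allocation behaviour as I read it, paraphrased into a short sketch; the names and structure are mine, not the client's:

Code:
# Paraphrased sketch of the v8 allocation behaviour as I read it; the names
# and structure are illustrative, not taken from the actual client source.
def usable_cpus(total_cpus, performance_cpus, user_limit):
    """Logical CPUs the client is willing to hand out to folding cores."""
    candidates = [total_cpus - 1, user_limit]
    if performance_cpus:                  # hybrid CPUs: only P-cores count
        candidates.append(performance_cpus)
    return min(candidates)

def threads_for_wu(wu_min, wu_max, cpus_free):
    """Thread count for a newly assigned WU, or None to keep it waiting."""
    if cpus_free < wu_min:
        return None                       # not enough idle CPUs yet
    return min(cpus_free, wu_max)

# Example: 64c/128t host, no E-cores, user allows 126 logical CPUs.
free = usable_cpus(total_cpus=128, performance_cpus=0, user_limit=126)
for wu_min, wu_max in [(16, 64), (8, 32), (16, 64)]:
    n = threads_for_wu(wu_min, wu_max, free)
    if n:
        free -= n
    print(f"WU (min {wu_min}, max {wu_max}) -> {n} threads; {free} CPUs left")

With the example numbers, the three WUs end up at 64, 32 and 30 threads, which is exactly the kind of uneven split I mean.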
 
Last edited:
Reactions: crashtech

TennesseeTony

Elite Member
Aug 2, 2003
4,259
3,701
136
www.google.com
I wonder if this is all safe from CPU WUs possibly halting GPU WUs.

I've used v8 a little, and straight "out of the box" it assigned one thread to the GPU.

v8 "makes it possible ... to allocate multiple ... GPUs to the same Work Unit... and achieve ... results more quickly." Hmmm, very interesting. Might have to test that.
 
Last edited:
Reactions: Ken g6

Pokey

Platinum Member
Oct 20, 1999
2,781
480
126
For what it's worth, I ran Ver 8 on my Rome 64/128 rig for a while and it automatically divided the threads up this way:
GPU: 1
CPU: 64
CPU: 63

I did not pay any attention or take note of the times involved.
 

Endgame124

Senior member
Feb 11, 2008
974
705
136
Stefan was kind enough to put the month totals above. We didn't quite hit the totals from 2024, but we did pretty well. I'm thinking 2026 is looking pretty good to beat the 2024 numbers as people upgrade to 5000 series hardware.

2025 Numbers:
Daily January Peak (1/31/2025):
1,293,656,958

Week Peak (Week of 1/26/25 to 2/1/25):
6,936,348,956

Jan 2025 Total
26,116,935,661

Peak month active Users (1/6/2025):
64

Current team peaks:
Daily Peak (1/25/2024):
1,666,901,932 points

Week Peak (week of 1/21/24)
7,654,446,4125 points

Monthly Peak (Month of Jan 2024)
26,898,347,599 points

Peak month active Users (1/23/2024):
83
Peak Active Users (April 19th, 2020):
1481
 