PrimeGrid Challenges 2022

crashtech · Mar 15, 2022

I got confused by the way they list it as well. The link points to the other half of the task, it's not a label for the task in your list.

Markfw · Mar 16, 2022

I upped my contribution al little. 128 threads, 8 jobs running on one of my 7742's. 10 to 18 jobs total.

StefanR5R · Mar 20, 2022

Some random things which I found out in the meantime:

Current 3*2^n+1 tasks are a little bit faster than current 3*2^n-1 tasks, yet the former give a little more credit than the latter. Let's hope there won't be cherrypicking.

On Broadwell-EP and on Rome, the use of HT/SMT brings a tiny improvement in throughput at a small loss in power efficiency (IOW at an expense of Joule per task).

The way I set up my computers, Rome has got >2.5 times the performance per Watt of Broadwell-EP, and 1.3 times the performance per core and GHz.
Edit: To be fair though, my setup on Rome is more complicated and less flexible than on Broadwell-EP.

On Rome with 4c/CCX, twice as many concurrent tasks as there are CCXes gives best throughput if the tasks are scheduled freely by Linux. But if affinity to logical CPUs is controlled by the user, a 1:1 match of concurrent tasks with CCXes gives best — or, better than best :-) — throughput.
Edit 2: A good compromise between these two cases is available with server BIOSes via the "ACPI SRAT L3 Cache As NUMA Domain" tuning option. Works better with LLR-321 than I recall from tests with another LLR based subproject; I don't recall which one.

Markfw · Mar 20, 2022

StefanR5R said:
Some random things which I found out in the meantime:

Current 3*2^n+1 tasks are a little bit faster than current 3*2^n-1 tasks, yet the former give a little more credit than the latter. Let's hope there won't be cherrypicking.

On Broadwell-EP and on Rome, the use of HT/SMT brings a tiny improvement in throughput at a small loss in power efficiency (IOW at an expense of Joule per task).

The way I set up my computers, Rome has got >2.5 times the performance per Watt of Broadwell-EP, and 1.3 times the performance per core and GHz.
Edit: To be fair though, my setup on Rome is more complicated and less flexible than on Broadwell-EP.

On Rome with 4c/CCX, twice as many concurrent tasks as there are CCXes gives best throughput if the tasks are scheduled freely by Linux. But if affinity to logical CPUs is controlled by the user, a 1:1 match of concurrent tasks with CCXes gives best — or, better than best :-) — throughput.
Edit 2: A good compromise between these two cases is available with server BIOSes via the "ACPI SRAT L3 Cache As NUMA Domain" tuning option. Works better with LLR-321 than I recall from tests with another LLR based subproject; I don't recall which one.

I have my 7452 doing 3 tasks and their ETA is 4:45, thats 48 tasks for 64 available.
First, has it started yet ? Second for a 7742, are you saying 8 tasks ? it was like 9 hours per task then. I changed to 75% and 6 tasks and now its 6 hours eta per task. What do you think ? When does the contest start ?

mmonnin03 · Mar 20, 2022

"beginning 21 March 03:21 UTC and ending 26 March 03:21 UTC"

That's 11:21 EST on the 20th. ~3hr from now

Running out some other work on my PCs. I'll be running 4 threads per task. I tried a single task on some PCs at 8x and they only got to around 7.2 -7.3 threads. I'll prob change it to 8 when it gets closer to end of the challenge.

Ken g6 · Mar 20, 2022

It has begun! I'm a little slow getting started, but not by much.

Markfw · Mar 21, 2022

Ken g6 said:
It has begun! I'm a little slow getting started, but not by much.

Not sure if I am configured correctly, but I am in.

cellarnoise · Mar 21, 2022

My wife won't cut my COVID hair, but I now have my wired Ethernet working again... Can the Eth cut hair?

I'm in on the challenge at this point.

StefanR5R · Mar 21, 2022

StefanR5R said:
As a reminder, for points in this challenge, the 321-LLR subproject needs to be selected (CPUs only, with multithreading support),
work must be downloaded after Monday March 21, 03:21 UTC = today, Sunday March 20, 23:21 EDT, 20:21 PDT,
and results reported before Saturday March 26, 03:21 UTC = Friday March 25, 23:21 EDT, 20:21 PDT.

If you have work in the buffer which was downloaded earlier than that, just abort this work and update the project.

cellarnoise · Mar 21, 2022

StefanR5R said:
If you have work in the buffer which was downloaded earlier than that, just abort this work and update the project.

Yes!
Come one and come all . Thanks Stef!!!
Edit 1... D.c. or Fear? Somehow the googliees dropped this last edit from the 1st post.. likely because of M.L... nice attempt...
Haha!

StefanR5R · Mar 21, 2022

Markfw said:
I have my 7452 doing 3 tasks and their ETA is 4:45, thats 48 tasks for 64 available.
First, has it started yet ? Second for a 7742, are you saying 8 tasks ? it was like 9 hours per task then. I changed to 75% and 6 tasks and now its 6 hours eta per task. What do you think ?

I took measurements on a dual-7452. These measurements will be representative for a single-7452 as well. (The dual-7452 spends a part of the energy budget for the Infinity Fabric link between the two sockets, but it should not be much because inter-socket traffic is low in this sort of workload. Also, I increased PPT and TDP to 7452's possible maximum of 180 W in the BIOS, default is 155 W.)

I performed the measurements with a fixed workunit, in a scripted testbed. That is, the workloads consisted of multiple tasks from the same workunit running in parallel, and the same workunit being re-used in all test scenarios with different parallel task count and thread count. The consequence is that these tests are very precise, repeatable, and quick.

In contrast, observations of random workunits coming from PrimeGrid are not as conclusive, especially because there are two types of "main tasks" which have not only different durations but also different PPD, as I mentioned in #153.

Which particular tests I ran and what the results were is posted in a private section of the teamanandtech.org forum. However, I spilled the beans in #153 already.

One 7452 has got 8 CCXs, each CCX made up of 4c/8t and 16 MB L3$. Here is how I am configuring them for the challenge:

In the BIOS, "Advanced" --> "ACPI Settings", I am switching "ACPI SRAT L3 Cache As NUMA Domain" from "Auto" to "Enabled". (That's how it is labeled in Supermicro's AMI BIOS.)

The effect of this is that the firmware will present each CCX as a NUMA node to the operating system. A NUMA aware OS like Linux will attempt to keep multithreaded processes within a NUMA node. Declaring a CCX a NUMA node is a bit of a hack as a replacement for cache-aware scheduling. The latter is more problematic than NUMA-aware scheduling, and I am not sure whether cache topology plays a role at all in current Linux scheduler decisions. (There is a related scheduler change in Linux 5.16 which I mentioned elsewhere, but this shouldn't affect all-core loads.)
In the output of the "lscpu -e" command, the NODE column will show the effect of this change. A dual-7452 system will then have the nodes 0…15 instead of nodes 0…1.

For the first 4+ days of the 5 challenge days, I will run 8 tasks in parallel on each 7452 (that is, 16 tasks at once on a dual-7452 computer) = a 1:1 ratio of # tasks to # CCXs.
- I choose to let each task have 4 program threads.
- However, the CPU could of course give 8 hardware threads to each task, since there are 8 threads in each CCX. And indeed, using all threads would increase performance by a small percentage. It would also increase power draw but more than proportionally.
- So, since EPYC Rome isn't a power hog in the first place, you might prefer to go with 8 threads per task and spend that little bit of extra electric energy.

During the last day, if I find the time, I may switch to fewer tasks at once combined with more program threads per task. This will sacrifice throughput but decrease run times. That way, the last hours of the challenge will be better filled out.

________
Edit: There is one thing though which I haven't considered yet at all. It's whether @cellarnoise is to be classified as impish or as admirable.

cellarnoise · Mar 21, 2022

May we all break the ribs of many deserving pigs during this challenge! And may the pigs be well that are also deserving and that don't starve the cpus to the point of reboot...

Merry Crunching all!

mmonnin03 · Mar 21, 2022

TAAT in 1st, Ukraine in 4th.

Challenge statistics

www.primegrid.com

Top users

Challenge statistics

www.primegrid.com

Free-DC 321 only but includes all 321 tasks if downloaded before the challenge start

Primegrid Project Stats

stats3.free-dc.org

SystemVipers · Mar 21, 2022

I for one am very confuzed by this wu, but i did manage some points.
Heading out now, but tonight I hope to try to figure out how to squeeze more out of the rigs.

I'm here

Enjoy the day
SV

mmonnin03 · Mar 21, 2022

My Proof tasks are ~8.6-9.1k credit, which are more plentiful. And the main tasks are much quicker at around 70 credits. Looks like you have completed 5 proof tasks.

Edit: Thats backwards, it should be:

My main tasks are ~8.6-9.1k credit, which are more plentiful. And the proof tasks are much quicker at around 70 credits. Looks like you have completed 5 main tasks.

StefanR5R · Mar 21, 2022

mmonnin03 said:
My Proof tasks are ~8.6-9.1k credit, which are more plentiful. And the main tasks are much quicker at around 70 credits.

It's the other way around. :-) "Main tasks" do the actual work. "Proof tasks" are a quick validation of that work.

In the results tables at the PrimeGrid web server, a row for a main task contains a link towards a corresponding proof task, as soon as that proof task was generated by the server. This link is prominently labeled with "[Proof task]". But the proof task is the one to which the link is directed, not the one where this link is given.

Vice versa, a table row for a proof task contains a backlink to the main task which is being validated in the proof. That backlink is labeled "[Main task]", because that's what this link is leading to, not where the link is placed. :-)

Workunit names of main tasks end in numbers, whereas workunit names of proof tasks end with the character c.

Main tasks, 3*2^n+1 form: >9,100 credits
Main tasks, 3*2^n-1 form: >8,700 credits
Proof tasks, 3*2^n+1 form: >71 credits
Proof tasks, 3*2^n-1 form: >68 credits

mmonnin03 · Mar 21, 2022

Ah yeah, that makes more sense with the proof being quicker.

Markfw · Mar 21, 2022

Something was wrong with my 7742, so I upgraded the kernal, and all other updates. Now they are 9 hour tasks. It was days before this. Still not very fast.

5950x and 12700 are still at 2 1/2 hours !

I added 2 more 5950x's since they work so well.

Edit: the 7742 tasks are at 24 hours now. Maybe I should just kill them and save the electricity $$ ??

Skillz · Mar 22, 2022

Markfw said:
Something was wrong with my 7742, so I upgraded the kernal, and all other updates. Now they are 9 hour tasks. It was days before this. Still not very fast.

5950x and 12700 are still at 2 1/2 hours !

I added 2 more 5950x's since they work so well.

Edit: the 7742 tasks are at 24 hours now. Maybe I should just kill them and save the electricity $$ ??

How many are you trying to run at once? You are probably running too many of them that it's having to swap L3 Cache and RAM.

waffleironhead · Mar 22, 2022

Just logging my unit times for posterity here.
my haswell cpu(4460) takes around 12.3 hours on 3 cores
my skylake cpu(6700) takes around 8.25 hours on 4 cores
my zen1 cpus(2400g) take around 13 hours on 4 cores
my zen 2 cpu (3700x) take around 7 hours on 4 cores
my zen 3 cpu(5700g) take around 3 hours on 8 cores

Ken g6 · Mar 22, 2022

Day 1 stats:

Rank___Credits____Username
6______1379530____crashtech
7______1174288____xii5ku
18_____610651_____emoga
25_____411857_____mmonnin
49_____214800_____Orange Kid
50_____214721_____cellarnoise2-TAAT
51_____213038_____Fardringle
64_____159832_____waffleironhead
118____70455______Skivelitis2
167____33655______markfw
175____27116______Ken_g6

Rank__Credits____Team
1_____5250521____Czech National Team
2_____5217851____Antarctic Crunchers
3_____4681357____Ukraine
4_____4509948____TeAm AnandTech
5_____3788772____SETI.Germany
6_____3398064____AMD Users
7_____2801125____Aggie The Pew

Probably not much commentary this time. I'm busy preparing to move!

Markfw · Mar 22, 2022

Skillz said:
How many are you trying to run at once? You are probably running too many of them that it's having to swap L3 Cache and RAM.

Running 2 on a 2970wx (24 cores) 6:15, running 3 on a 32 cores 7452 at 4.5, should be able to do 6 on a 64 core but its running 22 hours. I will try 50% load or 4 units and see what happens.

Only thing is its an ES chip and only 7 channels of memory.. The 7452 is retail, and so is the 2970wx.

StefanR5R · Mar 22, 2022

@Markfw, on EPYC Rome, enable the BIOS option which I mentioned. Then set the tasks to 8-threaded and use all logical CPUs (i.e. the SMT threads too).

Edit:
To reiterate, Zen 2 and Zen 3 CPUs loose a fair amount of throughput if multithreaded tasks run on more than one CCX per task.

If each CCX is turned into a NUMA domain by means of the mentioned BIOS option, and if the threadcount of each task is small enough to fit within the hardware thread count of a CCX, then Linux will schedule each task on CPUs which belong to a single CCX most of the time.

StefanR5R · Mar 22, 2022

If you want to change threads-per-task on the fly, create (or edit) projects/www.primegrid.com/app_config.xml and enter something like this:

XML:

<app_config>
    <app_version>
        <app_name>llr321</app_name>
        <plan_class>mt</plan_class>
        <!-- March 2022: 1M FFT length = 8 MB FFT data size -->
        <cmdline>-t 8</cmdline>
        <avg_ncpus>8</avg_ncpus>
    </app_version>
</app_config>

Change <cmdline> and <avg_ncpus> to your liking.
Then restart the client.
After the restart, the existing tasks will resume with the new setting.

(Instead of a client restart, you can also just suspend tasks to disk, then resume tasks. But then boincmgr will display misleading outdated values.)

StefanR5R · Mar 22, 2022

@Markfw, my dual-7452 with PPT and TDP at 180 W, "ACPI SRAT L3 Cache As NUMA Domain" = "Enabled", run 16 tasks at once per computer ( = 8 tasks at once per socket), 4 threads per task (i.e. I only use half of the logical CPUs to save a few Watts), and average duration of the latest >30 completed tasks are:

Main tasks, 3*2^n+1 form, >9,100 credits — 7.4 h
Main tasks, 3*2^n-1 form, >8,700 credits — 7.5 h

Edit,
according to my scripted tests prior to the challenge, the exact same setup except without the NUMA tweak would result in 28% longer runtimes, IOW throughput would be reduced to 78%.

Edit 2,
for reference, a 7452 has got 8 CCXs, each one with 4c/8t and 16 MB L3$.

PrimeGrid Challenges 2022

Lifer

Moderator Emeritus, Elite Member

Elite Member

Moderator Emeritus, Elite Member

Senior member

Programming Moderator, Elite Member

Moderator Emeritus, Elite Member

Senior member

Elite Member

Senior member

Elite Member

Senior member

Senior member

Member

Senior member

Elite Member

Senior member

Moderator Emeritus, Elite Member

Golden Member

Diamond Member

Programming Moderator, Elite Member

Moderator Emeritus, Elite Member

Elite Member

Elite Member

Elite Member