PrimeGrid Challenges 2023

emoga · Aug 25, 2023

Over 18 days until the next competition and the team is already testing out SoB workunits to fine-tune their systems.

Times have sure changed...we used to struggle against the Noobs Of Kryta

Markfw · Aug 25, 2023

emoga said:
Over 18 days until the next competition and the team is already testing out SoB workunits to fine-tune their systems.

Times have sure changed...we used to struggle against the Noobs Of Kryta

With Zen 4 supporting AVX-512, and a lot of our team running Zen 4, that alone dooms most competitors! And 2 (soon to be 3) 9554s and a 9654 in the mix just add to their defeat!

Markfw · Aug 26, 2023

@crashtech, I do have a 9554 on Linux. Can you explain (simply enough for an idiot) how to set affinity in Linux?

StefanR5R · Aug 26, 2023

A while ago I posted a script for Linux in the private section of teamanandtech.org.

On single socket computers:
Edit the top of the script to define the desired config, as described in the inline comments of the script. Then keep the script running in the background (e.g. in an extra terminal window which you leave open or minimized).

On dual socket computers:
The same, though for best results one might combine the script with two BOINC client instances: one bound to all logical CPUs of one socket, the other bound to all logical CPUs of the other socket, each starting only as many tasks at once as fit on a single socket, of course.

The script can also be run as a system service, but I haven't written up a copy+paste recipe for this yet.
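
For the dual-socket setup above, here is a minimal Python sketch (not Stefan's script, whose contents aren't reproduced in this thread) that groups logical CPUs by socket using the standard Linux sysfs file topology/physical_package_id, and prints each group in the range syntax that taskset -c accepts. The helper names cpus_by_socket and to_cpu_list are hypothetical.

```python
import glob
import os
import re
from collections import defaultdict


def cpus_by_socket(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Hypothetical helper: map socket (physical package) ID -> sorted logical CPU IDs."""
    sockets = defaultdict(list)
    for path in glob.glob(os.path.join(sysfs_cpu_root, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(path))
        pkg_file = os.path.join(path, "topology", "physical_package_id")
        # Skip non-CPU entries and CPUs without topology info (e.g. offline ones).
        if not match or not os.path.exists(pkg_file):
            continue
        with open(pkg_file) as f:
            sockets[int(f.read().strip())].append(int(match.group(1)))
    return {pkg: sorted(cpus) for pkg, cpus in sorted(sockets.items())}


def to_cpu_list(cpus):
    """Hypothetical helper: compress CPU IDs into range syntax like '0-63,128-191'."""
    cpus = sorted(set(cpus))
    if not cpus:
        return ""
    ranges = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu != prev + 1:  # gap found: close the current range
            ranges.append((start, prev))
            start = cpu
        prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    for pkg, cpus in cpus_by_socket().items():
        print(f"socket {pkg}: {to_cpu_list(cpus)}")
```

Each printed list could then be handed to taskset -c when starting the corresponding BOINC client instance; how the second client instance itself is configured is outside this sketch.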

Markfw · Aug 26, 2023

StefanR5R said:
A while ago I posted a script for Linux in the private section of teamanandtech.org.

I just looked over there and can't find it. Can you link it for me, please?

crashtech · Aug 27, 2023

I've been setting mine up manually using taskset, but there are better ways to learn, especially for high-core-count CPUs.
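
For the manual taskset route, here is a minimal Python sketch of the same idea using only the standard library (os.sched_setaffinity, Linux only). One detail worth knowing: sched_setaffinity() on a PID changes only that one thread, which is why taskset has its -a (all tasks) option for multithreaded programs such as LLR, so the sketch walks /proc/<pid>/task. parse_cpu_list and pin_all_threads are hypothetical names, and the parser deliberately ignores taskset's stride syntax.

```python
import os


def parse_cpu_list(spec):
    """Hypothetical helper: parse '0-7,16-23' (taskset -c / sysfs syntax) into a set.

    Stride syntax such as '0-15:2' is not handled.
    """
    cpus = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_all_threads(pid, spec):
    """Hypothetical helper: pin every existing thread of `pid` to the CPUs in `spec`.

    Roughly what `taskset -a -c -p SPEC PID` does. sched_setaffinity() acts on a
    single thread ID, so each entry under /proc/<pid>/task is pinned separately.
    Threads created later inherit the affinity of the thread that creates them.
    """
    cpus = parse_cpu_list(spec)
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            pass  # the thread exited between listing and pinning
```

For example, pin_all_threads(12345, "0-7") would pin a (hypothetical) LLR process with PID 12345 to CPUs 0 to 7.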

TennesseeTony · Aug 27, 2023

emoga said:
Over 18 days until the next competition and the team is already testing out SoB workunits to fine-tune their systems.

I ran a few tests already as well. A 7950X with the smaller cache runs about 14.5 hours per task with all 32 threads. 4x8 threads didn't do so well. I should try 2x16 threads, I suppose. Tasks score upwards of 107,000 points each.

crashtech · Aug 27, 2023

TennesseeTony said:
I ran a few tests already as well. A 7950X with the smaller cache runs about 14.5 hours per task with all 32 threads. 4x8 threads didn't do so well. I should try 2x16 threads, I suppose. Tasks score upwards of 107,000 points each.

My tests show that SMT is no help, but it doesn't hurt a whole lot if you don't want to turn it off in the BIOS or play with process affinity. The best result I got was 2 tasks of 8 threads each, pinned to physical cores. The 7950X is faster than my twin Xeon E5-2696 v4s: 16 Zen 4 cores beat 44 Broadwell cores...
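
To illustrate that best configuration (2 tasks of 8 threads, pinned to physical cores), here is a minimal sketch of a planner that takes one logical CPU per physical core and packs tasks within a single L3 domain, so no task straddles a CCX. The input format is an assumption: (logical CPU, core key, L3 key) tuples, which on Linux could be assembled from topology/core_id, topology/physical_package_id and the L3 shared_cpu_list in sysfs. The test topology assumes the numbering commonly seen on Linux for a 7950X, where CPU n and n+16 are SMT siblings.

```python
from collections import defaultdict


def plan_tasks(topology, threads_per_task):
    """Hypothetical planner: split physical cores into per-task CPU lists.

    topology: iterable of (logical_cpu, core_key, l3_key). Logical CPUs sharing a
    core_key are SMT siblings; use (package_id, core_id) as core_key on multi-socket
    hosts, since core_id repeats across sockets.
    Only the lowest-numbered sibling of each core is used (one thread per core).
    Cores left over in an L3 domain that cannot fill a whole task stay idle.
    """
    first_thread = {}  # core_key -> lowest logical CPU on that core
    core_l3 = {}       # core_key -> l3_key
    for cpu, core, l3 in topology:
        if core not in first_thread or cpu < first_thread[core]:
            first_thread[core] = cpu
        core_l3[core] = l3
    by_l3 = defaultdict(list)
    for core, cpu in first_thread.items():
        by_l3[core_l3[core]].append(cpu)
    tasks = []
    for l3 in sorted(by_l3):
        cpus = sorted(by_l3[l3])
        # Only whole tasks, never crossing into the next L3 domain.
        for i in range(0, len(cpus) - threads_per_task + 1, threads_per_task):
            tasks.append(cpus[i:i + threads_per_task])
    return tasks


if __name__ == "__main__":
    # Assumed 7950X layout: cores 0-15, siblings n and n+16, 8 cores per CCD.
    topo = [(c, c % 16, (c % 16) // 8) for c in range(32)]
    for task in plan_tasks(topo, 8):
        print(task)
```

Using the lowest-numbered sibling is just one convention; any one thread per core works, as long as the same choice is applied consistently.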

waffleironhead · Aug 27, 2023

7940HS was 16.5 hours for an 8-thread unit.
3700X was 43.22 hours.
5700G was 40.8 hours.
That AVX-512 really shines.

StefanR5R · Aug 28, 2023

crashtech said:
I've been setting mine up manually using taskset, but there are better ways to learn, especially for high-core-count CPUs.

Plug: My script for Linux is trivial to set up and run. (Just remember the basics, like setting the executable bit, which can be done in any graphical file manager.) You can safely ignore the size of the script, which came out somewhat large; this is due to small features that I wanted in there, but which neither matter for, nor interfere with, more basic operation on Ryzens and the like.

I still might reduce some clutter in the configuration section of the script if I find spare time for that.

crashtech said:
My tests show that SMT is no help, but it doesn't hurt a whole lot if you don't want to turn it off in the BIOS or play with process affinity.

Setting processor affinities not only helps make optimal use of SMT; importantly, on Zen CPUs with more than one CCX it also reduces data transfers across cache boundaries. This traffic costs time and wastes energy that would be better spent in the FMA units.

Windows users can deal with this by means of Process Lasso (I guess you'd want to ask @Markfw how to set this up; I haven't used it myself yet) or with a small Windows script which is very easy to set up (look in the same teamanandtech.org thread where I posted the Linux script).
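
The CCX point can be checked mechanically. Given the L3 domains (for example, as read from /sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_list for the level-3 entries), here is a minimal sketch that reports which tasks' CPU sets straddle more than one L3 cache. The function names are hypothetical, and the example domains assume a 7950X-style numbering with siblings n and n+16.

```python
def l3_domains_touched(task_cpus, l3_domains):
    """Hypothetical helper: indices of the L3 domains (CCXs) a task's CPU set spans.

    l3_domains: list of sets of logical CPU IDs, one set per L3 cache.
    A well-pinned task touches exactly one domain.
    """
    task = set(task_cpus)
    return [i for i, domain in enumerate(l3_domains) if task & domain]


def straddling_tasks(tasks, l3_domains):
    """Hypothetical helper: return the tasks (sorted CPU lists) that cross an L3 boundary."""
    return [sorted(t) for t in tasks if len(l3_domains_touched(t, l3_domains)) > 1]


if __name__ == "__main__":
    ccd0 = set(range(0, 8)) | set(range(16, 24))
    ccd1 = set(range(8, 16)) | set(range(24, 32))
    # CPUs 4-11 would mix cores from both CCDs and pull data across the boundary.
    print(straddling_tasks([range(0, 8), range(4, 12)], [ccd0, ccd1]))
```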

crashtech · Aug 28, 2023

Stefan is right, of course. Even when utilizing HT/SMT, it's best to confine each task to a CCX, and at that point it's best to confine them to physical cores as well.

Markfw · Sep 1, 2023

OK, I could not stand it anymore, looking at that "not used" 9554, so I ordered a motherboard and memory for a complete system. For the PG run in 11 days I will have three 9554s (64 cores), one 9654 (96 cores), and five 7950Xs, and all but one system (the 7950X3D system) will have affinity set. Correction: I have another 7950X3D that may not have affinity set either. I have to look into a script.

StefanR5R · Sep 3, 2023

On a Ryzen 9 7950X3D, you could possibly run one 8-threaded task on the standard CCD and either two 4-threaded tasks or three 5-threaded tasks on the cache-enhanced CCD. But whether such configs improve host throughput, and if so, whether that is worth the complications compared to a straightforward 2x8 config, is not obvious.

crashtech · Sep 3, 2023

I wonder what the output of lscpu -C looks like on a 7950X3D.

Markfw · Sep 3, 2023

crashtech said:
I wonder what the output of lscpu -C looks like on a 7950X3D.

mark@7950x3d:~$ lscpu -C
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL
L1d       32K     512K    8 Data            1
L1i       32K     512K    8 Instruction     1
L2         1M      16M    8 Unified         2
L3        96M     192M   16 Unified         3
mark@7950x3d:~$

@crashtech, what does that tell you that you were interested in?

crashtech · Sep 3, 2023

No, maybe that's not the right command. Knowing which cores are attached to the big cache is what's needed.

Markfw · Sep 3, 2023

crashtech said:
No, maybe that's not the right command. Knowing which cores are attached to the big cache is what's needed.

I ran it with -h, and this is what I got for options:

StefanR5R · Sep 4, 2023

It seems lscpu assumes that all caches at a given level have the same size, which of course is no longer true for a few CPUs, among them AMD's dual-CCD Ryzen X3D parts. I had a quick look at the mainline code repository, https://github.com/util-linux/util-linux/tree/master/sys-utils, and at first glance I haven't seen any corresponding code update.

Does cat /proc/cpuinfo show the two differently sized L3 caches, perhaps? (That would depend on correspondingly extended kernel code. I have no idea whether anybody cared to implement this, and if so, in which kernel versions.)

Markfw · Sep 4, 2023

StefanR5R said:
Does cat /proc/cpuinfo show the two differently sized L3 caches, perhaps? (That would depend on correspondingly extended kernel code. I have no idea whether anybody cared to implement this, and if so, in which kernel versions.)

Looking at "cache size", they all say 1024.
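
Since lscpu reports only one L3 size per level, a more direct route may be the per-CPU cache entries in sysfs, which, as far as I know, is where lscpu gets its cache data in the first place. Here is a minimal sketch that lists each distinct L3 cache with its size and the CPUs sharing it. It assumes the kernel fills in the per-CCD size there; the 96M/192M lscpu output above suggests that at least CPU 0's entry reports the large cache. The "cache size" of 1024 in /proc/cpuinfo most likely reflects the 1M per-core L2 shown by lscpu, not the L3. The function name l3_caches is hypothetical.

```python
import glob
import os


def _read(directory, name):
    with open(os.path.join(directory, name)) as f:
        return f.read().strip()


def l3_caches(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Hypothetical helper: {shared_cpu_list: size} for each distinct L3 cache in sysfs."""
    caches = {}
    pattern = os.path.join(sysfs_cpu_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for index_dir in glob.glob(pattern):
        if _read(index_dir, "level") != "3":
            continue  # skip the L1d/L1i/L2 entries
        # All CPUs sharing one L3 report the same shared_cpu_list, so it serves as a key.
        caches[_read(index_dir, "shared_cpu_list")] = _read(index_dir, "size")
    return caches


if __name__ == "__main__":
    for cpus, size in sorted(l3_caches().items()):
        print(f"L3 {size:>8} shared by CPUs {cpus}")
```

On a 7950X3D one would expect two entries, one around 96 MB and one around 32 MB; whichever CPU range carries the larger size sits on the V-Cache CCD.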

crashtech · Sep 4, 2023

One possible way to determine which cores are attached to the big cache is to run the CPU version of GFN-21 using the same settings as for the upcoming SoB challenge. GFN-21 will overfill the 32MB cache on one of the CCDs, which should result in slower completion times. Once it's determined which task is running quickly, you could suspend them one at a time to see which one is running on which cores.

Skillz · Sep 4, 2023

I'm pretty sure the lower-numbered cores (i.e., the first CCD) are on the 3D V-Cache and the higher-numbered cores (i.e., the second CCD) are on the normal one.

Markfw · Sep 9, 2023

OK, I fired up my newest box for the SoB contest. It's not down to 17 hours yet. Tomorrow I am going to fire up an identical 9554, but on Linux using Stefan's scripts. If it gets better times than Process Lasso, I am installing Linux alongside Windows on the new box. This will be a great test of the two approaches: Windows with Process Lasso versus Linux with a script!

StefanR5R · Sep 10, 2023

You will need to compute credits/time for such a test with random workunits; comparing run times alone will be unreliable due to size variations between workunits.

Once you go as far as pinning tasks to sets of logical CPUs on both operating systems, any remaining influence of the OS on LLR performance should be negligible, and smaller than hardware variation between individual processor specimens.
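
The credits/time comparison can be sketched in a few lines of Python: estimate host throughput from a sample of completed tasks as total credit over total run time per task slot, scaled to a day and multiplied by the number of concurrent tasks. This assumes every slot stays busy all day; the function name is hypothetical, and the only real figures used are TennesseeTony's 7950X numbers from earlier in the thread (about 107,000 credits in about 14.5 hours for one 32-thread task).

```python
def credits_per_day(tasks, concurrent_tasks):
    """Hypothetical helper: estimate a host's credits/day from completed tasks.

    tasks: list of (credit, runtime_hours) pairs for tasks the host completed.
    concurrent_tasks: number of task slots the host runs side by side.
    Assumes every slot is kept busy for the whole day.
    """
    total_credit = sum(credit for credit, _ in tasks)
    total_hours = sum(hours for _, hours in tasks)
    return total_credit / total_hours * 24 * concurrent_tasks


if __name__ == "__main__":
    # One 32-thread task on a 7950X: roughly 177,000 credits/day.
    print(round(credits_per_day([(107000, 14.5)], 1)))
```

Summing credits and hours before dividing, rather than averaging per-task rates, weights each task by how long it occupied its slot, which is what determines a host's output over the challenge.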

StefanR5R · Sep 10, 2023

StefanR5R said:
Once you go as far as pinning tasks to sets of logical CPUs on both operating systems, any remaining influence of the OS on LLR performance should be negligible,

Hmm, on second thought, this is only true if:

the same sets of hardware threads are being used for LLR on both systems (which is complicated by the fact that the mapping of logical CPU IDs to hardware threads differs between these two OSs, as far as I have heard),
background load, such as system daemons, virus scanners, graphical desktop environments etc., remains low on both systems, and
both operating system kernels are intelligent enough to spread said background load across hardware threads which are not used by the LLR instances.

Furthermore, from what I recall, PrimeGrid's application binaries for Windows and Linux used to be compiled without noticeably different optimizations, and I am merely assuming that this is still true.

StefanR5R said:
You will need to compute credits/time for such a test with random workunits; comparing run times alone will be unreliable due to size variations between workunits.

PS: Once you have reported a result, you may have to wait some time for validation. Until then, you can already look up the pending credit of your completed tasks on your user account page at the PrimeGrid website.

Markfw · Sep 11, 2023

Well, a day and a half to go, and only 3 computers left to set up for this challenge. Once they're set, all I have to do is wake up at 4 am (I am up most of the night anyway) and turn off "no more work" on those computers. The lineup: five 7950Xs, three 9554 Genoas, and a 9654 Genoa. All except the 2 Windows boxes will have affinity set under Linux. The 7950Xs run 2 units of 8 threads each, the 9554s run 8 units of 8 threads each, and the 9654 runs 12 instances of 8 threads each.
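
A quick arithmetic check of that lineup, as a sketch: with 8 threads per task on physical cores only, each host runs cores / 8 tasks, which for Zen 4 (8 cores per CCD) works out to exactly one task per CCD, with SMT siblings left idle. The core counts are the ones stated in the thread; the names below are illustrative.

```python
ZEN4_CORES_PER_CCD = 8  # every Zen 4 CCD has 8 cores

# Physical core counts per host type, as stated in the thread.
HOSTS = {"Ryzen 9 7950X": 16, "EPYC 9554": 64, "EPYC 9654": 96}


def tasks_per_host(physical_cores, threads_per_task=8):
    """Hypothetical helper: whole tasks that fit when each thread gets its own core."""
    return physical_cores // threads_per_task


if __name__ == "__main__":
    for name, cores in HOSTS.items():
        n = tasks_per_host(cores)
        print(f"{name}: {n} tasks x 8 threads on {cores} cores "
              f"({cores // ZEN4_CORES_PER_CCD} CCDs)")
```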
