PrimeGrid Challenges 2023

Page 12 (AnandTech forums)

Markfw

Over 18 days until the next competition and the team is already testing out SoB workunits to fine-tune their systems.

Times have sure changed... we used to struggle against the Noobs Of Kryta.
With Zen 4 supporting AVX-512, and a lot of our team running Zen 4, that alone dooms most competitors! And 2 (soon to be 3) 9554s and a 9654 in the mix just add to their defeat!
 

Markfw

@crashtech, I do have a 9554 on Linux. Can you explain (simply enough for an idiot) how to set affinity in Linux?
 

StefanR5R

A while ago I posted a script for Linux in the private section of teamanandtech.org.

On single socket computers:
Edit the top of the script to define the desired config, as described in the inline comments of the script. Then keep the script running in the background (e.g. in an extra terminal window which you leave open or minimized).
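For readers who want to see roughly what such a background affinity script does, here is a minimal Python sketch. It is not the teamanandtech.org script; the process-name fragment `llr`, the CPU groups, and the polling interval are hypothetical placeholders. It periodically scans /proc for matching processes and pins every thread of each one to its own CPU group with `os.sched_setaffinity` (Linux only).

```python
import os
import time

# Hypothetical settings: adjust to the real LLR process name and to the
# CPU groups of your machine (e.g. one group of physical cores per CCD).
PROCESS_NAME_FRAGMENT = "llr"
CPU_GROUPS = [set(range(0, 8)), set(range(8, 16))]
POLL_SECONDS = 30


def find_pids(name_fragment):
    """Return sorted PIDs whose /proc/<pid>/comm contains name_fragment."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if name_fragment in f.read():
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(pids)


def plan_assignment(pids, groups, current):
    """Keep existing pid -> group assignments and give new pids a free group.

    Processes beyond the number of groups stay unassigned (unpinned).
    """
    plan = {pid: g for pid, g in current.items() if pid in pids}
    taken = set(plan.values())
    free = [i for i in range(len(groups)) if i not in taken]
    for pid in pids:
        if pid not in plan and free:
            plan[pid] = free.pop(0)
    return plan


def pin_all_threads(pid, cpus):
    """Pin every thread of pid, similar to `taskset -a -p`."""
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)


def run_forever():
    """Re-scan and re-pin forever; tasks that start later get a free group."""
    assigned = {}
    while True:
        pids = find_pids(PROCESS_NAME_FRAGMENT)
        assigned = plan_assignment(pids, CPU_GROUPS, assigned)
        for pid, g in assigned.items():
            try:
                pin_all_threads(pid, CPU_GROUPS[g])
            except OSError:
                pass  # process vanished or permission denied
        time.sleep(POLL_SECONDS)
```

Call `run_forever()` to start the loop. Pinning processes owned by the boinc user generally requires running as that user or as root.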

On dual socket computers:
The same. Though for best results, one might combine the script with setting up two boinc client instances; one boinc bound to all logical CPUs of one socket, the other one bound to all logical CPUs of the other socket, each boinc starting only as many tasks at once as fit on a single socket of course.
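To find out which logical CPUs belong to which socket (for example, to start each of the two boinc instances under `taskset -c`), Linux exposes the socket id of every CPU in sysfs. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/topology/physical_package_id files; the function names are hypothetical.

```python
import os
from collections import defaultdict

SYSFS_CPU = "/sys/devices/system/cpu"


def cpus_by_socket(sysfs_cpu=SYSFS_CPU):
    """Map physical package (socket) id -> sorted list of logical CPU ids."""
    sockets = defaultdict(list)
    for name in os.listdir(sysfs_cpu):
        # Only cpu0, cpu1, ...; skip entries like cpufreq or cpuidle.
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        path = os.path.join(sysfs_cpu, name, "topology", "physical_package_id")
        try:
            with open(path) as f:
                sockets[int(f.read())].append(int(name[3:]))
        except OSError:
            continue  # e.g. offline CPU without topology info
    return {sock: sorted(cpus) for sock, cpus in sockets.items()}


def to_cpulist(cpus):
    """Format CPU ids as a comma-separated list accepted by `taskset -c`."""
    return ",".join(str(c) for c in cpus)


def print_socket_lists(sysfs_cpu=SYSFS_CPU):
    """Print one taskset CPU list per socket."""
    for sock, cpus in sorted(cpus_by_socket(sysfs_cpu).items()):
        print(f"socket {sock}: taskset -c {to_cpulist(cpus)}")
```

Each boinc instance would then be started with the list for its socket, and configured to run only as many tasks as fit on that socket.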

The script can also be run as a system service, but I haven't written up a copy+paste recipe for this yet.
 

Markfw

I just looked over there and can't find it. Can you link it for me, please?
 

crashtech

I've been setting mine up manually using taskset, but there are better ways to learn, especially for high core count CPUs.
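What `taskset -c` does can be reproduced from Python on Linux, which makes it easy to script for many tasks. A minimal sketch; `parse_cpulist` and `pin_process` are hypothetical helper names, and the stride syntax that newer taskset versions accept (e.g. "0-15:2") is deliberately not handled.

```python
import os


def parse_cpulist(text):
    """Parse a Linux CPU list such as "0-7,16-23" into a set of CPU ids.

    This is the list format used by `taskset -c` and by sysfs files such as
    shared_cpu_list.
    """
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_process(pid, cpulist):
    """Rough equivalent of `taskset -c -p <cpulist> <pid>`.

    pid 0 refers to the caller. Like taskset without -a, this changes only the
    given thread; threads created afterwards inherit the new mask.
    """
    os.sched_setaffinity(pid, parse_cpulist(cpulist))
```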
 

TennesseeTony


I ran a few tests already as well. A 7950X (the version with the smaller cache) takes about 14.5 hours per task with all 32 threads. 4 tasks of 8 threads each didn't do so well. I suppose I should try 2 tasks of 16 threads. Tasks score upwards of 107,000 points each.
 

crashtech

My tests show that SMT is no help, but it doesn't hurt a whole lot if you don't want to turn it off in the BIOS or play with process affinity. The best result I got was 2 tasks of 8 threads each, pinned to physical cores. The 7950X is faster than my twin Xeon E5-2696 v4s: 16 Zen 4 cores beat 44 Broadwell cores...
 

waffleironhead

7940HS: 16.5 hours for an 8-thread unit.
3700X: 43.22 hours.
5700G: 40.8 hours.
That AVX-512 really shines.
 

StefanR5R

Plug: my script for Linux is trivial to set up and run. (Just remember the basics, like setting the executable bit, which can be done in any graphical file manager.) You can safely ignore the size of the script, which came out somewhat large; this is due to small extra features which I wanted to have in there, but which neither matter for nor interfere with basic operation on Ryzens and the like.

I still might reduce some clutter in the configuration section of the script if I find spare time for that.

Setting processor affinities not only helps with optimal use of SMT; on Zen CPUs with more than one CCX, it also, importantly, reduces data transfers across cache boundaries. This traffic costs time and wastes energy which would be better spent in the FMA units.
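On Linux, the logical CPUs that share an L3 cache (one group per CCX) can be read from sysfs instead of guessed. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cache/indexM/level and shared_cpu_list files; the helper names are hypothetical.

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_cpulist(text):
    """Parse a CPU list like "0-7,16-23" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def l3_groups(sysfs_cpu=SYSFS_CPU):
    """Return the distinct sets of logical CPUs that share an L3 cache."""
    groups = set()
    for name in os.listdir(sysfs_cpu):
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        cache_dir = os.path.join(sysfs_cpu, name, "cache")
        if not os.path.isdir(cache_dir):
            continue
        for index in os.listdir(cache_dir):
            if not index.startswith("index"):
                continue  # skip files like uevent
            idx_dir = os.path.join(cache_dir, index)
            try:
                # Check the level instead of assuming index3 is always L3.
                with open(os.path.join(idx_dir, "level")) as f:
                    if f.read().strip() != "3":
                        continue
                with open(os.path.join(idx_dir, "shared_cpu_list")) as f:
                    groups.add(frozenset(parse_cpulist(f.read())))
            except OSError:
                continue  # incomplete cache info for this index
    return sorted(sorted(g) for g in groups)
```

Each returned group is a candidate CPU set for one task (or for several smaller tasks) that never has to pull data across a CCX boundary.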

Windows users can deal with this by means of Process Lasso (I guess you'd want to ask @Markfw how to set this up; I haven't used it myself yet) or with a small Windows script which is very easy to set up (look in the same teamanandtech.org thread in which I posted the Linux script).
 

crashtech

Stefan is right, of course: even when using HT/SMT, it's best to confine each task to one CCX, and at that point it's also best to confine the tasks to physical cores only.
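Picking one hardware thread per physical core can likewise be derived from sysfs (topology/thread_siblings_list) rather than from an assumed CPU numbering, which differs between systems. A minimal sketch; the function names are hypothetical.

```python
import os

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_cpulist(text):
    """Parse a CPU list like "0,16" or "0-1" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def read_siblings(sysfs_cpu=SYSFS_CPU):
    """Map each logical CPU to the set of hardware threads on its physical core."""
    siblings = {}
    for name in os.listdir(sysfs_cpu):
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        path = os.path.join(sysfs_cpu, name, "topology", "thread_siblings_list")
        try:
            with open(path) as f:
                siblings[int(name[3:])] = parse_cpulist(f.read())
        except OSError:
            continue
    return siblings


def one_thread_per_core(cpus, siblings):
    """Keep only the lowest-numbered hardware thread of each core within cpus."""
    wanted = set(cpus)
    return sorted({min(siblings[c] & wanted) for c in wanted if c in siblings})
```

Applied to one CCX group, `one_thread_per_core` yields the physical-cores-only CPU set for a task.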
 

Markfw

OK, I could not stand looking at that unused 9554 any longer, so I ordered a motherboard and memory for a complete system. For the PG run in 11 days I will have three 9554s (64 cores), one 9654 (96 cores), and five 7950Xs, and all but one system (the 7950X3D) will have affinity set. Actually, no: I have another 7950X3D that may not have affinity set either. I have to look into a script.
 

StefanR5R

On a Ryzen 9 7950X3D, you could possibly run one 8-threaded task on the standard CCD and either two 4-threaded tasks or three 5-threaded tasks on the cache-enhanced CCD. But whether such configs improve host throughput, and if so, whether that is worth the complications of such setups compared to a straightforward 2x8 config, is not obvious.
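The mixed layouts above amount to splitting one CCD's CPUs into consecutive groups. A minimal sketch; it assumes (unverified) that CPUs 0-7 sit on the V-Cache CCD and that the SMT sibling of CPU c is c + 16, which is a common Linux numbering on 16-core Ryzens but should be checked against thread_siblings_list. The helper names are hypothetical.

```python
def interleave_siblings(cores, smt_offset):
    """Order CPUs as core, its SMT sibling, next core, its sibling, ...

    ASSUMPTION: the SMT sibling of logical CPU c is c + smt_offset.
    """
    order = []
    for c in cores:
        order.extend([c, c + smt_offset])
    return order


def split_into_tasks(cpus, threads_per_task):
    """Split an ordered CPU list into consecutive groups, one per task.

    CPUs left over that do not fill a whole task stay idle.
    """
    cpus = list(cpus)
    count = len(cpus) // threads_per_task
    return [cpus[i * threads_per_task:(i + 1) * threads_per_task] for i in range(count)]


# Hypothetical layout: CPUs 0-7 (siblings 16-23) on the V-Cache CCD.
vcache_ccd = range(0, 8)
print(split_into_tasks(vcache_ccd, 4))  # 2 x 4 threads, physical cores only
print(split_into_tasks(interleave_siblings(vcache_ccd, 16), 5))  # 3 x 5 threads, using SMT
```

In the 3 x 5 case, some physical cores end up shared between two tasks and one logical CPU stays idle, which is one of the complications mentioned above.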
 

Markfw

I wonder what the output of lscpu -C looks like on a 7950X3D.
mark@7950x3d:~$ lscpu -C
NAME ONE-SIZE ALL-SIZE WAYS TYPE        LEVEL
L1d       32K     512K    8 Data            1
L1i       32K     512K    8 Instruction     1
L2         1M      16M    8 Unified         2
L3        96M     192M   16 Unified         3
mark@7950x3d:~$

@crashtech, does that tell you what you were interested in?
 

crashtech

No, maybe that's not the right command. Knowing which cores are attached to the big cache is what's needed.
 

StefanR5R

It seems lscpu assumes that all caches at a given level have the same size, which of course is no longer true for a few CPUs, among them AMD's dual-CCD Ryzen X3D models. I had a quick look at the mainline code repository, https://github.com/util-linux/util-linux/tree/master/sys-utils, and at first glance I haven't seen any corresponding code update.

Does cat /proc/cpuinfo perhaps show the two differently sized L3 caches? (That would depend on correspondingly extended kernel code. I have no idea whether anybody cared to implement this, and if so, in which kernel versions.)
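Another place to look is the per-CPU cache information in sysfs, which reports a size for every L3 instance separately. A minimal sketch, assuming the standard /sys/devices/system/cpu/cpuN/cache/indexM/level and size files; whether the kernel reports different L3 sizes for the two CCDs of a 7950X3D has not been verified here, and the helper names are hypothetical.

```python
import os
from collections import defaultdict

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_size_kib(text):
    """Convert sysfs cache sizes such as "32768K" to KiB."""
    text = text.strip()
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text) // 1024  # assume a plain byte count


def cpus_by_l3_size(sysfs_cpu=SYSFS_CPU):
    """Group logical CPUs by the L3 size (KiB) the kernel reports for them."""
    groups = defaultdict(list)
    for name in os.listdir(sysfs_cpu):
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        cache_dir = os.path.join(sysfs_cpu, name, "cache")
        if not os.path.isdir(cache_dir):
            continue
        for index in os.listdir(cache_dir):
            if not index.startswith("index"):
                continue
            idx_dir = os.path.join(cache_dir, index)
            try:
                with open(os.path.join(idx_dir, "level")) as f:
                    if f.read().strip() != "3":
                        continue
                with open(os.path.join(idx_dir, "size")) as f:
                    groups[parse_size_kib(f.read())].append(int(name[3:]))
            except OSError:
                continue
    return {size: sorted(cpus) for size, cpus in groups.items()}
```

If the two CCDs do show up with different sizes, the group with the larger size is the set of CPUs attached to the big cache.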
 

Markfw

Looking at "cache size", they all say 1024.
 

crashtech

One possible way to determine which cores are attached to the big cache is to run the CPU version of GFN-21 using the same settings as for the upcoming SoB challenge. GFN-21 will overfill the 32MB cache on one of the CCDs, which should result in slower completion times. Once it's determined which task is running quickly, you could suspend them one at a time to see which one is running on which cores.
 

Skillz

I'm pretty sure the lower-numbered CPUs (i.e., the first CCD) are on the 3D V-Cache CCD and the higher-numbered ones (i.e., the second CCD) are on the normal one.
 

Markfw

OK, I fired up my newest box for the SoB contest. It's not down to 17 hours yet. Tomorrow I am going to fire up an identical 9554, but on Linux using Stefan's scripts. If it gets better times than Process Lasso, I will install Linux alongside Windows on the new box. This will be a great test of the two approaches: Windows with Process Lasso versus Linux with a script!
 

StefanR5R

You will need to compute credits/time for such a test with random workunits; just a comparison of times will be unreliable due to size variations between workunits.
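A minimal sketch of such a credits/time comparison, assuming each host runs a fixed number of tasks concurrently; the `TaskResult` fields are illustrative names, not a PrimeGrid API. Dividing each task's run time by the number of concurrent tasks gives the share of host time it consumed, so configs with different task counts become comparable.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One completed task (field names are illustrative)."""
    credit: float        # granted or pending credit of the task
    run_hours: float     # wall-clock run time of the task in hours
    parallel_tasks: int  # how many tasks the host ran at the same time


def host_credit_per_day(results):
    """Estimate host throughput in credit per day from a sample of tasks.

    A task that ran for run_hours while the host ran parallel_tasks tasks at
    once used run_hours / parallel_tasks hours of the whole host.
    """
    total_credit = sum(r.credit for r in results)
    host_hours = sum(r.run_hours / r.parallel_tasks for r in results)
    return total_credit / host_hours * 24


# Example using the rough 7950X figures from this thread (1 task x 32 threads).
print(round(host_credit_per_day([TaskResult(107_000, 14.5, 1)])))  # 177103
```

The more tasks go into the sample, the less a single unusually large or small workunit skews the estimate.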

If you go as far as pinning tasks to sets of logical CPUs on both operating systems, the remaining influence of the OS on LLR performance should be negligible, and smaller than hardware variations between individual processor specimens.
 

StefanR5R

Hmm, on second thought, my claim that the OS has negligible influence on LLR performance once tasks are pinned on both systems is only true if
  • the same sets of hardware threads are being used for LLR on both systems (which is complicated by the fact that the mapping of logical CPU IDs to hardware threads differs between these two OSs, as far as I have heard),
  • background load, such as system daemons, virus scanners, graphical desktop environments etc., remains low on both systems,
  • both operating system kernels are intelligent enough to spread said background load across hardware threads which are not used by the LLR instances.

Furthermore, from what I recall, PrimeGrid's application binaries for Windows and Linux used to be compiled without noticeably different optimizations; I merely assume that this is still true.

PS: once you have reported a result, you may have to wait some time for validation. Until then, you can already look up the pending credit of your completed tasks on your user account page at the PrimeGrid web site.
 

Markfw

Well, a day and a half to go, and only 3 computers left to set up for this challenge. Once they are set, all I have to do is wake up at 4 am (I am up most of the night anyway) and turn off "no more work" on those computers: 5 7950Xs, 3 9554 Genoas, and a 9654 Genoa. All except the 2 Windows boxes will have affinity set under Linux. The configs are 2 tasks of 8 threads on the 7950Xs, 8 tasks of 8 threads on the 9554s, and 12 tasks of 8 threads on the 9654.
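These configs use every physical core exactly once (16 = 2 x 8, 64 = 8 x 8, 96 = 12 x 8), which matches one task per 8-core CCD on the Genoa parts. A minimal sketch that turns such a config into taskset-style CPU lists; it assumes (unverified) that logical CPUs 0..N-1 are one thread of each physical core, numbered CCD by CCD, and the helper names are hypothetical.

```python
def format_cpulist(cpus):
    """Format CPU ids as a compact list like "0-7,16-23" (taskset -c syntax)."""
    cpus = sorted(set(cpus))
    parts = []
    start = prev = None
    for c in cpus:
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = c
    if start is not None:
        parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def plan(physical_cores, threads_per_task):
    """Split cores 0..physical_cores-1 into equal consecutive groups, one per task.

    ASSUMPTION: CPUs 0..physical_cores-1 are distinct physical cores, numbered CCD by CCD.
    """
    if physical_cores % threads_per_task:
        raise ValueError("cores are not evenly divisible into tasks")
    return [
        format_cpulist(range(start, start + threads_per_task))
        for start in range(0, physical_cores, threads_per_task)
    ]


for host, cores in [("7950X", 16), ("9554", 64), ("9654", 96)]:
    groups = plan(cores, 8)
    print(f"{host}: {len(groups)} tasks x 8 threads -> {groups}")
```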
 