PrimeGrid Challenges 2022


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
OK, I was trying to clean the queue, so I removed PrimeGrid and tried to re-add it. Four times in a row it said "failed to add project. try again later". Ever seen this?
 

Skillz

Senior member
Feb 14, 2014
946
979
136
What version of BOINC are you using? Sounds like it's an outdated version. I had a similar issue; updating BOINC to the latest build fixed it for me.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
Download the newest version. I think it's 7.16.20 now, should fix it.
Thanks, I did and it fixed it. But DAMN, Microsoft takes forever to boot Win 10 when you have been up for months without rebooting. It also updated Edge, which I did not want, and I had to end the task to get it off my screen.

And it's been over 20 minutes, and NO tasks.....
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
I disabled the E cores, since this 321 does 16C at a time. The power curve went UP to 297 watts ! No video usage.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
OK, 300 watts for 8 cores, and no video at all. Come on... Who likes Alder Lake? I really want to know what it offers. And that is with default BIOS settings, except for memory speed and the disabled E-cores.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
Does this software use AVX-512 ? I heard that if you disable the E-cores, it will use AVX-512. Since I have a big heatsink on it, it must be doing that? It just changed back to 162 watts, this is crazy.
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
I disabled the E cores, since this 321 does 16C at a time. The power curve went UP to 297 watts !
From what I read, vendors of DIY PC mainboards tend to set insane default power limits in their BIOSes. Not sure what Intel's defaults are. But 300 W shouldn't be a sustained limit for such a little processor. (Arguably, even short-term limits should be a lot lower than that for sanity.)

Does this software use AVX-512 ?
It does if there is hardware support. Is it listed in the CPU capabilities? (lscpu on Linux; don't know about Windows, maybe CPU-Z.) Also, you can look into the BOINC data directory, "slots/?" subdirectory of a running task, "stderr.txt" file. The LLR exe logs which vector implementation it uses. After a task was finished and the result reported to the PrimeGrid server, you can view stderr.txt on the result page.¹
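
A minimal sketch of the capability check mentioned above, assuming a Linux host where `/proc/cpuinfo` is readable (the same flags that `lscpu` prints). The function name `avx512_flags` is made up for this illustration. Note that this only shows what the kernel advertises; which transform LLR actually picked still has to be read from the task's stderr.txt.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> list[str]:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough for this check; on a hybrid CPU
            # the cores that stay enabled are expected to report the same set.
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; assumption for this sketch
    if path.exists():
        found = avx512_flags(path.read_text())
        print("AVX-512 flags:", " ".join(found) if found else "none reported")
```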

There is a catch: The AVX-512 port of LLR was made in order to be able to utilize both vector units of those Skylake-X/-SP CPUs which have two of those per core. It turned out that the FMA transform which was used by versions before this port performed somewhat better than the AVX-512 port on other Skylake(-X?)/-SP CPUs which only have 1 vector unit per core. I don't know what the status on Alder Lake is, which presumably has got one AVX-512 unit too (which is normally disabled by firmware).

In other words, the increased vector width of AVX-512 on its own was not useful to the LLR program (to the contrary); what helped LLR on Skylake-X/-SP and its derivatives was just the ability to get access to the portion of hardware which implements AVX-512 instructions but not AVX2/FMA3 instructions.

¹) OK, I noticed that you already have results from several of the small "verification tasks" which AFAIK use the very same transform as the big "main tasks". Example: llr321_406451187c_0. This used the FMA3 transform, like Haswell and Zen would do.

On another note, power efficiency of LLR increases if SMT is not used. (Throughput of LLR may increase a little or decrease a little if SMT is not used; depends on the CPU and maybe OS and the particular PrimeGrid subproject.)

Edit:
Furthermore, i7-12700F's caches should be able to accommodate three simultaneous LLR-321 tasks without problem. *Maybe* four would still work OK too, but that would depend on how the combination of L2 caches, L3 cache, and memory subsystem work together in case of this particular workload. (I am assuming that the disabling of E cores does *not* disable some of the L3 cache slices.) However, since 8c/16t are not divisible by 3, my guess is that 2 simultaneous tasks would be the optimum for throughput on i7-12700F. Of course that would increase task durations a lot, compared to running just 1 task at a time.

Other potential options: 3 tasks with 5 threads each, on 15 of the 16 SMT threads of the P cores. Or 3 tasks with either 6 or 7 threads each, with P cores and E cores enabled, either under- or over-subscribing the total available 20 hardware threads a little. Finding out the real differences between such setups would require tests either with a lot of workunits, or better yet, with one and the same workunit repeated (launched outside of BOINC in a scripted testbed).
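
To make the arithmetic behind these options easier to check, here is a small sketch that lists how many threads each task would get for a given number of hardware threads without over-subscribing. The helper name `layouts` is hypothetical. It only counts threads; it says nothing about cache fit or actual throughput, which would still need the kind of testing described above.

```python
def layouts(hw_threads: int, max_tasks: int = 4) -> list[tuple[int, int, int]]:
    """List (tasks, threads_per_task, idle_threads) without over-subscribing."""
    out = []
    for tasks in range(1, max_tasks + 1):
        per_task = hw_threads // tasks
        if per_task == 0:
            break  # more tasks than hardware threads
        out.append((tasks, per_task, hw_threads - tasks * per_task))
    return out


if __name__ == "__main__":
    # 16 = P cores only with SMT, 20 = P cores with SMT plus 4 E cores
    for hw in (16, 20):
        for tasks, per_task, idle in layouts(hw):
            print(f"{hw} hw threads: {tasks} x {per_task} threads, {idle} idle")
```

With 16 hardware threads this gives 3 tasks with 5 threads each and 1 thread idle; with 20 it gives 3 tasks with 6 threads each and 2 idle, which is the under-subscribed variant mentioned above.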
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
There is a bunch of avx512, not sure if this means it has it. The 300 watt was not sustained for more than 5 minutes or so. Not sure the timing.

mark@12700F-linux:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 151
Model name: 12th Gen Intel(R) Core(TM) i7-12700F
Stepping: 2
CPU MHz: 3133.345
CPU max MHz: 6300.0000
CPU min MHz: 800.0000
BogoMIPS: 4224.00
Virtualization: VT-x
L1d cache: 384 KiB
L1i cache: 256 KiB
L2 cache: 10 MiB
L3 cache: 25 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts
acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr
pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb
stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma
clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1
xsaves split_lock_detect avx_vnni avx512_bf16 dtherm ida arat pln pts hwp hwp_notify
hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke waitpkg avx512_vbmi2
gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri
movdir64b fsrm avx512_vp2intersect md_clear serialize arch_lbr avx512_fp16 flush_l1d
arch_capabilities
mark@12700F-linux:~$
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
BTW, to those who test performance of multithreaded applications within BOINC: If the client downloaded more than 1 task at a time per work request, the run times which will be shown on the project's web site for reported results will often be wrong.
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
There is a bunch of avx512, not sure if this means it has it.
OK, evidently some AVX-512 support is enabled, yet the application does not make use of it, for a reason that is not yet known. (Edit: See below for more recent results which actually did use AVX-512.) As I mentioned, it is possible that FMA3 works best for LLR on this hardware anyway.
L3 cache: 25 MiB
Good. Disabling the E cores does not reduce the last-level cache size.
 

biodoc

Diamond Member
Dec 29, 2005
6,270
2,238
136
I don't know what the status on Alder Lake is, which presumably has got one AVX-512 unit too (which is normally disabled by firmware).
Apparently some MB vendors have AVX-512 enabled in some of their early BIOS versions.

From Igor's Lab:

"Intel is now set to disable “AVX-512” completely on all Alder Lake CPUs with an upcoming microcode update in new BIOS releases."
 

mmonnin03

Senior member
Nov 7, 2006
221
222
116
Those are all the individual AVX-512 instruction subsets; not every CPU implements all of them. Alder Lake seems to implement the most, but only on the P cores.

The number of threads can be set in the project preferences or by app_config.xml. I checked the PG site and that's the 1st project I've seen to allow for more than 8 threads in the drop down selection. Up to 256 at PG
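
For the app_config.xml route, a hedged sketch that generates such a file. The element names (`app_config`, `app_version`, `app_name`, `plan_class`, `avg_ncpus`, `cmdline`) are standard BOINC client configuration. The app name `llr321`, the plan class `mt`, and the `-t` thread switch passed to LLR are assumptions inferred from the task names and the "(mt)" application label in this thread, so check them against the PrimeGrid site before relying on them. The function name `app_config` is made up for this example.

```python
import xml.etree.ElementTree as ET


def app_config(app_name: str, threads: int) -> str:
    """Build app_config.xml text that asks one multithreaded app to use `threads` threads."""
    root = ET.Element("app_config")
    ver = ET.SubElement(root, "app_version")
    ET.SubElement(ver, "app_name").text = app_name          # assumed short name
    ET.SubElement(ver, "plan_class").text = "mt"            # assumed plan class
    ET.SubElement(ver, "avg_ncpus").text = str(threads)     # what BOINC budgets per task
    ET.SubElement(ver, "cmdline").text = f"-t {threads}"    # assumed LLR thread switch
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(app_config("llr321", 4))
```

The output would go into the PrimeGrid project directory inside the BOINC data directory, followed by re-reading the config files in the client.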
 

biodoc

Diamond Member
Dec 29, 2005
6,270
2,238
136
So on some MB/BIOS combinations, you can disable the E-cores and enable AVX-512 on the P-cores.

 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
From all of this, it appears that since I did disable the E-cores, I might be doing AVX-512 during those times I see 300 watts from the wall, correct?
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
I clicked through your current results list:
All of the small "proof tasks" were calculated with the FMA3 transform.
However, of the three currently visible results of "main tasks", two used the AVX-512 transform right away (llr321_406188011_0, llr321_406188013_0), and one started with the FMA3 transform, was suspended, resumed at 71.57 % progress, and switched to AVX-512 (llr321_406209826_0).

AVX-512 is often pictured as a power hog, but at the same time it can be very power efficient if the workload is suitable, in terms of energy expended for a task. Similar to how GPGPU computing can be a lot more efficient than CPU computing for some types of workloads.
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
I checked the PG site and that's the 1st project I've seen to allow for more than 8 threads in the drop down selection. Up to 256 at PG
I have seen the LLR application scale quite well to high thread counts on my own 22-core Broadwell-EPs. But when I once tested it on a virtual machine on Skylake-SP (must have been at least a 24-core SKU but could have been 28-core), LLR hardly showed more than 800 % CPU usage when I used more than 8 threads. It seemed to have hit a wall there. *Maybe* that's a drawback of the AVX-512 implementation which is used on Skylake-SP compared to the FMA3 implementation which is used on Broadwell-EP.

On Zen / Zen 2 / Zen 3, multithreading does not scale as well to high thread counts as on Broadwell-EP because the former have multiple L3 caches, compared to the latter's big unified cache.

In any case, throughput of LLR definitely gets better the lower the thread count per program instance is, as long as you don't run too many program instances at once.

(BTW, Folding@home's FahCore_A7 scaled easily to 128 threads, if not more, and FahCore_A8 still works well with up to 64 threads. I suspect these were optimized for low inter-thread synchronization from the outset.)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,738
14,770
136
First, AVX-512 or not, the 5950X is just blowing it away. It's usually doing 200 seconds, and the 5950X is doing 80 seconds. But what is this 9000 second proof task?

Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application
1323650695 | 782367526 | 15 Mar 2022 2:20:48 UTC | 15 Mar 2022 2:26:52 UTC | Completed and validated [Main task] | 109.85 | 1,249.45 | 70.89 | 321 (LLR) v9.01 (mt)
1323650544 | 782367433 | 15 Mar 2022 2:20:48 UTC | 15 Mar 2022 2:25:00 UTC | Completed and validated [Main task] | 105.74 | 1,233.86 | 70.88 | 321 (LLR) v9.01 (mt)
1323649759 | 782366947 | 15 Mar 2022 2:20:48 UTC | 15 Mar 2022 2:23:12 UTC | Completed and validated [Main task] | 130.93 | 1,393.59 | 67.86 | 321 (LLR) v9.01 (mt)
1323610880 | 782345569 | 14 Mar 2022 23:16:40 UTC | 14 Mar 2022 23:51:14 UTC | Completed and validated [Main task] | 65.60 | 819.91 | 70.84 | 321 (LLR) v9.01 (mt)
1323610125 | 782345375 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:20:42 UTC | Completed and validated [Main task] | 82.45 | 964.05 | 67.76 | 321 (LLR) v9.01 (mt)
1323608631 | 782344818 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:38:53 UTC | Completed and validated [Main task] | 84.23 | 966.95 | 67.86 | 321 (LLR) v9.01 (mt)
1323608630 | 782344817 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:19:18 UTC | Completed and validated [Main task] | 86.26 | 994.97 | 67.75 | 321 (LLR) v9.01 (mt)
1323608253 | 782344609 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:42:24 UTC | Completed and validated [Main task] | 70.56 | 855.04 | 70.88 | 321 (LLR) v9.01 (mt)
1323607829 | 782344378 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:43:51 UTC | Completed and validated [Main task] | 84.24 | 974.24 | 67.87 | 321 (LLR) v9.01 (mt)
1323607828 | 782344377 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:36:19 UTC | Completed and validated [Main task] | 70.23 | 850.40 | 70.87 | 321 (LLR) v9.01 (mt)
1323606742 | 781219012 | 14 Mar 2022 23:15:28 UTC | 15 Mar 2022 2:20:48 UTC | Completed and validated (1st) [Proof task] | 8,950.38 | 113,574.30 | 8,678.08 | 321 (LLR) v9.01 (mt)
1323606337 | 782343529 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:50:06 UTC | Completed and validated [Main task] | 68.39 | 835.42 | 70.89 | 321 (LLR) v9.01 (mt)
1323605959 | 782343304 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:40:03 UTC | Completed and validated [Main task] | 68.53 | 857.42 | 70.87 | 321 (LLR) v9.01 (mt)
1323605010 | 782342790 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:46:25 UTC | Completed and validated [Main task] | 85.33 | 971.79 | 67.87 | 321 (LLR) v9.01 (mt)
1323602409 | 782341266 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:27:21 UTC | Completed and validated [Main task] | 81.34 | 952.61 | 67.81 | 321 (LLR) v9.01 (mt)
1323602035 | 782341047 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:24:53 UTC | Completed and validated [Main task] | 83.02 | 966.61 | 67.79 | 321 (LLR) v9.01 (mt)
1323600823 | 782340361 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:37:28 UTC | Completed and validated [Main task] | 69.43 | 847.74 | 70.87 | 321 (LLR) v9.01 (mt)
1323600250 | 782340046 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:17:49 UTC | Completed and validated [Main task] | 69.27 | 849.61 | 70.70 | 321 (LLR) v9.01 (mt)
1323599978 | 782339890 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:32:46 UTC | Completed and validated [Main task] | 79.64 | 933.15 | 67.82 | 321 (LLR) v9.01 (mt)
1323599234 | 782339466 | 14 Mar 2022 23:15:28 UTC | 14 Mar 2022 23:31:26 UTC | Completed and validated [Main task] | 86.37 | 982.89 | 67.82 | 321 (LLR) v9.01 (mt)
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
All rows which contain a link labeled "[Main task]" are proof tasks.
All rows which contain a link labeled "[Proof task]" are main tasks.

Main tasks do the real work of testing for primality, and take hours. The PrimeGrid preferences web page says in the 321 Prime Search section: "Recent average CPU time: 51:31:02". That's across all recently contributing hosts. Note, CPU time ≠ run time (a.k.a. duration). The single main task in post #146, 1323606742, took 113,574.30 s CPU time = 31:32:54.
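
The conversion above, and the gap between CPU time and run time, can be checked with a few lines. This is only a sketch: the helper name `hms` is made up, and the run time of 8,950.38 s is the value listed for that task in the quoted results, which may itself be affected by the reporting problem with multithreaded tasks.

```python
def hms(seconds: float) -> str:
    """Format a duration in seconds as h:mm:ss, dropping the fractional part."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


cpu_time = 113_574.30  # CPU seconds, summed over all threads of the task
run_time = 8_950.38    # wall-clock seconds as listed for the same task

print(hms(cpu_time))                  # 31:32:54
print(hms(run_time))                  # 2:29:10
print(round(cpu_time / run_time, 1))  # 12.7, i.e. roughly 12.7 threads busy on average
```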

Proof tasks only verify whether a result from a main task is valid. A proof task takes only a small fraction of the computation of its corresponding main task, thanks to a clever algorithm which allows for this shortcut. One price to pay for this optimization is that the computer which worked on the main task has to upload a lot of intermediate data along with its result, and the computer which works on the proof task has to download this intermediate data as its input.

Also, a reminder to all who look at their own or others' results pages at the PrimeGrid web site:
If the client downloaded more than 1 task at a time per work request, the run times which will be shown on the project's web site for reported results will often be wrong.
In other words: If you don't know for sure whether or not a given computer was configured to request only a single task at a time, then ignore the run times listed on the results pages of this computer.

Edit for clarity:
– It's a BOINC bug.
– It's not about how many tasks a computer has got in progress; the bug hits if the client receives more than 1 task in one work request.
– The bug affects multithreaded tasks.
The bug was kindly explained to me not so long ago but I already forgot the details and need to look it up.
 

StefanR5R

Elite Member
Dec 10, 2016
5,682
8,240
136
All rows which contain a link labeled "[Main task]" are proof tasks.
All rows which contain a link labeled "[Proof task]" are main tasks.
Also: At a results page, you can click on "Show names" on the left in the table header.
All tasks named llr321_#########c_# are proof tasks. (c is for "check", I presume.)
All tasks named llr321_#########_# are main tasks.

The same goes of course for the names of the tasks which you have locally in progress, if you look at them with boincmgr, boinccmd, boinctui, or boinctasks.
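
The naming rule above is easy to apply in a script, for example when going through a long task list. A minimal sketch, assuming names follow exactly the `llr321_<digits>[c]_<digits>` pattern described here; the function name `task_kind` is made up for illustration.

```python
import re

# llr321_<number>[c]_<number>; the optional "c" marks a proof task
_NAME = re.compile(r"^llr321_(\d+)(c?)_(\d+)$")


def task_kind(name: str) -> str:
    """Classify a 321 (LLR) task name as 'proof', 'main', or 'unknown'."""
    match = _NAME.match(name)
    if match is None:
        return "unknown"
    return "proof" if match.group(2) == "c" else "main"


if __name__ == "__main__":
    for name in ("llr321_406451187c_0", "llr321_406188011_0"):
        print(name, "->", task_kind(name))
```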
 