Question CPUs for shared memory parallel computing


Hermetian

Member
Sep 1, 2024
74
57
46
frostconcepts.org
The speed-up has two components. 1/3 of it (in runtime) is due to the built-in SequenceAlignment function. 2/3 of it is due to the way I incorporated it. In particular, I eliminated all control structures and expressed the computation as a composite function (math terminology). This permits the compiler to sequence the computation on the processor stack, affording dramatic speedups.
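For readers unfamiliar with the idea, here is a minimal Python sketch of what "expressing the computation as a composite function" can look like: a pipeline built by composing single-purpose functions instead of writing explicit loops and branches. The stage functions (`normalize`, `gc_positions`, `count`) are hypothetical stand-ins, not the author's Mathematica code, and Python gains no stack-sequencing speed-up from this style; the sketch only shows the shape of the approach.

```python
from functools import reduce
from typing import Any, Callable


def compose(*stages: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Build one function that feeds its input through `stages`, left to right."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


# Hypothetical stages standing in for the real per-sequence computation.
def normalize(seq: str) -> str:
    return seq.upper()


def gc_positions(seq: str) -> list:
    # Indices of G or C bases, selected with filter() rather than an explicit loop.
    return list(filter(lambda i: seq[i] in "GC", range(len(seq))))


def count(items: list) -> int:
    return len(items)


# The whole pipeline is one composite function: count(gc_positions(normalize(x))).
gc_count = compose(normalize, gc_positions, count)

print(gc_count("atgcGC"))  # → 4
```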
 

Mahboi

Senior member
Apr 4, 2024
990
1,786
96
Thank you for thinking about my computing environment.

One of my areas of study is the DNA chromosomes of perennial plants. My idea of a medium-size chromosome is 20 to 30 MB -- literally a string of that length composed of the letters A, T, G, C. Per chromosome and per "marker", I need to perform 70 to 210 regular expression searches for substrings of 16 to 512 letters, each of which will produce many match coordinates, all of which are then annotated and written to disk per search for post-processing. All of this is regular-expression dependent and thus happens asynchronously.
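The per-search step described here (find every match of a pattern, record its coordinates, annotate them, write them to disk) might look roughly like this in Python. It assumes overlapping hits should be reported and writes a tab-separated file with hypothetical column names; the `marker` field, the file naming, and the helper names are illustrative, not the author's format.

```python
import csv
import re
import tempfile
from pathlib import Path


def find_matches(chromosome: str, pattern: str) -> list:
    """Return (start, end) coordinates of every match, overlapping ones included.

    Wrapping the pattern in a zero-width lookahead with a capture group makes
    finditer() try every start position instead of skipping past each hit.
    For variable-length patterns this yields one (greedy) match per start.
    """
    overlapping = re.compile(f"(?=({pattern}))")
    return [(m.start(1), m.end(1)) for m in overlapping.finditer(chromosome)]


def write_matches(path: Path, marker: str, pattern: str, hits: list) -> None:
    """Annotate each hit with its marker and pattern; write one TSV per search."""
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["marker", "pattern", "start", "end"])
        for start, end in hits:
            writer.writerow([marker, pattern, start, end])


if __name__ == "__main__":
    hits = find_matches("ATGCATGCAAAA", "AA")
    print(hits)  # → [(8, 10), (9, 11), (10, 12)]
    out = Path(tempfile.gettempdir()) / "marker1_AA.tsv"  # hypothetical file name
    write_matches(out, "marker1", "AA", hits)
```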

For this application on a 30 MB chromosome and a single "marker", I get the shortest total run time by parallelizing the searches across 9 "kernels" on my 10-core CPU. For example, a total of 72 searches would be computed as 8 iterations per kernel (a Mathematica dispatched process). Memory utilization for the first few minutes is about 80% of my 16 GB (according to Task Manager, including the OS), then drops to about 70% for an hour or so, and drops further as each kernel finishes its task. I estimate the OS and Mathematica memory overhead at about 17%.
Possible 3D V-Cache workload?
30 MB is about the size of a standard CPU's L3 cache, but if you're running multiple regexes the working set will definitely go way above that...
I'd be curious to see the performance difference between a 7950X and a 7950X3D.
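Some rough arithmetic on the cache-size point: how much memory a 30-million-base chromosome occupies depends on how it is stored. The sketch below measures CPython's own objects with `sys.getsizeof` and adds the theoretical 2-bits-per-base minimum (four letters need only 2 bits each), for comparison against the 32 MB and 96 MB L3 figures that come up in this thread. It covers the sequence alone; regex engine state, match lists, and the other workers' data all come on top.

```python
import sys

MB = 1_000_000
L3_SIZES_MB = {"standard CCD": 32, "V-Cache CCD": 96}  # figures quoted in the thread


def footprint_mb(n_bases: int) -> dict:
    """Approximate memory, in MB, of one chromosome under different representations."""
    return {
        "bytes object": sys.getsizeof(b"A" * n_bases) / MB,
        "str object": sys.getsizeof("A" * n_bases) / MB,  # ASCII str: 1 byte per base
        "2-bit packed (theoretical)": n_bases / 4 / MB,
    }


if __name__ == "__main__":
    for name, size in footprint_mb(30_000_000).items():
        fits = [cache for cache, cap in L3_SIZES_MB.items() if size <= cap]
        print(f"{name}: {size:.1f} MB, sequence alone fits in: {fits or 'neither'}")
```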
 
Jul 27, 2020
19,613
13,477
146
Interestingly, I have an i7-5775C, so I could see what 128 MB of eDRAM cache would do in such workloads. Wish the OP would cook something up quickly in Python for me to benchmark.
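For anyone who wants to try this on their own hardware (the eDRAM question above, or a V-Cache part), a quick single-process Python benchmark could look like the following. It uses a seeded random sequence and a handful of made-up patterns, not the OP's data or Mathematica code, so the timings only indicate how a given CPU handles regex scans over a chromosome-sized string.

```python
import random
import re
import time


def make_sequence(n_bases: int, seed: int = 0) -> str:
    """Random A/C/G/T string; a fixed seed makes runs comparable across machines."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=n_bases))


def time_searches(seq: str, patterns: list) -> tuple:
    """Return (elapsed seconds, total hits) for scanning seq with every pattern."""
    compiled = [re.compile(p) for p in patterns]
    t0 = time.perf_counter()
    hits = sum(1 for rx in compiled for _ in rx.finditer(seq))
    return time.perf_counter() - t0, hits


if __name__ == "__main__":
    seq = make_sequence(30_000_000)  # ~30 MB, the chromosome size mentioned above
    # Hypothetical patterns of different shapes; substitute real markers if available.
    patterns = ["GATC", "TATA[AT]A", "CG.{0,16}CG", "A{6,}", "G[AG]GG[CT]C"]
    elapsed, hits = time_searches(seq, patterns)
    print(f"{len(patterns)} searches, {hits} hits, {elapsed:.2f} s")
```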
 

Mahboi

Following a paradigm doesn't make it not OOP.
All Python code is extremely OOP, and following functional paradigms won't do a thing. Python will still create a large number of objects that you don't see coming or can't really control.

A great example of a language that's OOP but manages to offer solid functional paradigms is D. It does it flawlessly IMO.
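The point about hidden objects is easy to observe in CPython: every regex hit returned by `finditer` is a full `re.Match` object. The sketch below compares the summed size of those objects with keeping only the start offsets in a typed `array`; `compare_footprints` is a hypothetical helper, and the result is one illustrative measurement, not a claim about any particular workload.

```python
import re
import sys
from array import array


def compare_footprints(seq: str, pattern: str) -> tuple:
    """Return (hit count, bytes held by Match objects, bytes held by an int64 array)."""
    rx = re.compile(pattern)
    matches = list(rx.finditer(seq))  # one re.Match object per hit
    starts = array("q", (m.start() for m in matches))  # 8 bytes per hit
    match_bytes = sum(sys.getsizeof(m) for m in matches)
    return len(matches), match_bytes, starts.itemsize * len(starts)


if __name__ == "__main__":
    n, match_bytes, array_bytes = compare_footprints("ACGT" * 250_000, "ACG")
    print(f"{n} hits: {match_bytes:,} bytes as Match objects, {array_bytes:,} as array")
```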
 
Jul 27, 2020
I know the specs; I'm curious about live testing a 3D V-Cache CPU on your kind of workload.
On an Epyc, I think it would be possible to create a CCD-aware application that uses all eight threads on one CCD and then uses the other CCD threads to preload data and request it much faster over the fabric when needed rather than going out to system RAM. Correct?
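On Linux, the pinning half of that idea does not need assembly: the kernel reports which logical CPUs share each L3 cache under `/sys/devices/system/cpu`, and a process can restrict itself to one such group with `os.sched_setaffinity`. The sketch below assumes Linux with sysfs mounted; on Zen 3 and later each CCD has a single shared L3, so each group corresponds to one CCD (on Zen 2 it would be one CCX). The helper names are hypothetical, and the cross-CCD prefetching part of the question is not attempted.

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> set:
    """Parse a Linux CPU list such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_groups(sysfs: Path = Path("/sys/devices/system/cpu")) -> list:
    """Group logical CPUs by the L3 cache they share (one group per CCD on Zen 3+)."""
    groups = []
    for index_dir in sorted(sysfs.glob("cpu[0-9]*/cache/index[0-9]*")):
        level, shared = index_dir / "level", index_dir / "shared_cpu_list"
        if not (level.exists() and shared.exists()) or level.read_text().strip() != "3":
            continue
        cpus = parse_cpu_list(shared.read_text())
        if cpus not in groups:
            groups.append(cpus)
    return groups


def pin_current_process(cpus: set) -> None:
    """Restrict the calling process (pid 0) to the given logical CPUs. Linux only."""
    os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    groups = l3_groups()
    print("L3 groups:", [sorted(g) for g in groups])
    if groups and hasattr(os, "sched_setaffinity"):
        allowed = groups[0] & os.sched_getaffinity(0)  # respect container/cgroup limits
        if allowed:
            pin_current_process(allowed)
            print("now running on CPUs", sorted(os.sched_getaffinity(0)))
```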
 

Hermetian

On an Epyc, I think it would be possible to create a CCD-aware application that uses all eight threads on one CCD and then uses the other CCD threads to preload data and request it much faster over the fabric when needed rather than going out to system RAM. Correct?
I'll wager it could be done, likely involving assembly language. However, I'm not facing the bottleneck addressed by that approach.
 

LightningZ71

Golden Member
Mar 10, 2017
1,782
2,135
136
On an Epyc, I think it would be possible to create a CCD-aware application that uses all eight threads on one CCD and then uses the other CCD threads to preload data and request it much faster over the fabric when needed rather than going out to system RAM. Correct?
Very situational. Remember, bandwidth is still limited by the IF link to the IOD. EPYC has more than enough main memory bandwidth to saturate any individual IF link. You might gain a tiny bit of first-word latency from another CCD, but for any large dataset it won't be useful.
 

Mahboi

Sorry I can't help you there.
Of course you can.

Buy those (it's a 32 GB, 12-core modern system). The CPU has 6 cores with 96 MB of extended L3 cache (vertical cache), and the other 6 cores have merely 32 MB of L3.
It'd be veeery interesting to see the differential between the two.
 

Mahboi


Incidentally, in France we've reached peak comedy from performance paranoia: an 8-core V$ chip is now 60 cents away from a 6-core V$ CCD with another 6 cores strapped to it.
That's 50% more cores, and the only mild drawbacks are having to deal with inter-CCD latency (and you don't really HAVE to) or occasionally setting an app to "prefer cache" to shove it onto the correct CCD.
Jeebus.
 

Mahboi

That's 20% of the cost of a warrantied system complete with liquid cooling. Then I'd need to purchase a CPU-locked Wolfram license to use it. Overall, a very poor use of my resources.
Liquid cooling?
For a 120W CPU?
If you want a corporate warranty, I frankly don't see why you came to ask us. Pretty much everyone here is a tinkerer. It's just going to be about what HP or Dell or whoever wants to sell, isn't it? It's not so much a question of choice as of what's on offer.

Ah, the eternal CPU locked licenses, good luck with that.
 

Hermetian

Liquid cooling?
For a 120W CPU?
You missed this in prior discussion:
On a CPU with N cores, I will launch N-1 processes, each running a copy of my computation on a different segment of the data. This results in 70% to 80% CPU utilization. I also use advanced BIOS settings, including overclocking.
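Splitting one chromosome into N-1 segments needs some care at the boundaries, since a match that straddles two segments would otherwise be missed. A common approach, sketched below under the assumption that the longest possible match length is known (the thread mentions substrings of up to 512 letters), is to extend each segment by that length minus one and keep only hits that start inside the segment's own core range. The helper names are hypothetical, and the loop runs segments sequentially for clarity; in practice each `(start, core_end, end)` window would go to its own process.

```python
import re


def find_starts(text: str, pattern: str, pos: int, endpos: int) -> list:
    """Start offsets of every (possibly overlapping) match within text[pos:endpos]."""
    rx = re.compile(f"(?=(?:{pattern}))")  # lookahead: every start position is tried
    return [m.start() for m in rx.finditer(text, pos, endpos)]


def segment_windows(length: int, n_parts: int, overlap: int) -> list:
    """(start, core_end, end) per segment; end extends core_end by `overlap` bases."""
    step = max(1, -(-length // n_parts))  # ceiling division; avoid a zero step
    return [(s, min(length, s + step), min(length, s + step + overlap))
            for s in range(0, length, step)]


def segmented_search(chromosome: str, pattern: str, n_parts: int, max_match_len: int) -> list:
    hits = []
    for start, core_end, end in segment_windows(len(chromosome), n_parts, max_match_len - 1):
        # Keep only hits that begin in this segment's core, so overlaps are not counted twice.
        hits.extend(s for s in find_starts(chromosome, pattern, start, end) if s < core_end)
    return hits


if __name__ == "__main__":
    seq = "ACGTTACGTAACG" * 1000
    assert segmented_search(seq, "ACG", n_parts=9, max_match_len=3) == \
        find_starts(seq, "ACG", 0, len(seq))
    print("segmented and whole-sequence results agree")
```

Reporting every start position (the lookahead form) keeps each segment's results independent of where the previous segment stopped; a plain non-overlapping scan can disagree with a whole-sequence scan near the boundaries.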
If you want corporate warranty, I don't really see why you came to ask us
Please check prior posts for answers.
HP or Dell
LOL 😂
 
Jul 27, 2020
I think the problem is Mathematica. I've read a bit about the guy (he's a rare genius) and Wolfram Alpha was pretty impressive back in the day when I tried it. However, I believe he thinks that the world doesn't appreciate him or his contributions enough so he's resorted to overcharging for his abilities and products. This would be enough for me to never touch Mathematica because I dislike any company/individual engaged in price gouging.

In the For Sale forum section, there's a lovely 5950X with decent mobo and 64GB RAM on sale for just $550. I don't need it coz I already have the Epyc. But this system would've been a super cheap powerhouse for the OP, had he not been limited by Mathematica's horrible license. Sorry. I think it's too restrictive a license.
 

DokiDoki

Member
Aug 21, 2024
47
77
46
I think the problem is Mathematica. I've read a bit about the guy (he's a rare genius) and Wolfram Alpha was pretty impressive back in the day when I tried it. However, I believe he thinks that the world doesn't appreciate him or his contributions enough so he's resorted to overcharging for his abilities and products. This would be enough for me to never touch Mathematica because I dislike any company/individual engaged in price gouging.
You need to separate Stephen Wolfram's views on reformulating physical laws using cellular automata from the actual software his company makes.

@Hermetian In one of your previous posts, you mentioned that ideally you would like a core-to-memory-channel ratio of 1, right?

Some lower-end Xeon W-3500 models based on Sapphire Rapids have a ratio of 2.

They boost to 4.8 GHz, but have a measly L3 cache.

The product you wish for may exist when Intel releases workstation Xeons based on upcoming architectures.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,054
15,198
136
You need to separate Stephen Wolfram's views on reformulating physical laws using cellular automata from the actual software his company makes.

@Hermetian In one of your previous posts, you mentioned that ideally you would like a core-to-memory-channel ratio of 1, right?

Some lower-end Xeon W-3500 models based on Sapphire Rapids have a ratio of 2.

They boost to 4.8 GHz, but have a measly L3 cache.

The product you wish for may exist when Intel releases workstation Xeons based on upcoming architectures.
It exists now in EPYC, but he has already said that is a bad solution. AND they have great L3 caches. Any Xeon would fade compared to EPYC.
 

Markfw


Hermetian

@Markfw
You are getting carried away with superlatives. Here is what I actually said:

Two things are very clear about my situation, which I've previously stated in this discussion thread: (1) I will have to upgrade my hardware to tackle a larger dataset which is on the back burner for now (2) There's no currently available hardware in my price range that would significantly improve the runtime of that dataset.
 