The Stilt
Golden Member
- Dec 5, 2015
Ryzen bandwidth is great.
Yeah the bandwidth is fine, however why do you think Ryzen does well in AVX?
Only with 128-bit ops it does ok.
The idea behind Ryzen is that it has more cores to make up for the lost 256-bit performance. It's down on Coffee Lake this year, but if Ryzen 3k tops out at 12 or more cores, then it balances out. It ends up being better at 128-bit, ties on 256-bit, and then loses completely on 512-bit due to lack of support. I am guessing Zen 3 will probably move to 256-bit units and do 512 the way it's doing 256 right now.
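To picture what "do 512 the way it's doing 256 right now" means: first-generation Zen handles a 256-bit AVX operation as two 128-bit halves. The Python below is only a conceptual model of that split, not a description of the actual hardware, and the helper names (`add_128`, `add_256_split`) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep each lane at 32 bits, like a packed integer add


def add_128(a, b):
    """Model one 128-bit unit: four 32-bit lanes added with wraparound."""
    assert len(a) == len(b) == 4
    return [(x + y) & MASK32 for x, y in zip(a, b)]


def add_256_split(a, b):
    """Model a 256-bit add issued as two 128-bit halves (low, then high)."""
    assert len(a) == len(b) == 8
    return add_128(a[:4], b[:4]) + add_128(a[4:], b[4:])


a = [1, 2, 3, 4, 5, 6, 7, MASK32]
b = [10, 20, 30, 40, 50, 60, 70, 1]
result = add_256_split(a, b)
print(result)  # the last lane wraps around to 0
```

The result is the same as a native 256-bit add would give; the difference is that the 128-bit unit is used twice where a 256-bit unit would handle all eight lanes in one pass.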
I don't think that is the idea.
- A core with AVX-512 will take around 20-40% more space than a core that can do 256-bit AVX.
- With 512-bit AVX you can double performance. So basically up to 40% more space, but double the throughput.
I think the idea is using AVX perf/power efficiently.
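The area argument can be put into numbers. This is a back-of-the-envelope sketch that takes the post's figures at face value (20-40% more core area, 2x throughput on code that vectorizes perfectly); the function name is made up, and real gains depend on the workload.

```python
def throughput_per_area(speedup, extra_area):
    """Throughput gain divided by area growth, relative to the narrower core."""
    return speedup / (1.0 + extra_area)


for extra in (0.20, 0.40):
    ratio = throughput_per_area(2.0, extra)
    print(f"{extra:.0%} more area, 2x throughput -> {ratio:.2f}x per unit area")
```

Under those assumptions the wider core delivers roughly 1.4x to 1.7x the vector throughput per unit of die area.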
You can go wider without going straight to native AVX-512 support. There's no reason to spend the die area and power on something that is mostly useless, just to say you have it.
Not all workloads can be parallelized infinitely, if at all.
Modern video encoders such as VP9 and HEVC are a good example.
With HEVC you can only utilize ~10 threads efficiently at 1080 resolution. To efficiently utilize more threads than that, the resolution needs to go up.
That's why it is important to maximise the ILP, which is achieved using >= 256-bit code.
In my opinion, AMD HAS TO make Gen. 3 Zen wider.
Intel went wide with their consumer cores already in 2013 after all.
And it's not like the wide workloads are getting any less common.
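One way to see why the usable thread count grows with resolution is below. The post does not say where the ~10 thread figure comes from, so this is purely an assumption: it models HEVC wavefront parallel processing (WPP) with 64x64 CTUs, where each CTU row can start once the row above is two CTUs ahead, and it ignores frame-level threading and everything else a real encoder does. The function name is hypothetical.

```python
import math


def wpp_average_parallelism(width, height, ctu=64, lag=2):
    """Average number of CTU rows in flight under an idealized WPP schedule.

    Assumes every CTU takes one time unit and threads are unlimited.
    Row r starts at time lag * r and takes `cols` time units.
    """
    cols = math.ceil(width / ctu)
    rows = math.ceil(height / ctu)
    makespan = lag * (rows - 1) + cols  # time until the last row finishes
    return rows * cols / makespan       # total work / elapsed time


print(round(wpp_average_parallelism(1920, 1080), 2))  # 1080p
print(round(wpp_average_parallelism(3840, 2160), 2))  # 2160p
```

In this toy model 1080p averages about 8 rows in flight and 2160p about 16, which is in the same ballpark as the ~10 threads mentioned above and shows the same trend with resolution.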
Ryzen 7 1800X - 3600 MHz, 8 cores, 2-way SMT, 128-bit registers == 3600000 * 8 * 2 * 4 == 230400000 32-bit IOPS
Core i7-7700K - 4200 MHz, 4 cores, 2-way SMT, 256-bit registers == 4200000 * 4 * 2 * 8 == 268800000 32-bit IOPS
Core i7-7740X - 4300 MHz, 4 cores, 2-way SMT, 256-bit registers == 4300000 * 4 * 2 * 8 == 275200000 32-bit IOPS
Core i7-7800X - 3500 MHz, 6 cores, 2-way SMT, 256-bit registers == 3500000 * 6 * 2 * 8 == 336000000 32-bit IOPS
Core i7-5960X - 3000 MHz, 8 cores, 2-way SMT, 256-bit registers == 3000000 * 8 * 2 * 8 == 384000000 32-bit IOPS
Core i7-6900K - 3200 MHz, 8 cores, 2-way SMT, 256-bit registers == 3200000 * 8 * 2 * 8 == 409600000 32-bit IOPS
Core i7-7820X - 3600 MHz, 8 cores, 2-way SMT, 256-bit registers == 3600000 * 8 * 2 * 8 == 460800000 32-bit IOPS
(Clocks are entered in kHz, so the figures are in thousands of operations per second.)
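The arithmetic in that list is easy to reproduce. The sketch below only mirrors the post's model, including its choice to count SMT threads as a multiplier and to enter the clock in kHz; it is a theoretical ceiling, not a measurement, and the function name is made up.

```python
def post_style_peak(clock_mhz, cores, smt, register_bits, element_bits=32):
    """Clock (in kHz) * cores * SMT ways * SIMD lanes, as in the list above."""
    lanes = register_bits // element_bits
    return clock_mhz * 1000 * cores * smt * lanes


cpus = {
    "Ryzen 7 1800X": (3600, 8, 2, 128),
    "Core i7-7700K": (4200, 4, 2, 256),
    "Core i7-7820X": (3600, 8, 2, 256),
}

for name, spec in cpus.items():
    print(f"{name}: {post_style_peak(*spec)}")
```

At the same clock and core count, the only thing separating the 1800X and 7820X rows is the lane count (4 vs 8), which is where the 2x gap comes from.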
Now look at this link; going by that, Ryzen doesn't have dedicated AVX2 hardware, just AVX2 emulation. Just like Stilt said, AMD really needs a full 256-bit register, not two 128-bit ones.
8 threads * 8-way SIMD = 64 32-bit ops in parallel
16 threads * 4-way SIMD = 64 32-bit ops in parallel
32 threads * 4-way SIMD = 128 32-bit ops in parallel
24 threads * 8-way SIMD = 192 32-bit ops in parallel
That's why 256-bit registers are better than more cores. And leveraging AVX2 is trivial, the OpenCL compiler does it for you even.
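The four "ops in parallel" lines follow from one small formula: threads times lanes, where lanes is the register width divided by the element width. A minimal sketch, with a made-up function name and the same simplification the post uses (every hardware thread counted as a full SIMD unit):

```python
def ops_in_parallel(threads, register_bits, element_bits=32):
    """Threads multiplied by SIMD lanes per register."""
    return threads * (register_bits // element_bits)


cases = [(8, 256), (16, 128), (32, 128), (24, 256)]
for threads, bits in cases:
    print(f"{threads} threads, {bits}-bit: {ops_in_parallel(threads, bits)}")
```

Doubling the register width buys the same count as doubling the thread count, which is the comparison the post is making.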
Yeah, that's what I said, didn't I?
I am guessing Zen 3 will probably move to 256bit units and do 512 the way it's doing 256 right now.
I don't think that is the idea.
- A core with AVX-512 will take around 20-40% more space than a core that can do 256-bit AVX.
- With 512-bit AVX you can double performance. So basically up to 40% more space, but double the throughput.
- Why would you need more than 8 cores on desktop?
- 12 cores vs 8 cores with 15% higher IPC and 20% higher clocks? Which one would you pick?
I think the idea is maximizing efficiency across all three: power, die area, and performance.
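The 12-core vs 8-core question can be roughed out numerically. This sketch assumes throughput scales perfectly with core count, which is the best case for the 12-core part, and uses the 15% IPC and 20% clock figures from the post; the function name is made up.

```python
def relative_throughput(cores, ipc_gain=0.0, clock_gain=0.0):
    """Throughput in units of one baseline core, assuming perfect scaling."""
    return cores * (1.0 + ipc_gain) * (1.0 + clock_gain)


many_cores = relative_throughput(12)
fast_cores = relative_throughput(8, ipc_gain=0.15, clock_gain=0.20)
per_thread = fast_cores / 8  # speed of a single fast core vs a baseline core

print(f"12 baseline cores: {many_cores:.2f}")
print(f"8 faster cores:    {fast_cores:.2f}")
print(f"per-thread speed:  {per_thread:.2f}x")
```

Even with perfect scaling the 8 faster cores land at about 11 baseline-core equivalents against 12, while each thread runs about 1.38x faster, so anything that does not scale cleanly to 12 cores favors the 8-core option.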
I think AVX should be accelerated by the iGPU (GPU).
In January 2017 -redacted- rejoined AMD with this: "...microarchitectural designs of floating point register caching, per-lane predication, scatter/gather support and wide 512-bit datapath."
What's your guess here?
- https://www.hpcwire.com/2017/06/29/reinders-avx-512-may-hidden-gem-intel-xeon-scalable-processors/
The VL extension enables AVX-512 instructions to operate on XMM (128-bit) and YMM (256-bit) registers, and are not limited to just the full ZMM registers. This symmetry definitely is good news. AVX-512, with the VL extension, seems well set to be the programming option of choice for compilers and hand coders because it unifies so many capabilities together along with access to 32 vector registers regardless of their size (XMM, YMM or ZMM).
I didn't say anything about going 512-bit wide, but wider.
256-bit is definitely the way to go, at least for the few years to come.
How much more power will they need for that? We know that Intel's node is still superior. AMD will have a hard time keeping power down at the same performance as Intel.
Looking forward to CFL-X. /s
https://wccftech.com/intel-kaby-lake-x-cpus-eol-discontinued/
Intel Discontinues Kaby Lake-X Quad Cores