AVX question

Anarchist420

Diamond Member
Feb 13, 2010
8,645
0
76
www.facebook.com
Are CPUs with the AVX instruction set capable of double-precision floating point or just single? If they're capable of double precision, then what's the performance penalty compared to FP32 precision, if any?
 

Abwx

Lifer
Apr 2, 2011
11,536
4,323
136
Double precision is supported, and with double the throughput compared to
regular SSE2, although it's unlikely that software can use only AVX.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
I thought all CPUs' FLOPS were measured in double precision.
 

mustard010

Member
Sep 13, 2003
93
0
0
Are CPUs with the AVX instruction set capable of double-precision floating point or just single? If they're capable of double precision, then what's the performance penalty compared to FP32 precision, if any?

AVX instructions operate on 256-bit (32-byte) wide registers, with 16 registers in total. They can perform either single-precision or double-precision operations, depending on the application. With the 32-byte width, one register can hold up to 8 floats or 4 doubles.

You can then issue one instruction that performs a calculation on all 8 floats, or one that performs a calculation on all 4 doubles. This is known as vectorization (i.e., single instruction, multiple data, or SIMD).

Generally, single-precision operations are faster than their double-precision counterparts. I'm sure someone else who is versed in the subject can explain better... From my own perspective, more bits == more work. Obviously, there are more bits of precision in the IEEE double format, so you would generally get better precision using doubles. Note, though, that rounding is inherent in floating point and cannot be avoided.
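
To make the register arithmetic and the rounding point concrete, here is a minimal Python sketch. It does not execute any AVX instructions; it only reproduces the lane counts and shows the rounding error of one value in each IEEE format. The helper names `lanes` and `to_float32` are made up for illustration.

```python
import struct

def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits

def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

avx_floats = lanes(256, 32)   # single-precision values per 256-bit register
avx_doubles = lanes(256, 64)  # double-precision values per 256-bit register

# Neither format stores 0.1 exactly; the single-precision copy is further off.
as_double = 0.1
as_single = to_float32(0.1)
single_error = abs(as_single - as_double)

print(avx_floats, avx_doubles)
print(single_error)
```

The lane counts come straight from dividing the register width by the element width, which is why the same 256-bit register holds twice as many floats as doubles.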
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Heh, I would've guessed SP so you get a higher number.

Single precision is very limited. It works OK for GPUs because their role is also limited. Most of the new GPUs that are being used for other tasks, like Tesla, have increased double-precision performance.

CPUs need to be well rounded to handle all tasks. So measuring CPUs in SP would be a very useless number in my opinion. I could be wrong however.
 

mustard010

Member
Sep 13, 2003
93
0
0
Single precision is very limited. It works OK for GPUs because their role is also limited. Most of the new GPUs that are being used for other tasks, like Tesla, have increased double-precision performance.

CPUs need to be well rounded to handle all tasks. So measuring CPUs in SP would be a very useless number in my opinion. I could be wrong however.

They have two different FLOPS counts, one for SP and another for DP. It's not really an apples-to-apples comparison when one FLOPS count has 32 more bits per value to deal with.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Single precision is very limited. It works OK for GPUs because their role is also limited. Most of the new GPUs that are being used for other tasks, like Tesla, have increased double-precision performance.

CPUs need to be well rounded to handle all tasks. So measuring CPUs in SP would be a very useless number in my opinion. I could be wrong however.

I was mostly referring to what goes on marketing slides.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
They have two different FLOPS counts, one for SP and another for DP. It's not really an apples-to-apples comparison when one FLOPS count has 32 more bits per value to deal with.

I realize that. And we see that all the time with GPUs. But rarely, if ever, do I see different numbers released for CPUs.
 

mustard010

Member
Sep 13, 2003
93
0
0
I realize that. And we see that all the time with GPUs. But rarely, if ever, do I see different numbers released for CPUs.

Probably because traditional CPUs have low GFLOPS ratings by comparison. GPUs were designed with many stamped-out vector units, right? I wouldn't be surprised if CPUs start to advertise SP and DP in the near future with the fused CPU/GPU architectures.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
I wouldn't be surprised if CPUs start to advertise SP and DP in the near future with the fused CPU/GPU architectures.

Perhaps. I just don't find the SP number important on the CPU side. Hell, even some DC (distributed computing) applications (milkyway@home, for example) no longer work on SP-only GPUs.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
I realize that. And we see that all the time with GPUs. But rarely, if ever, do I see different numbers released for CPUs.

Well if you can find a slide that posts the peak theoretical flops, we can easily do the math to figure out if it's SP or DP.
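
The math referred to here can be sketched roughly as follows. Every number below is an assumption for illustration, not a spec of any real product: a hypothetical 4-core chip at 3.0 GHz that can issue one 256-bit add and one 256-bit multiply per core per cycle. The function names are also made up.

```python
def peak_gflops(cores, clock_ghz, vector_bits, element_bits, vector_ops_per_cycle):
    """Peak theoretical GFLOPS: cores x GHz x lanes x vector ops issued per cycle."""
    lanes = vector_bits // element_bits
    return cores * clock_ghz * lanes * vector_ops_per_cycle

# Hypothetical chip parameters (assumptions, not taken from any slide).
CORES = 4
CLOCK_GHZ = 3.0
VECTOR_BITS = 256
VECTOR_OPS_PER_CYCLE = 2  # assumed: one add plus one multiply per cycle

sp_peak = peak_gflops(CORES, CLOCK_GHZ, VECTOR_BITS, 32, VECTOR_OPS_PER_CYCLE)
dp_peak = peak_gflops(CORES, CLOCK_GHZ, VECTOR_BITS, 64, VECTOR_OPS_PER_CYCLE)

def guess_precision(slide_gflops):
    """Say which computed peak a quoted marketing number is closer to."""
    if abs(slide_gflops - sp_peak) < abs(slide_gflops - dp_peak):
        return "SP"
    return "DP"

print(sp_peak, dp_peak, guess_precision(96))
```

With the real core count, clock and issue width of a given chip plugged in, a quoted peak number should land near one of the two results, which tells you which precision the slide used.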
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
AVX instructions operate on 256-bit (32-byte) wide registers, with 16 registers in total. They can perform either single-precision or double-precision operations, depending on the application. With the 32-byte width, one register can hold up to 8 floats or 4 doubles.

You can then issue one instruction that performs a calculation on all 8 floats, or one that performs a calculation on all 4 doubles. This is known as vectorization (i.e., single instruction, multiple data, or SIMD).

Generally, single-precision operations are faster than their double-precision counterparts. I'm sure someone else who is versed in the subject can explain better... From my own perspective, more bits == more work. Obviously, there are more bits of precision in the IEEE double format, so you would generally get better precision using doubles. Note, though, that rounding is inherent in floating point and cannot be avoided.

From what I see, SP and DP instructions typically have the same latency (with the exception of SQRT and DIV). It's just that each vector packs twice as many SP values as DP values, so SP throughput SHOULD be twice as high.
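
A rough way to see where the 2x comes from is to count vector instructions for the same number of elements. This sketch assumes each 256-bit instruction costs the same regardless of precision and ignores loads, stores and everything else; `vector_instructions` is a made-up helper.

```python
import math

def vector_instructions(n_elements, vector_bits, element_bits):
    """Vector instructions needed to touch n_elements once, one op per register."""
    lanes = vector_bits // element_bits
    return math.ceil(n_elements / lanes)

N = 1024  # arbitrary element count for illustration

sp_instr = vector_instructions(N, 256, 32)  # 8 floats per instruction
dp_instr = vector_instructions(N, 256, 64)  # 4 doubles per instruction

# If every instruction takes the same number of cycles, this ratio is the
# DP slowdown relative to SP for the arithmetic alone.
ratio = dp_instr / sp_instr

print(sp_instr, dp_instr, ratio)
```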
 

mustard010

Member
Sep 13, 2003
93
0
0
From what I see, SP and DP instructions typically have the same latency (with the exception of SQRT and DIV). It's just that each vector packs twice as many SP values as DP values, so SP throughput SHOULD be twice as high.

The gap between SP and DP isn't always 2x, is it? I originally thought that was the case, but looking at NVIDIA GPUs, DP is much slower than SP. Not just half the performance... maybe on the order of a fifth.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,361
136
Are CPUs with the AVX instruction set capable of double floating point precision or just single?

All current CPUs from AMD and Intel support 64-bit double-precision floating point, even with the legacy scalar (non-SIMD) instructions.

With SIMD instructions (SSE), the registers are 128 bits wide, which holds 2x 64-bit or 4x 32-bit values, etc.

AVX instructions are 256 bits wide, which means the FPU can operate on 4x 64-bit or 8x 32-bit values at once, etc.


If they're capable of double floating point precision, then what's performance penalty compared to FP32 precision, if any?

Calculating in 64-bit gives a higher-precision result than 32-bit, but it takes longer to do the same calculation.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
In CPUs, single-precision FLOPS figures are 2x the double-precision figures.

In the case of GPUs, Tesla GPUs achieve the same ratio: SP = 2x DP.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
The gap between SP and DP isn't always 2x, is it? I originally thought that was the case, but looking at NVIDIA GPUs, DP is much slower than SP. Not just half the performance... maybe on the order of a fifth.

Just looking at the numbers, an FP add will take the same number of cycles whether it's 8 SP values or 4 DP values. Any additional performance deltas beyond that are due to secondary effects, for example packing and unpacking a series of unpacked FP numbers prior to the FP add. So you may be correct that the performance delta between SP and DP can be as much as 5x, but it would be due to secondary effects (loads, stores, instruction flows) and not to how long it takes to do an ADD or MUL at the basic level.
 

mustard010

Member
Sep 13, 2003
93
0
0
Just looking at the numbers, an FP add will take the same number of cycles whether it's 8 SP values or 4 DP values. Any additional performance deltas beyond that are due to secondary effects, for example packing and unpacking a series of unpacked FP numbers prior to the FP add. So you may be correct that the performance delta between SP and DP can be as much as 5x, but it would be due to secondary effects (loads, stores, instruction flows) and not to how long it takes to do an ADD or MUL at the basic level.

Thanks for clarifying. After my earlier post, I immediately checked the Tesla SP vs. DP performance, and lo and behold, SP was indeed 2x as fast as DP. Then again, this is just raw FLOPS and doesn't take into account memory movement, as you say.
 