These days, a typical design will have the L2 shared between 2 or more CPU cores. Each core has its own L1 caches and a bus out to a shared L2. Physically, everything is smushed together on the die, but conceptually, since the L2 works with 2 or more cores, it isn't counted as totally part of...
I'm pretty sure that bug has been fixed for years. And I certainly wasn't seeing the problem in vectorized C code, at least in the example in this thread.
I didn't need to worry about any of these to get the C code to within 2x of the ASM code.
And are you seriously using half-precision FP...
Unless your assembler can automatically do inlining, loop unrolling, function specialization, whole program optimizations and all of the other tricks modern compilers do, this is not generally going to be the case. Sure, I guess you can do this by hand, but are you going to manually re-do e.g...
The penalties vary depending on the chip you're running on. The compiler might not handle it well, but neither does the ASM - there's only a single code path rather than one ASM function tuned for each microarchitecture. Not sure how this is a point in favor of the ASM code.
-Ofast is pretty...
I just repeated that same test using a compiler released in this decade. The compiled code is now about 2x slower than pure ASM rather than 30x. Still a win for the ASM code I guess, if you exclude development time, the fact it doesn't actually do the same thing as the C code, automatically...
Sure, the same one I asked in the post you responded to - where can I find reviews of the performance of NV cards running at their guaranteed base clocks. All I can find anywhere are reviews of them running at max boost clocks, just like all the reviews of AMD cards.
In other words, guaranteed...
That's great. Where can I find reviews of Kepler GPUs running at their guaranteed advertised base clock rate? Everything I've seen has them running at their max variable boost clock speed, just like AMD cards.
+/-3% or so, based on the review sites I've seen.
How does this compare against similar nV cards? We don't know, because nV only paid to have AMD cards tested by TechReport - and presumably the other review sites as well.
Nope, for Titan you'd only run it in faster DP mode if the gain in DP performance outweighs the slowdown from clock throttling:
"The penalty for enabling full speed FP64 mode is that NVIDIA has to reduce clockspeeds to keep everything within spec. For our sample card this manifests itself as...
Compare this approach to the one other reviewers are saying Nvidia used to try and sway reviewers:
http://forums.anandtech.com/showpost.php?p=35714780&postcount=54
You'd have to admit that's quite a coincidence.
Nah, lots of it is coming directly from NV.
"When discussing its new product with us, Nvidia took some time to explain how the 780 Ti differs from the competition. Some of what they offered in this context was FUD about the variable performance of the 290X cards in the market."
"There's...
Where can I buy a 290 which defaults to a 34% fan limit, as tested here by Anandtech - http://www.anandtech.com/show/7481/the-amd-radeon-r9-290-review/15?
If there's now an "only test at defaults" policy it hasn't been around for that long.
This is a test where the 770 averaged 33FPS, which is 33.3msec per frame on average. You'd expect that with the cutoff right at the average frame time that a number of frames would be over the average by some amount no matter how good the frame pacing was. Without knowing how many frames were...
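To put a rough number on the cutoff-at-the-average point, here is a minimal sketch. The frame times are made up for illustration and are not taken from any review, and `share_over_cutoff` is a hypothetical helper name. It only shows that when the cutoff sits right at the average frame time, a good share of frames can land over it even when pacing is tight.

```python
from statistics import mean


def share_over_cutoff(frame_times_ms, cutoff_ms):
    """Return the fraction of frames whose frame time exceeds cutoff_ms."""
    over = sum(1 for t in frame_times_ms if t > cutoff_ms)
    return over / len(frame_times_ms)


# Hypothetical, fairly evenly paced frame times in milliseconds.
# These are illustrative values only, not measurements from any card.
frame_times_ms = [31.0, 35.0, 32.5, 34.5, 33.0, 34.0, 32.0, 34.4]

# Put the cutoff exactly at the average frame time.
average_ms = mean(frame_times_ms)
fraction_over = share_over_cutoff(frame_times_ms, average_ms)

print(f"average frame time: {average_ms:.1f} ms")
print(f"frames over the average: {fraction_over:.0%}")
```

With this made-up data every frame is within about 2.5 ms of the average, yet half of them still count as over the cutoff. The count of frames over the line says little on its own without knowing how far over they went.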