Depends on how important single-thread, general application performance is.
They _could_ settle for "fast enough" single-thread performance (a souped-up iPhone core), then add multi-threaded/SIMD performance in number-cruncher models by way of simple multi-core/SIMD/GPU/ML units (scaling...
Given that clock speed has stagnated and that most people are perfectly happy with 10-year-old PCs (or tablets) for updating their FB profile or writing Office documents, what are Intel to do?
The biggest obstacle for SIMD usefulness is programmers bothering to use it - through libraries...
Some applications are compute bound, some bandwidth.
Increasing the compute capability of a «well-rounded» cpu by 2x without increasing bandwidth will give better performance in some applications, while a larger percentage of applications will be bandwidth limited.
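The compute-vs-bandwidth trade-off can be put into a roofline-style bound. This is only a sketch; the function name and all numbers are made up for illustration:

```c
/* Roofline-style sketch: attainable throughput is the lesser of the
 * compute peak and what the memory system can feed. Doubling
 * peak_gflops only helps code whose arithmetic intensity (flops per
 * byte moved) already puts it on the compute side of the ridge. */
double attainable_gflops(double peak_gflops, double bw_gb_s,
                         double flops_per_byte)
{
    double mem_bound = bw_gb_s * flops_per_byte;    /* memory roof */
    return mem_bound < peak_gflops ? mem_bound : peak_gflops;
}
```

E.g. a stream-like kernel at 0.1 flop/byte on a 50 GB/s machine caps out around 5 GFLOP/s no matter how high the compute peak goes, which is exactly the "larger percentage of applications" case above.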
I find it more interesting...
Anyone aware of somewhat sensible comparisions of compilers for eg HPC workloads for x86 and ARM?
I am most interested in the speed of open source compilers (gcc) vs the cpu manufacturers compiler.
-k
I think that you underestimate how much SIMD is used for number crunching. Either because the application programmer used assembler/intrinsics/a vectorizing compiler (intel) or because they rely on some library (blas, fftw,...) that is vectorized.
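For concreteness, this is the kind of loop that ends up SIMD one of those three ways (it is saxpy, the textbook BLAS level-1 routine; icc at -O3 will auto-vectorize it, or a library ships a hand-tuned version):

```c
#include <stddef.h>

/* y := a*x + y. A vectorizing compiler turns this into 4-16
 * multiply-adds per instruction instead of one. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```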
If we expect our hardware to do function «A» is...
I'd suggest that the Intel compiler is a pretty good reason to go with Intel hardware if the OP wants to write "clean" code and not mess with dirty optimization.
I don't know much about Fortran, but I assume that the situation is similar to C.
Also, no-one mentioned Xeon Phi? They are being...
My experience with icc and gcc suggests that I would use icc if compiling numerically heavy code to run on Intel hardware that I had to pay for.
The alternative with gcc might be to pepper the code with inline assembly in order to get vectorization. Bug-prone, non future-proof and resource...
I believe that SIMD calculation amounts to a minute fraction of the cpu area, and a larger (but still small) fraction of the cpu power budget.
I have seen energy breakdowns on fetching 64 bits from memory, doing a double-precision multi-acc, and storing the result back to memory. Turns out that...
I would be curious to know what the potential would be for Adobe products (Photoshop, Lightroom) if they hired competent programmers/optimizers and targeted AVX512 + high core counts due to the iMac pro being used by "media professionals".
Is "pixel processing" the bottleneck in those products...
So Intel has 2x the peak AVX FMA throughput of AMD. Even with a memory bandwidth of 2x, I would not necessarily expect a 2x speedup of even something like "professional rendering". Perhaps something really streamlined and FMA-centric like matrix multiply or convolution.
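The peak-throughput arithmetic behind that 2x figure is simple enough to write down (all the concrete numbers below are hypothetical):

```c
/* Back-of-envelope peak FP32 rate: cores x clock x FMA units x SIMD
 * lanes x 2, since an FMA counts as two flops. */
double peak_gflops(int cores, double ghz, int fma_units, int lanes)
{
    return cores * ghz * fma_units * lanes * 2.0;
}
```

E.g. 8 cores at 3.0 GHz with 2 FMA units and 8 fp32 AVX lanes gives 768 GFLOP/s; halve the FMA units and the peak halves, which is the 2x gap in question. Actually hitting it requires something FMA-dense like matmul, as noted above.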
For maximum performance...
So how much better performance-per-watt does a state-of-the-art ARM core offer vs a state-of-the-art x86 core, say at an operating point of 0.5 W? My gut feeling is that they should be quite similar, and that other factors are more relevant. Such as:
1. Does Apple like to have a credible bargaining...
It was not at all clear to me.
I have not seen much in the way of arguments from you, mostly normative claims?
Please elaborate why a compiler manufacturer _must_ offer optimal performance on all platforms it supports, and how this relates to clearly not being the case for most products, be it...
I admit that I am heavily biased towards problems that feature deep nested loops and that can execute really well on SIMD hw.
Being able to write C code using icc, instead of having to resort to inline assembly using gcc, means being more productive, having fewer bugs and that your code can be...
You said:
"They don't optimize for specific CPUs, except in the cases of bugs"
From your own link:
"the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set,"
-k
Optimal assembly is always going to be as fast or faster than intrinsics. The same relationship holds between intrinsics and code peppered with pragmas etc. The higher up the abstraction ladder you go, the more opportunities are off limits, and (best case) speed can only get worse.
Now, writing optimal...
"Needs to" in what sense? Legally? Morally? Market-wise?
I disagree. People (even Intel) gets to make compilers. They get to target whatever cpu they like. If they choose to not spend any time optimizing for competing hw manufacturers that is fine.
I believe that AMD have used ICC in PR...
How can you be so confident? The Atom line of processors supports a given instruction set that may be similar to the big guys.
But due to a lack of re-ordering, cache size etc, "optimal" code might be quite different.
I think that Intel does whatever their resources and ingenuity allow them...
I would assume that a cpu identification is carried out when the binary is executed, and the result is kept in some state until it is done.
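A sketch of what that identification step could look like, written with the GCC/Clang builtins (ICC generates an equivalent dispatcher automatically from its -ax flags; the path names here are made up):

```c
/* Probe the CPU once, then route to the widest code path it supports.
 * __builtin_cpu_init/__builtin_cpu_supports are GCC/Clang builtins;
 * a multi-path binary does the same thing behind the scenes. */
const char *pick_kernel(void)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return "avx2 path";
    if (__builtin_cpu_supports("sse4.2"))
        return "sse4.2 path";
    return "scalar path";
}
```

In a real dispatcher the result would be cached in a function pointer so the check is paid only once per run.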
https://computing.llnl.gov/?set=code&page=intel_vector
ICC is integrated with Visual Studio, so that your project is still MS, but parts of it will just run faster.
Now, ICC costs money and equipping each project member with that in order to build your code is cumbersome. Setting up and comprehending compilers is unpleasant.
My guess is that games...
My recollection is that you write your code once, tell ICC what set of targets you want to optimize for, and it will generate a binary that automatically chooses the right code path for you.
-k
If the compiler does this automatically, it is not that much more hassle. You need to get a decent compiler and set it up but that is pretty much a given if you want performance anyway.
A different twist is offered by the FFTW library. Say that you want to run FFTs a million times a second for...
Is that not how ARM's scalable vector extensions are designed? Write for a 2048-bit hypothetical target, get the execution of whatever the hw is capable of:
https://www.community.arm.com/processors/b/blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture...
Agreed.
But that is merely a question of technical convenience. Do you explicitly detect hw and code different paths? Do you rely on ICC to do everything for you?
-k
I think that Photoshop/image processing should be an excellent candidate for applications that matter for a reasonable number of users (i.e. quite a lot of people own it, many would like it to be faster). x264. Encryption. Machine learning.
I think it is more interesting to list the...
As ShintaiDK said, it is possible to distribute binaries that follow different code-paths depending on hardware. I think that makes a lot of sense. Now, should there be 2 or 10 code paths, what is the "sweet spot"? Would users accept that their game binary download is 2 GB instead of 512MB only...
Either ICC or assembly (typically open-source projects). Or pushing high-complexity work into 3rd party libraries that may have been compiled this way or another.
-k
Having twice the vector width (AVX vs SSE, or AVX512 vs AVX) should, provided the actual hw resources behind the scenes scale with it, more than offset the slight clock reduction that Intel now more or less automatically applies to AVX code, given that the problem solved by the code maps well to wide...
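With hypothetical clocks plugged in, the arithmetic looks like this:

```c
/* Net gain from a wider vector unit once the AVX clock reduction is
 * accounted for. All inputs are made-up illustration numbers. */
double net_speedup(double width_ratio, double wide_clk_ghz,
                   double narrow_clk_ghz)
{
    return width_ratio * (wide_clk_ghz / narrow_clk_ghz);
}
```

E.g. 2x the width at an AVX clock of 3.5 GHz against a 4.0 GHz base is still a 1.75x net gain, so the clock penalty only erodes, never erases, the width advantage for code that vectorizes well.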
Quirks like that are already difficult on one platform (e.g. Windows on x86 using a single compiler). Even switching to Linux (using the same hardware), default memory philosophy could expose nasty assumptions made by the programmers on the basis of "works for me". An emulator that lacks the...
I would be interested in:
1. Simple low-level tests. How many float32 adds or mults or multiply-accumulate or divs can be carried out per second when data is hot in the cache. Code should be hand-optimized for the architecture.
2. Representative cache/memory efficiency tests. I don't know what...
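A minimal sketch of test 1, assuming nothing beyond the C standard library (a real harness would pin the thread, unroll several independent accumulator chains and read the TSC instead of clock()):

```c
#include <time.h>

/* Time a long run of float adds with the working set hot. The volatile
 * accumulator stops the compiler from folding the loop away, at the
 * cost of forcing a load/store per iteration. Returns millions of
 * adds per second. */
double madds_per_sec(long iters)
{
    volatile float acc = 0.0f;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++)
        acc += 1.0f;
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? iters / secs / 1e6 : 0.0;
}
```

The same skeleton with mult/FMA/div bodies gives the per-operation comparison; the hand-optimized versions per architecture are the part that takes real effort.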
Having a "unified" instruction set would benefit purchasers of the expensive units as well. More software would be optimized for the fancy instructions, and performance would probably be better on a high end cpu for generic software.
This all assumes that Intel can somehow...
I think that Microsoft/Apple/Google are concentrating on higher-level languages/libraries, and that the bulk of "Apps" (as seen by consumers at large) have transitioned from projects where much of the complexity lies in low-level things (i.e. printer drivers, extended memory management), to...
I have two computers:
1. An office computer used (among other things) for Adobe Lightroom. It is an Intel i7 2600 w/12GB of DDR3 and 120GB SSD.
2. A living room computer/HTPC with a core2 duo and 2GB of ram, spinning drive.
Both running Windows 7 64.
I have thought about updating both...
The more I learn about software optimization, the more sceptical I am about cpu benchmarks.
Usually, you are testing a particular software implementation, a compiler and some piece of hardware jointly. Trying to compare two pieces of hardware this way is hard.
My experience is that software...
I think that your thought is interesting.
If my hardware has an overall efficiency boost of 10%, then I expect it to apply to all of my applications (on average). If any one of my applications has a 10% speedup, then that will be only for this single application. Thus, while it might be worth...
I fully agree that for 99% of the applications, 99% of the time, ASM is not the solution. Lots of pain, lots of bugs, and the overall speedup may not be "enough".
The fact that these see performance improvements tells us nothing about how much ASM would have mattered? If some JavaScript app...
Then you are arguing against straw men. I said that _optimal_ ASM will be at least as fast as compiled code. That fact is evident: ASM is a strict superset of compiled C code; it occupies a larger "space". Whatever a compiler does with C code, a team of monkeys and lots of time could (in...
I see that compilers mention all kinds of features, yet it is still possible to beat them in some cases using simple inline asm. Thus the compiler will not always, and cannot always, outperform a dedicated programmer.
Sure, realistically, one cannot expect "optimal" assembler code ala "guaranteed...
I have experienced that gcc chose to pepper my intrinsics with useless memory transfers back and forth. Simply stripping away the nonsense and reusing the output as inline assembly caused a significant speedup. How is that not a point in favour of ASM?
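For reference, the kind of intrinsics code in question (SSE, so x86 only; this is an illustrative stand-in, not the actual code from that experience):

```c
#include <immintrin.h>

/* Four float adds in one SSE instruction. With older gcc at low
 * optimization levels, each intrinsic could end up bracketed by
 * redundant loads/stores; at -O2 this body compiles down to a
 * single addps plus the unaligned load/store pair. */
void add4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```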
Well, yeah, but I have a harder time...