Search results

  1. B

    Intel extends AVX to 512-bit

    We have 10-core processors right now. You have the (lack of) competition to blame for keeping the prices high on anything beyond quad-core. AMD's Steamroller architecture with four modules might finally perform a little closer to an 8-core. So that would make Intel release affordable 6 or 8-core...
  2. B

    Intel extends AVX to 512-bit

    They tried but failed, due to the inherent heterogeneous overhead and programming complications. With the Kepler architecture they're focusing on graphics again, which is where the money is for them, and they've taken a serious step back from consumer GPGPU. AVX-512 instead is homogeneous, which...
  3. B

    Intel extends AVX to 512-bit

    What makes you think that? Same question. I don't perceive the lack of dedicated masking registers as that big of an issue. AVX has 'blend' instructions for predication, and Intel CPUs have two execution ports for them. Masking can reduce power consumption by disabling unused lanes, and it...
  4. B

    Intel extends AVX to 512-bit

    That's a bit like asking what programs benefit from having more execution units per core. Sure, some have more instruction level parallelism (ILP) than others, but you can't draw a line between ones that do and ones that don't benefit from it. Likewise, these wide vector instructions are very...
  5. B

    Intel extends AVX to 512-bit

    Interestingly it's not exactly new. It's for the most part the Xeon Phi ISA, made compatible with the legacy 256-bit and 128-bit instructions. There's a new zeroing behavior option when using the mask registers, which seems of particular interest to out-of-order execution architectures, to...
  6. B

    Intel extends AVX to 512-bit

    AVX only extended floating-point operations to 256-bit. x264 uses integer operations. AVX2 offers 256-bit integer vector operations. AVX-512 does not extend them to 512-bit, for now. Just to be clear though, GPGPU is not efficient at processing small integer elements.
  7. B

    Intel extends AVX to 512-bit

    It's going to revolutionize computing as we know it. It brings all of the general-purpose computing power of the GPU, into the CPU cores. No more heterogeneous overhead. R.I.P. GPGPU.
  8. B

    Intel extends AVX to 512-bit

    http://software.intel.com/en-us/blogs/2013/07/10/avx-512-instructions
  9. B

    Company of Heroes 2 - fascinating CPU benchmarks

    With more threads, you get more interactions between threads that need synchronization. More synchronization means more overhead. This is exactly why Intel introduced TSX. It optimizes synchronization by assuming that most interactions won't be interrupted by a conflict.
  10. B

    Knight's Landing, Skylake to unify instruction sets?

    Why would you want to run Windows on it? Windows is targeted at consumers and servers, not at HPC systems that have barely any need for an OS. Even so, Microsoft could easily create a Phi version of Windows, if there was enough demand. No need for Intel to bend over backwards to support Windows...
  11. B

    Knight's Landing, Skylake to unify instruction sets?

    Fat chance. 14 nm is a node and a half smaller, and they'll probably bring 6 or 8-core to the mainstream market. They'll have enough on their plate to not want to be bothered with a new architecture at the same time. The tick-tock model has worked really well so far. There could be a handful of...
  12. B

    Knight's Landing, Skylake to unify instruction sets?

    Not likely. Xeon Phi is targeted exclusively at the HPC market, and runs software by and for that market. So it doesn't have to be binary compatible with legacy CPU extensions. You may not even want that. Xeon Phi is an in-order execution architecture with hundreds of threads, while desktop...
  13. B

    Hyperthreading Revisited

    Note that Haswell has improved Hyper-Threading performance. It has four arithmetic execution ports, instead of three (which we were stuck with since Core 2). What's more, they're arranged so that it's really two pairs of ports with equal capabilities (for scalar integer instructions). This is...
  14. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    But it requires a lot from the hardware! You can't claim IPC scales much more easily by just looking at what it means to developers. That's only half the story, the good part. The bad part is that beyond modest increments it takes a very large amount of hardware to extract more ILP, and worse...
  15. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    That certainly couldn't have been your original point: Note that this is what started our discussion. If "inherent scaling problems" refer to "a loss of work, no matter how small", that implies you expect IPC to scale much more easily. In reality IPC only scales by roughly 10% every...
  16. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    I like your analogy! Both prefetchers and TSX optimize for the common case. A prefetcher assumes that your memory accesses are linear, so it can predict what data you need next and cache it in advance. In theory this may thrash the cache with unwanted data, but in practice that's...
  17. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    Absolutely. There's nothing to "admit" here. I never said it doesn't have a cost. What's relevant is that the cost can be and is being lowered, thereby proving that we previously weren't hitting inherent multi-core scaling issues. And to be perfectly clear, by hitting them I mean they would prevent...
  18. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    Why would it have to be a language feature? It can be done perfectly through some library functions, or better yet, intrinsics. Don't get me wrong, I certainly agree that it could be very useful to integrate it more tightly with the programming languages, but I really wouldn't say it's held back...
  19. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    Wrong. You're right that synchronization adds cost, but TSX lowers that cost. Hence applications where the overhead was larger than the gains were not hitting any "inherent scaling issues", they were simply held back by a lack of efficient synchronization primitives. That's not inherent to...
  20. B

    Worth upgrade to 4770K from FX-8320?

    Since overclocking would destroy the low power consumption, you really want the i7-4770 instead of the i7-4770K. The non-K model is cheaper and comes with TSX technology for better multi-threading efficiency.
  21. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    It's a common misconception that GPUs have many cores. The GTX 680 has 'only' 8 cores, with 6 vector units each, which are 32 elements wide. NVIDIA multiplies that all together to claim it has 1536 cores, but really there are only 8. For comparison, mainstream Haswell CPUs have 4 cores, with 2...
  22. B

    Yes another Haswell thread. Let's have a look at tock-to-tock IPC.

    That's only because multi-threading and SIMD parallelism have been too hard for the average developer to take advantage of. Clock frequency increases and IPC improvements are nice since they don't take developer effort, but they've been very modest in comparison to the theoretical performance...
  23. B

    Core i7-4770K is performance crippled

    Guessing? What more proof could you want? You know Mesa is open source, right? You asked whether the conditions where TSX helps occur in a graphics driver. I pointed out a driver where it does, and you can check it for yourself. If you refuse to accept the proof that's right in front of you, or...
  24. B

    Core i7-4770K is performance crippled

    Mesa is full proof. I was merely adding that NVIDIA and AMD's drivers are more complex and therefore are highly likely to have even more multi-threading going on where TSX would help.
  25. B

    Core i7-4770K is performance crippled

    I've already explained why a database transactions benchmark is also relevant to other multi-threaded software. What's your explanation for saying that it "probably doesn't"? Please be specific. Again, why not? You haven't presented any arguments to back that up.
  26. B

    Core i7-4770K is performance crippled

    Yes I have. It's a benchmark, and it's quantified. You have nothing to support that theory. Again, absence of proof is not proof of absence. Instead we know for a fact that TSX speeds up synchronizing between threads, which all multi-threaded software has a need for. So the fear that a 4770K...
  27. B

    Core i7-4770K is performance crippled

    Oh, so you agree it's a major issue but now we're down to the semantics of "cripple"? I'm open to suggestions, but keep in mind that Intel deliberately disables a performance feature. They fuse it out. Kill it off in all K models. Then they charge money for it! I think crippling is a pretty...
  28. B

    Core i7-4770K is performance crippled

    It's not about not running "properly". TSX is roughly comparable to Hyper-Threading in usefulness. Applications do run properly on a CPU without HT, but not optimally. Likewise without TSX applications won't run any worse than they did before, but they won't run any faster.
  29. B

    Core i7-4770K is performance crippled

    "Most users" don't care about overclocking either. So let's be clear here, people who are choosing between an i7-4770K or an i7-4770, care a lot about performance and use cutting-edge software that uses/will use the latest CPU features. Deliberately disabling a performance feature from an...
  30. B

    Core i7-4770K is performance crippled

    Databases are just a highly generalized way of storing and retrieving data. That doesn't mean that applications which don't use a database are not storing and retrieving data! Any of the standard container classes is a specialized data storage structure, and accessing that data concurrently from...
  31. B

    Core i7-4770K is performance crippled

    In theory, possibly, in practice, not even close. Anyway, let's not stray off-topic. Feel free to open another thread about it if you want to discuss the details. You've (still) got your definition of fine-grained locking all wrong. It is not a property of the lock implementation. You can have...
  32. B

    Core i7-4770K is performance crippled

    We've had this discussion before. That's a very rough and very vague explanation of what TSX is for. My understanding of TSX is a lot more in-depth than that. It can be used as a fundamental building block for any kind of synchronization between threads. But that explanation from Intel isn't...
  33. B

    Core i7-4770K is performance crippled

    Applications which benefit from AVX2 have parallel workloads so they typically also benefit from multi-threading. But when you have fine-grained tasks to evenly distribute the workload among the cores, then the overhead of the locks can be significant. I've seen cases where each thread spends up...
  34. B

    Core i7-4770K is performance crippled

    TSX is a performance feature, so the 4770K is performance crippled compared to the 4770, by definition. There's no two ways about it. Just how much it is crippled is up for debate, but not whether or not it is. Next, quantifiable data has already been posted before...
  35. B

    Small 4770 (non K) Review

    Glad to hear it works as advertised. 256-bit integer operations should also make a huge difference, especially for mixed float/int code where you previously didn't really have the option of using AVX. Eightfold parallelization of loops will be awesome.
  36. B

    Core i7-4770K is performance crippled

    First of all, and with all due respect, mentioning Java in the context of TSX is ridiculous. Performance critical applications such as games are not written in Java, for good reason. Secondly, no, Java does not feature efficient fine-grained locking at all. The API you mention is only suitable...
  37. B

    Core i7-4770K is performance crippled

    Are you kidding? Everywhere. All the dependencies between draw calls have to be respected, while also allowing the application to update resources, from multiple threads, simultaneously, and servicing some asynchronous queries. Also, graphics drivers have only milliseconds to get things done to...
  38. B

    Core i7-4770K is performance crippled

    Don't accuse me of hyperbole; I never said it's the worst processor ever produced in decades. I totally agree the 4770K isn't bad at all in comparison to the 3770K, but that's not the point. The 4770 has more features, which could very well make it faster. Paying extra for disabled features is...
  39. B

    Core i7-4770K is performance crippled

    One more thing about games. Graphics drivers are likely to pick up TSX pretty quickly, which means DirectX 11's multi-threaded rendering will be faster on an i7-4770 than on an i7-4770K, even with existing games.
  40. B

    Core i7-4770K is performance crippled

    Keep in mind that the software is a chicken-and-egg problem. Games don't have too many thread synchronization issues because they avoid it by using few, coarse-grained locks. But while that effectively results in less total overhead from locks, it doesn't offer the best performance! It results...
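
For anyone skimming these results who wants the coarse-grained versus fine-grained distinction from result 40 spelled out, here is a minimal sketch in Python. The names (`CoarseTable`, `StripedTable`, `hammer`) are made up for illustration and don't come from any of the posts. It uses plain locks, not TSX, and because of Python's global interpreter lock it shows only the structure of the two approaches, not the performance difference being argued about.

```python
import threading


class CoarseTable:
    """Hypothetical example: one lock guards every bucket.

    Simple and cheap in lock count, but all threads serialize on it.
    """

    def __init__(self, buckets):
        self._lock = threading.Lock()
        self._counts = [0] * buckets

    def add(self, key, amount=1):
        with self._lock:
            self._counts[key % len(self._counts)] += amount

    def total(self):
        with self._lock:
            return sum(self._counts)


class StripedTable:
    """Hypothetical example: one lock per bucket (fine-grained).

    Threads touching different buckets never wait on each other,
    at the price of more lock operations overall.
    """

    def __init__(self, buckets):
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._counts = [0] * buckets

    def add(self, key, amount=1):
        i = key % len(self._counts)
        with self._locks[i]:
            self._counts[i] += amount

    def total(self):
        # Acquire every lock in a fixed order so two callers cannot deadlock.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in self._locks:
                lock.release()


def hammer(table, threads=4, updates=10_000):
    """Update the table from several threads and return the final total."""

    def work(tid):
        for n in range(updates):
            table.add(tid * updates + n)

    workers = [threading.Thread(target=work, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return table.total()
```

Both tables give the same final count. The difference is how many threads can make progress at once, and how many lock operations you pay for it, which is the trade-off the post is pointing at.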