Let's try this another way: what does NVIDIA's PR material say their optimizer actually does?
Yup, sounds like pretty much the exact same things that a compiler for Itanium's IA-64 does. Oh, and the most important thing given what you're portraying it as:
ALL of these things are influenced by dynamic context! You can't just say "well, both X and Y compilers do this" and conclude they're going to produce the same result. Profiling inputs matter.
Besides that, stuff like "Sinks uncommonly executed computation" doesn't even really make sense in a static context. Another thing you're totally missing is that directly executing (or only lightly optimizing) colder code while applying much more space-expensive optimizations to hot code balances out icache use a lot better than optimizing everything to the same extent.
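To make that concrete, here's a rough C-level sketch (my own made-up example, not anything out of NVIDIA's docs) of what sinking uncommonly executed computation looks like, and why the decision depends on knowing which path is actually rare:

```c
/* Hypothetical illustration of sinking uncommonly executed computation.
 * Suppose profiling shows the error path is taken ~0.1% of the time, so
 * the expensive formatting work is moved (sunk) into that cold path
 * instead of being computed unconditionally on every call. */

#include <stdio.h>

/* Before: the diagnostic string is built on every call, hot or not. */
int process_before(int value) {
    char msg[64];
    snprintf(msg, sizeof msg, "bad value: %d", value);  /* always executed */
    if (value < 0) {                                     /* rarely taken   */
        fputs(msg, stderr);
        return -1;
    }
    return value * 2;
}

/* After: the same work is sunk below the branch, so the hot path only
 * pays for the compare.  A static compiler can only do this if it knows
 * (or guesses) that value < 0 is rare; a dynamic optimizer watches it. */
int process_after(int value) {
    if (value < 0) {
        char msg[64];
        snprintf(msg, sizeof msg, "bad value: %d", value);
        fputs(msg, stderr);
        return -1;
    }
    return value * 2;
}

int main(void) {
    printf("%d %d\n", process_before(21), process_after(21));
    return 0;
}
```

A static compiler has to guess, or lean on stale PGO data, to know the error path is cold; a dynamic optimizer sees the branch behave on the user's actual inputs.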
There are a couple of really big things here that nVidia hasn't said a lot about, but that can make a big difference in trace-based dynamic VLIW approaches like this:
1) Dealing with memory disambiguation/alias prediction in some low-overhead fashion, where heavily faulting blocks are translated back into less aggressively reordered ones (rough sketch after this list)
2) Tying the trace execution mechanism in with branch prediction so traces that cross branches are fetched ahead of time
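For 1), here's roughly the kind of thing I mean, sketched in C rather than translator internals (the counter and threshold are invented for illustration): hoist the load on the bet that it doesn't alias the store, guard it with a cheap check, and fall back to the conservative ordering for blocks that keep faulting:

```c
/* Rough sketch of item 1: the translator hoists a load above a store it
 * believes doesn't alias, guards it with a cheap address check, and counts
 * failures so a heavily faulting block can be retranslated conservatively.
 * The counter and threshold here are made up for illustration. */

#include <stdio.h>

static unsigned alias_misses;          /* per-block misprediction counter */
#define RETRANSLATE_THRESHOLD 16       /* made-up trip point for fallback */

/* Conservative ordering: store, then load, exactly as the source code said. */
int block_conservative(int *dst, const int *src, int v) {
    *dst = v;
    return *src + 1;
}

/* Aggressive ordering: the load is hoisted above the store on the bet
 * that dst and src don't alias; a runtime check repairs the rare case. */
int block_aggressive(int *dst, const int *src, int v) {
    int speculated = *src;             /* hoisted load                    */
    *dst = v;
    if (dst == src) {                  /* alias check: speculation failed */
        alias_misses++;
        speculated = *src;             /* reload the now-correct value    */
    }
    return speculated + 1;
}

int run_block(int *dst, const int *src, int v) {
    if (alias_misses >= RETRANSLATE_THRESHOLD)  /* too many faults: back off */
        return block_conservative(dst, src, v);
    return block_aggressive(dst, src, v);
}

int main(void) {
    int a = 5, b = 0;
    printf("%d\n", run_block(&b, &a, 7));   /* no alias: fast path works */
    printf("%d\n", run_block(&a, &a, 7));   /* alias: check catches it   */
    return 0;
}
```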
There's some good potential for synergy between hardware features and code that's translated in hot traces, things that go well beyond "let's just do the compilation later."
Not "adapt to the nuances of how someone uses a program as they're using it", just optimize once and then use it. Which is typically how I'd define compilation.
If you'd read the documents in depth beyond clipping out a buzzphrase, you'd see that it keeps re-running the optimizer long after new ARM instructions are introduced and repeatedly applies more aggressive optimizations to the hottest code. And if you go back and look at the Transmeta docs (which Denver very clearly evolves from; some key people worked on both), you'll see it had iterative recompilation as well.
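The iterative part is easy to picture. Here's a toy sketch (the tier names and thresholds are mine, not NVIDIA's or Transmeta's) of execution counters driving repeated, progressively heavier optimization of the same block:

```c
/* Toy sketch of iterative recompilation: each translated block carries an
 * execution counter, and crossing successive thresholds triggers another,
 * more aggressive optimization pass over that block.  Thresholds and tier
 * names are invented for illustration. */

#include <stdio.h>

enum tier { INTERPRETED, QUICK_TRANSLATION, OPTIMIZED_TRACE };

struct block {
    unsigned long exec_count;
    enum tier     tier;
};

static const unsigned long PROMOTE_TO_QUICK = 50;      /* made-up numbers */
static const unsigned long PROMOTE_TO_TRACE = 10000;

/* Called every time the block is (re)entered by the dispatcher. */
void maybe_reoptimize(struct block *b) {
    b->exec_count++;
    if (b->tier == INTERPRETED && b->exec_count >= PROMOTE_TO_QUICK) {
        b->tier = QUICK_TRANSLATION;   /* cheap pass: decode + linear schedule */
    } else if (b->tier == QUICK_TRANSLATION && b->exec_count >= PROMOTE_TO_TRACE) {
        b->tier = OPTIMIZED_TRACE;     /* expensive pass: trace formation,     */
    }                                  /* unrolling, aggressive scheduling     */
}

int main(void) {
    struct block b = { 0, INTERPRETED };
    for (unsigned long i = 0; i < 20000; i++)
        maybe_reoptimize(&b);
    printf("final tier: %d after %lu executions\n", b.tier, b.exec_count);
    return 0;
}
```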
When nVidia says "optimize once, use many times" the point they're trying to get across is that for each optimization pass the code will be executed many times, not that any given piece of code will only be translated once.
And even if it DID only translate code once it'd STILL be better than what you're saying because it'd be warmed with profiling data specific to the user's run.
Okay, so patents hint that it might use run-ahead execution on memory stalls, which makes it nothing like Itanium? Sure, we can just ignore all the ways in which they look the same. (Of course, I would be interested to read the patents hinting at that, seeing as memory stalls are a bane of in-order architectures.)
Have you ever used TI's C6x? Does it look the same as Itanium to you too because it's VLIW and in-order? It's nothing even close to the same.
As for your analysis that the 'hard parts' are already done on the ARM binary, I'd tend to disagree, at least in the context of keeping all of Denver's execution resources busy, which I'd expect is the goal, because otherwise there wouldn't be much cause for having that many execution resources. Of course most modern cores do something similar with instruction reordering, but at a finer-grained level than what NVIDIA appears to be describing here.
Look at what any compiler does: most of the work isn't spent in scheduling. It's simply misleading to say that Itanium will have the compilation done up front while Denver will have the compilation done at run time.
And I don't see how you can make the leap from ideal pre-compiled VLIW code being at worst equal to a run-time-generated version, to OoO execution having no value. Sure, ideal pre-compiled code would take care of the instruction-reordering benefit of OoO execution that makes it easier to run multiple instructions in parallel, but it does nothing to protect against cache misses. (Which is the reason Itanium 2 onwards made use of some manner of multi-threading.)
It can provide some latency hiding from cache misses: given a non-blocking cache you can push loads further back from where their results are needed. But there's a disadvantage to doing this uniformly everywhere, because it increases register pressure and code size, especially when you start pushing loads above branches, or duplicating them on both sides of a branch. It can still be worthwhile for blocks that statistically miss a lot, though.
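Sketching that tradeoff in C (made-up example): the hoisted version overlaps the miss with independent work, but the loaded value stays live across the whole function and gets fetched even on the path that never uses it, which is exactly the register-pressure and code-size cost I mean:

```c
#include <stdio.h>

/* Stand-in for a chain of work that doesn't depend on the load. */
static int heavy_independent_work(int x) {
    for (int i = 0; i < 1000; i++)
        x = x * 31 + 7;
    return x;
}

/* Baseline: the load sits right before its use, so a miss stalls here
 * with nothing left to overlap it with. */
int baseline(const int *p, int x, int taken) {
    int t = heavy_independent_work(x);
    if (taken)
        return t + *p;          /* miss latency fully exposed */
    return t;
}

/* Hoisted: the load is pushed back well before its use, so a miss can
 * overlap with heavy_independent_work.  The cost: the value stays live
 * (one more register occupied) for the whole function, and it's fetched
 * even on the not-taken path that never needed it. */
int hoisted(const int *p, int x, int taken) {
    int v = *p;                 /* early, possibly wasted, load */
    int t = heavy_independent_work(x);
    if (taken)
        return t + v;
    return t;
}

int main(void) {
    int data = 42;
    printf("%d %d\n", baseline(&data, 3, 1), hoisted(&data, 3, 0));
    return 0;
}
```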
Retraining blocks against changing trends in branch behavior, even at a coarse grain, is also better than just going with static PGO or, worse, nothing.
I think maybe you should go read some more articles about dynamic optimization, like Dynamo for instance, then start thinking about how much it can be extended when a uarch is designed around these principles.