Would the P4 have been better if Intel stayed with Rambus?
I read online that the original Rambus gave the P4 somewhat of an edge.
I also remember that Intel was to get a large amount of Rambus shares for using it.
I built half a dozen P3s but never a P4; I went with AMD instead, so I never really got to know the P4.
Not really. The pipeline was too deep, and the design did not adequately account for the effect of latencies, or for the hard cap on clock speed imposed by thermal limitations.
On release, the P4 ran at around 2 GHz, and Intel claimed the design would scale to 10 GHz within 5-6 years. Had that actually happened, the P4 would have been competitive with lower-clocked chips like the Athlon despite its latency issues, but obviously it never got there.
Rambus doesn't inherently solve any of the P4's pipeline problems. In fact, it only provides marginal improvements for certain kinds of streaming data.
Realistically, even highly optimized code was inherently slower per clock on a P4 than on a P3 or Athlon (except in weird cases of deliberately anti-AMD code). The ONLY way it was going to catch up was raw clock speed, which the thermal issues made impossible. Once the Athlon incorporated SSE2, it handily beat the P4 even on the most P4-friendly optimized code, since load latency for 128-bit data was only 1-2 cycles on the Athlon versus 8-12 cycles on a P4.
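For concreteness, the 128-bit loads in question are the kind SSE2 code issues on every iteration of a vectorized loop. A minimal C sketch (using x86 SSE2 compiler intrinsics; the function name is just for illustration) of the sort of inner loop where that 1-2 vs. 8-12 cycle load latency dominates, even with everything sitting in L1:

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */
#include <stddef.h>
#include <stdint.h>

/* Sum an array of 32-bit ints four at a time. Every iteration starts
 * with a 128-bit load; that load took ~1-2 cycles on an SSE2 Athlon
 * but ~8-12 on a P4, so a loop like this stalls far longer per
 * iteration on NetBurst even on 100% cache hits. */
int32_t sum_sse2(const int32_t *a, size_t n)  /* n: multiple of 4 */
{
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i)); /* 128-bit load */
        acc = _mm_add_epi32(acc, v);
    }
    /* horizontal sum of the four lanes */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```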
The simple fact that, even on 100% cache hits (no DRAM access at all), it was still slower than competing processors, sometimes by as much as 50%, illustrates that RDRAM wouldn't fix this. Instruction latencies of 10-18 cycles on instructions that stall only 3-4 cycles on a P3 or Athlon were a big deal and insurmountable in branchy code, especially in light of smaller caches, poor branch prediction, and the elimination of "free" FXCH, among various other instruction-level speed hits. The superior branch prediction on Athlon and Core 2 chips was also a substantial difference, and the drive for maximum clock speed held the size of the L1 cache down as well, further hindering performance.
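To illustrate why branchy code was so punishing: a mispredicted branch flushes the pipeline, and the deeper the pipeline, the more work gets thrown away per miss, which is exactly why NetBurst's depth hurt. A hedged C sketch (function names are hypothetical) of the kind of data-dependent branch that is hard to predict on random input, next to the branchless rewrite compilers of the era used specifically to dodge the penalty:

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one data-dependent branch per element. On random
 * input it mispredicts often, and each miss flushes the pipeline --
 * a far bigger loss on a deep pipeline like NetBurst than on a P3. */
int64_t sum_over_branchy(const int32_t *a, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > threshold)   /* data-dependent, poorly predicted */
            sum += a[i];
    return sum;
}

/* Branchless version: a mask replaces the branch, so there is nothing
 * to mispredict; compilers emit CMOV/AND instead of a jump. */
int64_t sum_over_branchless(const int32_t *a, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t mask = -(a[i] > threshold);  /* all-ones if taken, else 0 */
        sum += a[i] & mask;
    }
    return sum;
}
```

Both functions compute the same result; only the branchy one pays misprediction costs, and it pays them roughly in proportion to pipeline depth.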
There were some water-cooled engineering demos that Intel ran at around 5-6 GHz, but they were pushing 180 W+ of power, which starts to reach the point where the substrates and packaging can't hold up. The architecture just wasn't there... not to mention the electromigration problems they had at those speeds, regardless of cooling.
I do appreciate that Intel was being innovative. Coming off the very successful P3 architecture, which resembles the modern Athlon/Core architecture, they tried a deep-pipeline chip to see what would happen (the P4/NetBurst) and a massive explicitly parallel design (IA-64/EPIC) with the Itanium, both of which flopped despite looking reasonable on paper.
But the world learned a lot about computation and CPU architecture from those real-world experiments, and all of our chips are better today because of those lessons.