Originally posted by: Artanis
Good point, end of story. Primarily, 64-bit is not the 'performance fever' some would believe, as if it were twice as speedy (because 64 = 32*2). That's the point...
cya!
I see from later posts that you've either completely misunderstood this!? (or...)
There will be a good performance boost from 64-bit. I tried to explain to you the technical reasons for that, which are not so self-evident, and have more to do with the ISA than with 64-bit data chunks.
You are practically guaranteed at least 30%, from what I've seen, even from a crude port. And that is a good boost indeed. But more 'mature' optimizing will probably bring that up to 40-55%. And in some extreme cases, where 64-bit integer ops, twice the number of (and more useful) registers, and mapping tricks all converge, you will see a 400%-500% increase.
- Still! Still, the primary point of 64-bit is the 64-bit pointers! That is a thousand times more important than a mere performance boost. The things that will be done on a 64-bit platform cannot be done at all on a 32-bit platform, at any speed. You might as well try to travel to the moon on your bicycle.
Originally posted by: Grant2
So is it true that 64-bit integer math will be performed much faster on a 64bit cpu? (because it doesn't have to break it down into multiple 32-bit operations)?
Yes. But it will also be faster still, because we have more visible registers.
Is it also true that 64-bit floating point math will not be performed any faster, because modern CPUs already have specialized 128-bit hardware to handle that?
No, FP will also be faster, because we have more visible registers.
And yes, there already is specialized 64/80-bit hardware (not 128-bit) to handle double-precision FP (64-bit) math.
'128-bit hardware' is actually just the 64-bit MMX and 128-bit SSE/SSE2 registers, plus instruction extensions for vector operations. I hope you understand what that is. There are still just logical hardware operations on 8-, 16-, 32-, and 64-bit data. And the execution paths in the A64 are generalized to 64-bit (so there are good reasons to call it a "64-bit" CPU).
A 128-bit instruction operates on vectors. That is, it performs four 32-bit operations at once, or two 64-bit ones.
The A64 has basically three execution paths. It also has storage/buffer areas for instructions in flight.
The instructions pour into the CPU, into three parallel decoders. Then they collect in a pool, where their sequential order is broken. Each instruction, at this point, has its own version of the contents of the 'visible' registers. Here a 2x64, 128-bit vector instruction is split into two 64-bit operations. A scheduler then dispatches the operations, as soon as they are ready (all relevant in-data collected or computed), into one of three parallel integer execution units, or one of three specialized FP units.
(If no instruction is ready for scheduling, data is guessed and an operation is dispatched into the execution unit anyway, rather than waiting. If the guess eventually proves wrong, the operation has to be redone; but if it was right, the result is already at hand.)
After execution, the results are collected in a re-ordering queue. There the sequential order is restored, before the results are committed. In this case, the two 64-bit results are written to the visible 128-bit register that is to accumulate the result.
As you see, there is a good deal of parallelism implemented in today's CPUs. Vector ops are one fairly explicit way to put it to good use. My point with this description is that the real hardware is not exactly the same as what the ISA implies to the assembler programmer. The processing power can be increased by adding more decoder pipes and execution units; the problem is making use of them. Vector instructions help a bit with that.