I highly value and respect the reverse engineering Agner does, but sometimes his conclusions about performance implications seem a little off to me. For instance, he thinks HT often offers little benefit or even a slowdown on Atom, when in the real world that never seems to be the case. He's also said weird things about how programs would often hit Bulldozer's restriction on accessing two cache banks in the same cycle when the addresses are 256 bytes apart. Seems to me that'd almost never be a problem.
I also don't think the 8 bytes/cycle fetch is particularly unbalanced relative to the rest of the processor, especially when you consider that the most important Atoms are 32-bit x86 only (so no REX prefixes) and don't have SSE4+, much less AVX, which keeps the average instruction shorter. One read or write per cycle is par for the course for this complexity/power consumption class - if you move up to read + write per cycle like Bobcat and Cortex-A15, you pay for it. It's nice to have though.
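To put rough numbers on the fetch argument, here's a back-of-the-envelope sketch. It assumes a 2-wide issue core and treats fetch as a flat 8 bytes every cycle with no alignment or branch effects; the function name and the average instruction lengths are made up for illustration, not measured from real code.

```python
def sustainable_ipc(fetch_bytes_per_cycle, avg_insn_bytes, issue_width):
    """Upper bound on instructions per cycle from fetch bandwidth alone.

    Simplified model: fetch delivers a fixed number of bytes every cycle
    and the core can never issue more than issue_width instructions.
    """
    return min(issue_width, fetch_bytes_per_cycle / avg_insn_bytes)


if __name__ == "__main__":
    # Hypothetical average instruction lengths in bytes (illustrative only).
    for avg_len in (3.0, 4.0, 5.0):
        ipc = sustainable_ipc(8, avg_len, issue_width=2)
        print(f"avg {avg_len:.1f} bytes/insn -> fetch-limited IPC <= {ipc:.2f}")
```

Under this model fetch only becomes the limit once the average instruction is longer than 4 bytes, which is why the lack of REX prefixes and longer SSE4/AVX encodings matters.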
I actually think Atom isn't that bad, or at least not as bad as I originally thought. What I do think is that it needs x86-64, preferably even "x32", since having only 8 registers really holds it back. Try coding some ASM optimized for it sometime and I think you'll see what I mean. Some other ISA features hurt more on a narrow in-order processor, like having to deal with more moves, but the full read-modify-write pipeline is nice. Agner is right though, software has to be well optimized specifically for Atom. I wonder how much of the software actually being run on it is.
As for all the stuff about how it's a 5-year-old core with no changes, Saltwell at least brought some changes (beyond being ported to 32nm, of course). What we don't really know is how much Intel has worked on changes that reduce power consumption without necessarily being visible as functional differences. Given the huge advancements they've made in power, I wouldn't be so sure it was down to the process improvement alone.