First Gracemont laptop available for sale.
Anybody got disposable $500 to buy and test this laptop?
Ninty has always been able to mine gold from out-of-date hardware.
I was thinking of the lovely first-party Nintendo Switch games when I wrote that.
Don't you mean 3x3x3 and 2x4x4?
3x3 = 27
2x4 = 32
I strongly suspect that at those kinds of numbers there would be more overhead involved that significantly increases the area to less than favorable levels.
Looking further:
20-wide traditional = 49.12
12x 3-wide cluster = 50.05
No, I'm saying a 3x3 decoder is going to be 27 and 2x4 decoders are going to be 32 if the scaling were exponential, though that's unlikely.
Don't you mean 3x3x3 and 2x4x4?
(or 3³ and 2x4² if you want to be more efficient in expression)
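To make the arithmetic in this exchange easier to follow, here is a rough Python sketch. It assumes a simple power-law cost model (cost = clusters × width^exponent), which the posts above never spell out. The function name `decoder_cost` and both exponents are my own guesses: an exponent of 2 gives the 27 and 32 figures, and an exponent of 1.3 lands within rounding of the 49.12 and 50.05 figures. Treat it as a way to play with the numbers, not as the model anyone in the thread actually used.

```python
def decoder_cost(width: int, clusters: int = 1, exponent: float = 1.3) -> float:
    """Relative cost of `clusters` decoders that are each `width` wide.

    Hypothetical power-law model: the exponent is an assumption,
    not a figure taken from the thread or from Intel.
    """
    return clusters * width ** exponent


# Quadratic case (exponent 2): three 3-wide clusters vs. two 4-wide clusters.
print(decoder_cost(3, clusters=3, exponent=2))   # 3 * 3^2
print(decoder_cost(4, clusters=2, exponent=2))   # 2 * 4^2

# Milder scaling (exponent 1.3): one 20-wide decoder vs. twelve 3-wide clusters.
print(round(decoder_cost(20), 2))
print(round(decoder_cost(3, clusters=12), 2))
```

Under this assumed model, the wide monolithic decoder and the pile of small clusters come out at nearly the same cost once the exponent is mild, which is the point being argued about overhead.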
It probably is, but the problem happens on the traditional decoder setup too.
I strongly suspect that at those kinds of numbers there would be more overhead involved that significantly increases the area to less than favorable levels.
I think maybe this is something that may be unique or more beneficial to Clustered Decode architectures, because they need to figure things out further ahead.
Skymont’s address generation setup is peculiar because there are four AGUs for store address generation even though the data cache only has enough write bandwidth to service two stores per cycle. Again this feels unbalanced, but having more store AGUs lets Skymont figure out where stores are going sooner. That helps the core figure out whether a load needs to get data from cache or a prior store. Of course Skymont will try to predict whether loads are independent of prior in-flight stores, but figuring out the true result faster helps correct an incorrect prediction before too much work gets wasted.
So you could say the E-core architectures have a mindset of breaking units down to be as simple and dedicated as possible, and then having many, many of them running in parallel.
Intel's actual approach is way more clever: they run the branch predictor ahead of the decoders by at least 3 branches (probably more). The branch predictor can spit out a new prediction every cycle, and it just plops them on a queue.
Each of the three decoders pops a branch prediction off the queue and starts decoding there. At any time, all three decoders will each be decoding a different basic block, one that the branch predictor has predicted the program counter is about to flow through. The three decoders are leapfrogging each other. The decoding of each basic block is limited to a throughput of three instructions per cycle, but Skymont is decoding three basic blocks in parallel.
The decoded uops get pushed onto three independent queues, and the re-namer/dispatcher merges these three queues back together in original program order before dispatching to the backend. Each decoder can only push three uops per cycle onto its queue, but the re-namer/dispatcher can pull them off a single queue at the rate of 9 uops per cycle. The other two queues will continue to fill up while one queue is being drained.
The branch prediction result will always land on an instruction boundary, so this design allows the three decoders to combine their efforts and maintain a throughput of 9 uops per cycle, as long as the code is branchy enough. It works on loops too. As far as I'm aware, Intel doesn't even have a loop stream buffer on this design; the three decoders will be decoding the exact same instructions in parallel for loop bodies.
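For anyone who wants to poke at the leapfrogging idea described above, here is a toy cycle-level model in Python. It is only a sketch under loud assumptions: one instruction equals one uop, the decode queues are unbounded, the renamer drains only the oldest unfinished block in a given cycle, and uops become visible to the renamer one cycle after they are decoded. The function name `simulate` and its parameters are hypothetical, and none of this reflects Intel's actual pipeline timing.

```python
from collections import deque


def simulate(block_sizes, decoders=3, decode_width=3, rename_width=9):
    """Count cycles to decode and rename a stream of predicted basic blocks.

    block_sizes: instructions per basic block, in program order (all > 0).
    Toy model only; every timing detail here is an assumption.
    """
    pending = deque(range(len(block_sizes)))  # predicted blocks awaiting a decoder
    decoded = [0] * len(block_sizes)          # uops decoded so far, per block
    current = [None] * decoders               # block each decoder is working on
    oldest = 0                                # oldest block not fully renamed
    renamed_in_oldest = 0
    cycles = 0

    while oldest < len(block_sizes):
        cycles += 1

        # Rename stage: pull up to rename_width uops from the oldest block,
        # using only what was decoded in earlier cycles (keeps program order).
        take = min(rename_width, decoded[oldest] - renamed_in_oldest)
        renamed_in_oldest += take
        if renamed_in_oldest == block_sizes[oldest]:
            oldest += 1
            renamed_in_oldest = 0

        # Decode stage: idle decoders pop the next predicted block, then each
        # busy decoder adds up to decode_width uops to its own block.
        for d in range(decoders):
            if current[d] is None and pending:
                current[d] = pending.popleft()
            blk = current[d]
            if blk is not None:
                decoded[blk] = min(block_sizes[blk], decoded[blk] + decode_width)
                if decoded[blk] == block_sizes[blk]:
                    current[d] = None

    return cycles


print(simulate([90]))      # one long block: a single decoder does all the work -> 31
print(simulate([9] * 10))  # same 90 instructions split into ten blocks -> 13
```

In this toy model the same 90 instructions finish in far fewer cycles once they are split into blocks that all three decoders can work on, which is the "branchy enough" condition in the quoted explanation.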
But Intel have a neat trick to make this work even on code without branches or loops. The branch predictor actually inserts fake branches into the middle of long basic blocks. The branch predictor isn't actually checking an address to see if it has a branch. Instead it predicts the gap between branches, and they simply have a limit for the size of those gaps. It looks like that limit is 64 bytes for Skymont (it was previously 32 bytes for Crestmont).
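The fake-branch trick can be pictured as capping the size of each chunk handed to a decoder. The Python sketch below splits a long straight-line run of instructions into chunks of at most `limit` bytes, cutting only on instruction boundaries. The function name `split_at_boundaries` and the list-of-lengths input are assumptions made for illustration; the real predictor works on predicted gaps, not on a list of decoded instruction lengths.

```python
def split_at_boundaries(instr_lengths, limit=64):
    """Group instruction lengths (in bytes) into chunks of at most `limit` bytes.

    A cut is only made between instructions, mimicking a fake branch that
    lands on an instruction boundary. Illustrative only.
    """
    chunks = []
    current = []
    used = 0
    for n in instr_lengths:
        # Start a new chunk if adding this instruction would exceed the cap.
        if current and used + n > limit:
            chunks.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        chunks.append(current)
    return chunks


# Ten 15-byte instructions with no real branches between them.
run = [15] * 10
print([sum(c) for c in split_at_boundaries(run, limit=64)])  # Skymont-style cap
print([sum(c) for c in split_at_boundaries(run, limit=32)])  # Crestmont-style cap
```

With the larger cap each decoder gets fewer, longer chunks; with the smaller cap the same run is cut into more pieces.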
Wow, that really seems to blow away the Raspberry Pi 5. If they can make any for the price they claim.
Radxa launched the X4 SBC, a RPI sized SBC with the Intel N100 at RPI5 prices.
It has a 2.5gbe nic and x4 3.0 M.2 slot.
This might be a paper launch because it is out of stock everywhere.
How much do those mini PCs cost?
Oh huh, never heard of this Amston Lake before.
Radxa launched the X4 SBC, a RPI sized SBC with the Intel N100 at RPI5 prices.
It has a 2.5gbe nic and x4 3.0 M.2 slot.
This might be a paper launch because it is out of stock everywhere.
No, this is Atom 2024.
Basically ADL-N
They would need to tape out an entirely new compute chiplet for that.
Seriously, Intel should release 4P+24E SKU for Arrow Lake NOW!