AMD CEO talks of long-term turnaround

Page 19

buletaja

Member
Jul 1, 2013
80
0
66
BTW, to Jeff and others: this is the updated paper from John Sell.
Please don't use the Hot Chips one,
as it doesn't even tell the true story of the X1.

This is the preprint version
of John Sell's paper, with the 2nd texture cache going to the CUs,
which means there are 4 blocks.
Also remember the texture cache only goes to the CUs.

The 768 operations people talk about for 1.2-1.3 TF
are only for the producer part;
MS hasn't even talked about the vector part.

So yes, from this POV, MS said the GPU is of course 1.2 TF,
but that is basically the scalar or producer part of the GPU,
not the vector part, which is basically an extension like they did on the X360.


This is the preprint version from IEEE;
you can get the full PDF if you pay,
but I have it and will upload it later.
In the meantime, the preprint has more technical detail on the X1,
including 16 VAs (GPUMMU);
GCN currently only has 1 VA (including the PS4).

www.computer.org/csdl/mags/mi/preprint/06756701.pdf

The actual X1, per the updated John Sell PDF, looks like this:


Compared to Hot Chips, you can see they added more detail.


Now, if you want to know the real specs of the X1,
this is from the XDK:
===========================


BTW, the TCP is the part that goes to the 2nd texture cache;
each TCP corresponds to 1 CU per the XDK (but this part is clocked at half speed).

Without WDDM 2.0, this part is basically off limits.
Sure, DX12 doesn't magically give the X1 more hardware;
it is already there.
DX12, W10, or WDDM 2.0 is needed to correctly access the X1 hardware.
(For me, it is also because MS doesn't want to describe it; what
they have done, just like HoloLens, is a bit forward-thinking,
so they don't want to explain why, bit by bit, they can do:
- streaming
- improving performance
- BC
etc.)

*) From the XDK, you notice the CB/DB is one block.
*) Also look at the VGT count vs the PS4 = only 2.
*) Also SQ = 32 vs PS4 = 18; SQ is normally per CU, but on the X1 it seems detached and put into a scalar block (a CPU-like core).
*) The real question is why the XDK says there are 2 types of L2. You see there are 4 GDS; yep, the PS4 only has 1 GPU block = 1 GDS.
 
Last edited:

pTmdfx

Member
Feb 23, 2014
85
0
0
BTW, to Jeff and others: this is the updated paper from John Sell.
Please don't use the Hot Chips one,
as it doesn't even tell the true story of the X1.
*snip*
Well, I do wonder if you have over-read the counter struct. The outer dimension of the 2D array seems more likely to be different counters of a particular type, while the inner dimension is the counter data of a particular block. This is more reasonable, and all the data points match what is publicly known about GCN and the XB1's GPU.
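For what it's worth, the two readings can be made concrete with a tiny sketch (all names and sizes here are invented for illustration; nothing below is from the XDK):

```python
# Hypothetical layout of a 2D performance-counter array, illustrating the
# two readings discussed above. Names and sizes are invented, not from the XDK.

NUM_COUNTER_TYPES = 4   # outer dimension under the second reading
NUM_BLOCKS = 12         # inner dimension: one entry per block instance

# Reading A: counters[block_copy][slot]   -> implies 4 duplicated hardware blocks.
# Reading B: counters[counter_type][block] -> implies 1 set of blocks with
#            4 counter types each, which matches public GCN information.
counters = [[0] * NUM_BLOCKS for _ in range(NUM_COUNTER_TYPES)]

def sample(counter_type: int, block: int) -> int:
    """Under reading B: select a counter type, then the block it came from."""
    return counters[counter_type][block]

# Under reading B, a 4 x 12 array means 4 counter types, not 4 extra GPU blocks.
```

Same bytes in memory either way; only the interpretation of the outer index differs.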
 
Last edited:

jeff_rigby

Member
Nov 22, 2009
67
0
61
Everything in purple and green is on the ARM bus, then you go on to say it "is ARM and Cadence IP with most of it Xtensa configurable CPU stream accelerators running at 350 MHz with its own RAM using uDMA move controllers to move data to and from the 1 Xtensa controller and 32 Cell-like SPUs that make up a Xtensa DSP or IVP". So "with its own RAM": are you still referring to the 8 GB DDR3?
No, stream processors are like hardware codecs; they have their own SMALL memory. See the Xtensa IVP below.



I have a huge issue understanding why exactly you would need 8 GB of RAM for this ARM bus in the first place, especially when you starve the GPU as a result. If you absolutely had to have DDR3 for the "ARM block" and there just wasn't any way engineers could make use of GDDR5 (for some strange reason they lack competence), why not just use a portion of the DDR3 RAM? Why all 8 GB, which surely will starve your GPU? Then you bring up power consumption as a reason, which I guess is you throwing many things at the wall until something sticks, but here again: why can entire PC GPUs with GDDR5 downclock to sip merely a few watts, yet this can't be achieved by the engineers designing the Xbox One? We are supposed to just accept these things?
The PS4 and XB1 were designed to be connected-home living room STBs that support games, media, VOIP (Skype/ooVoo), IoT and more. Multiple power modes are needed, some of them regulated. Always-on, always-connected network standby, DVR and IoT modes are 500 mW plus some optional power. GDDR5 draws too much power for those modes. Sony moved the ARM block out of their APU to the Southbridge ARM SoC using 256 MB DDR3 so they could use GDDR5 with the AMD APU.

Also, are you suggesting the DSP processors will be used as dedicated HW for VISC on the Xbox one? Hmmmm
No, they will be used to support Media APIs so a third Xbox PPC will not need to be emulated.

You haven't proven anything to me. If M$ had to use DDR3 for the ARM blocks' DSP, dedicating the entire 8 GB of system RAM just seems dumb to me.
Why didn't Microsoft do the same as the PS4? Why does the XB1 ARM block have access to 8 GB DDR3? The ARM block in the XB1 offloads tasks that would normally be done by a CPU or GPU, and the XB1 will operate more like a PC, with more apps loaded in memory at the same time. Sony has stated the PS4 will have fewer apps.

What you are saying is that the entire system was designed around VISC first, and the starving GPU just got lucky that ESRAM was also needed for VISC.
I just find that hard to believe.
No, I am saying that the CPU-to-ESRAM coherent-memory custom Jaguar block is from a 2016 Zen design which I think supports VISC. As far as I can tell there is no 2016 or 2017 AMD design using ESRAM for the GPU; they move to HBM for that and use ESRAM only for the Zen CPU.

Are you sure that you didn't just read the statement "more CPUs clocked slower" and think, "hey, the APU in the consoles is clocked lower, so it's got to be VISC," and since then you have been grasping for anything you can force-fit? The Jaguar cores are clocked low and there are a few of them, but these are x86 cores running at a low speed. As for the ESRAM, the fact that DDR3 is slow is reason enough to have it. You even admit that, but still in your mind it is part of a plan to one day unlock VISC capabilities for whatever reason. VISC needs super fast cache, so that's why the ESRAM is there? That is a stretch to me. You do realize that the Xbox One ESRAM is just slightly faster than the PS4's 8 GB of GDDR5.
Something like that. I read about VISC and thought: that's how they are going to provide single-thread performance when they move to the "more smaller, slower CPUs are better than a few larger, faster CPUs" approach, AND as a side benefit it can provide CPU ISA emulation. Understanding what VISC needs requires reading the VISC papers.

Lastly, as others have stated, the Xbox 360 CPU is not all that powerful. The AMD CPU might be clocked low, but the 8 cores make it more powerful than the XB360's. There is no reason why I would find it strange that clever programmers and developers found a way to run some of the XB360 programs on a CPU that is more powerful. You absolutely don't need to dream up VISC as the only way it might be possible. Or at least, I don't.
A PPC at 3.2 GHz can, on some select code, run that code twice as fast as a Jaguar CPU, because the Jaguar is running at 1.6 GHz. This is the logic behind those who said BC was not coming to the XB1. Is this suddenly not true, or is there something VISC-like that overcomes it?
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
A PPC at 3.2 GHz can, on some select code, run that code twice as fast as a Jaguar CPU, because the Jaguar is running at 1.6 GHz. This is the logic behind those who said BC was not coming to the XB1. Is this suddenly not true, or is there something VISC-like that overcomes it?

Depends on what the throughput and latency of each instruction is. An instruction which takes 2 cycles on a 1.6 GHz CPU will be the same speed as one which takes 4 cycles on a 3.2 GHz CPU. (But I don't know either Jaguar or Xenon well enough to comment.)
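The arithmetic behind that point, as a one-line sketch (the function name is mine):

```python
# Wall-clock latency of an instruction is cycles / clock frequency,
# so cycle counts only compare fairly at equal clocks.

def instruction_time_ns(cycles: int, clock_ghz: float) -> float:
    """Latency in nanoseconds: at 1 GHz, one cycle takes exactly 1 ns."""
    return cycles / clock_ghz

low_clock = instruction_time_ns(2, 1.6)    # 2-cycle op on a 1.6 GHz CPU
high_clock = instruction_time_ns(4, 3.2)   # 4-cycle op on a 3.2 GHz CPU

assert abs(low_clock - high_clock) < 1e-12  # both 1.25 ns: same real speed
```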
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Depends on what the throughput and latency of each instruction is. An instruction which takes 2 cycles on a 1.6 GHz CPU will be the same speed as one which takes 4 cycles on a 3.2 GHz CPU. (But I don't know either Jaguar or Xenon well enough to comment.)
With some latency data we might do the comparison. Well-scheduled PPC code might achieve good IPC at times. But then there is the second thread, which, while increasing throughput, will regularly block execution resources for the first thread, lowering the individual throughput of both.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Depends on what the throughput and latency of each instruction is. An instruction which takes 2 cycles on a 1.6GHz CPU will be the same speed as one which takes 4 cycles on a 3.2GHz CPU. (But I don't know either Jaguar or Xenos well enough to comment.)

The nominal latency and throughput of each instruction in isolation only tells you part of the performance story. The Xenon/Cell PPE has numerous glass jaws compared to Jaguar: a really weak branch predictor, a really expensive misprediction penalty, a really expensive fetch penalty even for correctly predicted taken branches, bad L1/L2 cache hit latency and really bad main-memory latency, a huge load-hit-store penalty and other similar stalls that cause a round trip through the pipeline, no automatic hardware prefetch whatsoever, 128B cache lines, relatively low L2 cache bandwidth with a write-through L1 dcache...

Then the basics: yes, it has higher latencies (2 vs 1 cycle for simple ALU ops, 5 vs AFAIK 4 cycles load latency), can't co-issue ALU ops, can't co-issue a load with a store, and doesn't have any OoOE to hide latencies. All of this is going to lead to much lower typical perf/MHz.
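As a toy illustration of the "basics" paragraph, here is a back-of-the-envelope latency-chain model using only the numbers quoted above (PPE: 2-cycle ALU / 5-cycle load; Jaguar: 1-cycle ALU / 4-cycle load) at the consoles' clocks. It deliberately ignores everything else: the glass jaws, co-issue, and the OoOE that lets Jaguar overlap work in real code.

```python
# Toy model: total latency of a serially dependent op chain, using only the
# per-op latencies quoted in the post. Branches, cache misses, and OoOE
# overlap are deliberately ignored.

def chain_cycles(ops, alu_latency, load_latency):
    """Cycles for a chain where every op depends on the previous one."""
    return sum(alu_latency if op == "alu" else load_latency for op in ops)

ops = ["load", "alu", "alu", "load", "alu"]  # an arbitrary dependent chain

ppe_cycles = chain_cycles(ops, alu_latency=2, load_latency=5)     # 16 cycles
jaguar_cycles = chain_cycles(ops, alu_latency=1, load_latency=4)  # 11 cycles

# Convert to time at each chip's clock: 3.2 GHz PPE vs 1.6 GHz Jaguar.
ppe_ns = ppe_cycles / 3.2        # 5.0 ns
jaguar_ns = jaguar_cycles / 1.6  # ~6.9 ns
```

On this idealized chain the PPE's 2x clock still wins, which is exactly the hand-tuned best case discussed later in the thread; the stalls listed above are what erase that advantage on typical code.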
 
Last edited:

ocre

Golden Member
Dec 26, 2008
1,594
7
81
No, stream processors are like hardware codecs; they have their own SMALL memory. See the Xtensa IVP below.

The PS4 and XB1 were designed to be connected-home living room STBs that support games, media, VOIP (Skype/ooVoo), IoT and more. Multiple power modes are needed, some of them regulated. Always-on, always-connected network standby, DVR and IoT modes are 500 mW plus some optional power. GDDR5 draws too much power for those modes. Sony moved the ARM block out of their APU to the Southbridge ARM SoC using 256 MB DDR3 so they could use GDDR5 with the AMD APU.

No, they will be used to support Media APIs so a third Xbox PPC will not need to be emulated.

Why didn't Microsoft do the same as the PS4? Why does the XB1 ARM block have access to 8 GB DDR3? The ARM block in the XB1 offloads tasks that would normally be done by a CPU or GPU, and the XB1 will operate more like a PC, with more apps loaded in memory at the same time. Sony has stated the PS4 will have fewer apps.

No, I am saying that the CPU-to-ESRAM coherent-memory custom Jaguar block is from a 2016 Zen design which I think supports VISC. As far as I can tell there is no 2016 or 2017 AMD design using ESRAM for the GPU; they move to HBM for that and use ESRAM only for the Zen CPU.

Something like that. I read about VISC and thought: that's how they are going to provide single-thread performance when they move to the "more smaller, slower CPUs are better than a few larger, faster CPUs" approach, AND as a side benefit it can provide CPU ISA emulation. Understanding what VISC needs requires reading the VISC papers.

A PPC at 3.2 GHz can, on some select code, run that code twice as fast as a Jaguar CPU, because the Jaguar is running at 1.6 GHz. This is the logic behind those who said BC was not coming to the XB1. Is this suddenly not true, or is there something VISC-like that overcomes it?

I am not buying that Xbox developers just had to have a massive 8 GB worth of DDR3 because of these ARM blocks. There is simply no need for that much RAM for DSP and such. M$ claims that they went DDR3 because of its power-saving capabilities; I wouldn't expect them to say they went as cheap as possible. AMD has had ZeroCore technology for ultra-low-power states in long idle periods for GPUs with GDDR5, and Nvidia has entire GPUs that short-idle as low as 6 watts. It may still be true that DDR3 can have lower consumption that is useful in low-power states, but there is no way the difference can be that large. Maybe you haven't seen the Xbox One vs PS4 power consumption comparisons, but the Xbox One uses 2 times what the PS4 does in standby. The Xbox One has no advantage when they are both in their off state. http://www.extremetech.com/gaming/1...power-consumption-inefficiencies-still-abound

So, not only is the PS4 GPU loaded with plenty of speedy GDDR5 RAM, they also have a system that is much more efficient in standby. Why did M$ build a system with 8 GB of slow DDR3? It being cheaper still seems a candidate to me.
I guess it is true that the PS4 uses more power in pretty much everything else it does, from watching videos and gaming to navigating menus. But it is not 2x as much power, like the Xbone over the PS4 at idle. We also can't forget that the PS4 has a 50% bigger GPU, so it is not too surprising it would use a few more watts.

You bring up the Zen CPU having ESRAM, and therefore you believe it is a VISC design. Are you the one who insists ESRAM = VISC? Because I am not sure it means VISC at all; I haven't even seen this anywhere else. ESRAM is RAM; it's not the same as on-chip cache.
Whether Zen has ESRAM I don't know, but it doesn't mean it will be a VISC architecture.

VISC is its own CPU architecture; the APUs in the consoles are x86 architectures. VISC CPUs are supposed to be able to run ARM code or x86 code, but the last I heard there was a performance hit when emulating these instruction sets with its virtual cores. Something like 10%.
The idea of VISC is great, but this is a real architecture whose entire purpose is to be able to act as virtual cores. It was built for this purpose.

An x86 CPU is specific HW designed to execute x86-specific code. Last I checked, the Xbox One runs x86 code. It also already has an alleged ARM CPU in it. Where is the need to have VISC emulating virtual x86 cores? Why have a real ARM CPU? Just use virtual ARM cores with the Xbox One's VISC capability. Or better yet, virtual x86 cores... for whatever reason.

See, the beauty of a console is that it is static HW: the exact same CPU, HW, and arrangement across every system. Spreading work across all cores is not the issue it is on other platforms. On fixed hardware this is perfectly plausible. The fixed environment is perfect for allowing programmers to spread work across the cores with great efficiency. You could saturate loads across every core, divide up the work, and know exactly how it will come together.

The x86 Jaguar CPU cores can be utilized to the max because they are in a fixed environment. You just aren't gonna get any more performance than its max theoretical performance. And with a console, it is possible to spread the load out as much as you need to, until you have every core completely saturated.

VISC makes no sense to me here. You wouldn't make virtual cores to emulate and run ARM stuff when just reprogramming whatever ARM code you want to run for your fixed x86 Jaguar CPU would be more efficient and capable. I have no idea why you think VISC would be that useful on the Xbone anyway. It's not gonna make the x86 CPU any more capable than it already is. Its theoretical performance is all that it is capable of.

So, I even checked out the backwards compatibility you keep bringing up. Have you seen the list of games that are currently allowed for testing? They have them up on the Xbone website. Have you seen them? They are all very simple and basic games; most could be run on a cell phone. These are very basic and simplistic titles.
It's not shocking at all to see them ported.

You seem to think that VISC is something it is not. Programmers can spread loads across multiple cores without VISC; we have been doing it for years. The fixed HW on the Xbone will allow programmers to take advantage of the Jaguar cores in ways that can only be dreamed of on PC. They can and will be able to spread the load; it doesn't take VISC at all.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
The nominal latency and throughput of each instruction in isolation only tells you part of the performance story. The Xenon/Cell PPE has numerous glass jaws compared to Jaguar: a really weak branch predictor, a really expensive misprediction penalty, a really expensive fetch penalty even for correctly predicted taken branches, bad L1/L2 cache hit latency and really bad main-memory latency, a huge load-hit-store penalty and other similar stalls that cause a round trip through the pipeline, no automatic hardware prefetch whatsoever, 128B cache lines, relatively low L2 cache bandwidth with a write-through L1 dcache...

Then the basics: yes, it has higher latencies (2 vs 1 cycle for simple ALU ops, 5 vs AFAIK 4 cycles load latency), can't co-issue ALU ops, can't co-issue a load with a store, and doesn't have any OoOE to hide latencies. All of this is going to lead to much lower typical perf/MHz.

Don't forget the Cell is all in-order. The XBox One/PS4 are out-of-order. But despite these shortcomings, I'm amazed at how good games like The Last of Us look.
 

buletaja

Member
Jul 1, 2013
80
0
66
Well, I do wonder if you have over-read the counter struct. The outer dimension of the 2D array seems more likely to be different counters of a particular type, while the inner dimension is the counter data of a particular block. This is more reasonable, and all the data points match what is known about GCN and XB1's GPU in the public.

You have to check the other page:
in the XDK they say 4 GDS, which is why there is a GDS-to-GDS transfer function,
so it fits.

Plus, in the XDK they say the extremely high-BW eSRAM has no CPU connection,
but the cache-like eSRAM has CPU access; you can bet where this thing goes.

It is funny that Charlie, the one who hated MS, said the audio block is more than just a bleep,
because it is a clue to the SPU structures.
Each smallest compute block in the X1 design has:
- branch
- scalar ALU (doing flops too)
- vector ALU
- MMU
(a mimic of the IBM SPE design).
Check the audio block and note why it is labeled as 64KB SRAM;
why not name it 64KB scratchpad, why does it have to be labeled as SRAM?

Audio block (scalar + vector + branch/control + MMU + 64KB eSRAM).
We don't know how many blocks this image represents, but from the gigaflops
we can estimate the ALU makeup as 8 ALUs at a <500 MHz clock.
===================================================







Plus, MS said in the Xbox engineer interview
that they index more.

Of course, they also showed in the XDK that they have 6 CU groups,
each of which can be disabled or run one shader type.
Sure, a shader like a PS or VS needs more than 1 CU;
that is why they call it a CU group, not just a CU.

Of course, 6 CU groups is not 6 CUs, is it?

Plus, it matches what John Sell said in the PDF.

Why would a 12-CU GPU need 16 virtual addresses?
That would be the dumbest thing to do,

when a GPU with an IOMMU, like the PS4, has only 1 VA.
 
Last edited:
Dec 30, 2004
12,554
2
76
man, this is pretty intense discussion

I used to eat this stuff up, but the software world has gotten a hold of me and I care less.
 
Last edited:

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Moody's lowers ratings of Advanced Micro Devices (CFR to Caa1); senior unsecured to Caa2; outlook negative

Ratings downgraded:

Corporate family rating to Caa1 from B3

Probability of default rating to Caa1-PD from B3-PD

$600 million senior unsecured notes due 2019 to Caa2 (LGD4) from Caa1 (LGD4)

$450 million senior unsecured notes due 2020 to Caa2 (LGD4) from Caa1 (LGD4)

$475 million senior unsecured notes due 2022 to Caa2 (LGD4) from Caa1 (LGD4)

$500 million senior unsecured notes due 2024 to Caa2 (LGD4) from Caa1 (LGD4)

Speculative grade liquidity rating to SGL-3 from SGL-2

The negative outlook considers the execution challenges facing AMD, the likelihood of ongoing losses, and, while currently adequate, prospects for a weakening liquidity profile, although there are no debt maturities until 2019.

https://www.moodys.com/research/Moo...9SYXRpbmcgTmV3c19BbGxfRW5n~20150728_PR_330766
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Pretty much RIP AMD and the console market. At least Microsoft is leaving that market and returning to the PC one.

The Nintendo console will end up a flop since AMD is near bankrupt and support will eventually lapse, and Sony must rethink whether to go full NVIDIA this time (going Intel x86 pretty much kills their purpose, unless they quit the console market) or go PowerPC again.
 

NTMBK

Lifer
Nov 14, 2011
10,269
5,134
136
The nominal latency and throughput of each instruction in isolation only tells you part of the performance story. The Xenon/Cell PPE has numerous glass jaws compared to Jaguar: a really weak branch predictor, a really expensive misprediction penalty, a really expensive fetch penalty even for correctly predicted taken branches, bad L1/L2 cache hit latency and really bad main-memory latency, a huge load-hit-store penalty and other similar stalls that cause a round trip through the pipeline, no automatic hardware prefetch whatsoever, 128B cache lines, relatively low L2 cache bandwidth with a write-through L1 dcache...

Then the basics: yes, it has higher latencies (2 vs 1 cycle for simple ALU ops, 5 vs AFAIK 4 cycles load latency), can't co-issue ALU ops, can't co-issue a load with a store, and doesn't have any OoOE to hide latencies. All of this is going to lead to much lower typical perf/MHz.

Oh, definitely agreed. I was considering best-case stuff, hand tuned code which will get maximum throughput from the 360 CPU (and hence be toughest for the Jaguar to emulate).
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Oh, definitely agreed. I was considering best-case stuff, hand tuned code which will get maximum throughput from the 360 CPU (and hence be toughest for the Jaguar to emulate).
So, the XB1 was designed from the beginning to fully emulate the Xbox 360 CPU? A very nice move from Microsoft, then.
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Stock is making a nice jump today finally, % wise at least. Are there new rumors of a takeover, or maybe windows 10 is giving them a boost?
Windows 10 hype. Once it ends, or W10 turns out to be somewhat of a flop (very unlikely to happen), the stock will collapse back to where it really belongs.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Don't forget the Cell is all in-order. The XBox One/PS4 are out-of-order. But despite these shortcomings, I'm amazed at how good games like The Last of Us look.

Did not forget, it's the last thing I mentioned. It's just that being in-order is usually the first and often only thing people say when they talk about this CPU, when it's only one of many weaknesses.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Did not forget, it's the last thing I mentioned. It's just that being in-order is usually the first and often only thing people say when they talk about this CPU, when it's only one of many weaknesses.
In my case this is because I never followed them and just remember what I read a while back. But there are nice articles on them, and even a book available on Google Books, telling the story behind the design of the chip.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
Oh, definitely agreed. I was considering best-case stuff, hand tuned code which will get maximum throughput from the 360 CPU (and hence be toughest for the Jaguar to emulate).

I think there will be a lot of issues running games that really push the 360. Depending on how deeply "hand-tuned" they are, the task becomes ever more daunting.
But ultimately the Xbone CPU has more crunching power to harness, and game engines are shared across multiple titles.

So, the XB1 was designed from the beginning to fully emulate the Xbox 360 CPU? A very nice move from Microsoft, then.

It appears that there could have been backwards compatibility thoughts in the design. This could be one of the reasons they went with ESRAM. *Could be.

I am willing to bet that there will never be full backwards compatibility though. I feel very sure of that. So "fully emulate"? I would hesitate. It's taking this long to get a handful of games that are currently "testing" if you sign up for it. To me, it is absolute proof that they are putting effort into making these games work individually on the Xbone. That means it's not easily "emulating" and running 360 code. It is taking effort on a per-game basis; if this were "fully" emulating, then you could just pop in any 360 game and play. That is not the case. After all this time, it's still not fully emulating.

When I look at the XB360, I sure don't think it's all that strange that M$ engineers used ESRAM in the One. It could have been there even if it wasn't a part of backwards compatibility. The system is hard to imagine without ESRAM; fully dependent on DDR3, it would be data-starved worse than it is. They could have found it useful for other reasons too. The engineers could have had many other reasons, stuff we aren't talking about.
The Xbox One has plenty of hidden stuff and a supposedly fully capable ARM CPU, and I can only imagine what it is being used for. Considering the spying-eye controversy, this ARM CPU could have been doing all sorts of stuff when the system was in standby. M$ did have some crazy ideas when they built the system.

See, having mysterious components gets people's imaginations flowing. Some people imagine VISC, but there is the very real fact that the HW is shrouded in secrecy. It is also true that the NSA and other agencies have used gaming consoles as spy and crime-busting tools. The Kinect feature was planned to be used in some interesting ways not related to gaming; we know this for a fact. So the Xbone HW has some weird things going on, for who knows why. The secrecy could be just to protect their IP, but it doesn't take long for imaginations to go wild.

We know Sony talked about backwards compatibility and has a streaming service. M$ might very well be putting effort into backwards compatibility because they feel it will give them a competitive advantage. This effort may be totally an afterthought. I certainly don't think it was at the front of all the design decisions, or that it played the leading role in the design. That seems very unlikely.

VISC also seems way out there to me. It makes so little sense. Games can be ported and emulated without VISC; we have done this for years. The APU in the Xbox One is more powerful and a static design. With enough time, and if the effort is worthwhile, it seems completely plausible to have 360 games run on the Xbone. Since M$ is struggling in sales and the 360 was so popular, it makes sense that they would be looking at such an outlet for a competitive advantage. The fact that there are only a limited number of simplistic, unimpressive games available for "testing" on the Xbone website is telling in itself. It's not full emulation, but they are working to bring 360 titles to the Xbone. It is just a matter of time, money, and effort.
 
Last edited:

jeff_rigby

Member
Nov 22, 2009
67
0
61
I think there will be a lot of issues running games that really push the 360. Depending on how deeply "hand-tuned" they are, the task becomes ever more daunting.
But ultimately the Xbone CPU has more crunching power to harness, and game engines are shared across multiple titles.

It appears that there could have been backwards compatibility thoughts in the design. This could be one of the reasons they went with ESRAM. *Could be.
I think the reason for the ESRAM is that they chose to have the ARM block inside the APU and had to use DDR3. The choice to add the special coherent Jaguar-to-ESRAM block is likely for BC.

VISC also seems way out there to me. It makes so little sense. Games can be ported and emulated without VISC; we have done this for years.
Yes, on large, higher-clocked desktop CPUs, not on small mobile ones; the premise is that many smaller, lower-clocked CPUs are more efficient than a few monster CPU designs.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
I think the reason for the ESRAM is that they chose to have the ARM block inside the APU and had to use DDR3. The choice to add the special coherent Jaguar-to-ESRAM block is likely for BC.

Yes, on large, higher-clocked desktop CPUs, not on small mobile ones; the premise is that many smaller, lower-clocked CPUs are more efficient than a few monster CPU designs.
Ok, I reenter this discussion.

In this interview the ESRAM decision is explained by costs and the requirement to be able to do GPGPU stuff.
http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

But back to one of my uarch arguments: if there were VISC to enable some "more efficient" emulation due to more ST performance, there would still be the need for enough architectural registers of all types (GPR, FP, SIMD on PPC), or the option to switch between OoO and SMT+in-order (which makes the PRF reusable, as in the MorphCore uarch), to be able to map the ISA requirements onto the existing hardware.

But if there are such changes, we'd likely see them on the die photos when comparing the XBox One Jaguar cores with other Jaguar cores.
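The architectural-register point can be put in numbers: PPC defines 32 integer GPRs while x86-64 exposes 16, so a static register allocation in an emulator has to spill the overflow to a memory-backed context block. A rough sketch (the names and the simple 1:1 policy are mine, not from any real emulator):

```python
# Rough sketch of guest-to-host register mapping pressure when emulating a
# 32-GPR PPC on a 16-GPR x86-64 core. Policy and names are illustrative only.

PPC_GPRS = [f"r{i}" for i in range(32)]      # guest architectural registers
HOST_GPRS = [f"host{i}" for i in range(16)]  # host registers available

def map_guest_registers(guest, host):
    """Map guest regs to host regs 1:1 until hosts run out, then spill."""
    mapping = dict(zip(guest, host))
    spilled = guest[len(host):]  # these live in memory: extra loads/stores
    return mapping, spilled

mapping, spilled = map_guest_registers(PPC_GPRS, HOST_GPRS)
# Half the guest register file cannot stay in host registers (16 spilled);
# every access to a spilled register becomes a memory operation.
```

A real emulator would also reserve some host registers for its own state, making the pressure worse; this is why extra mappable registers, or a reusable PRF as in MorphCore, would matter for efficient emulation.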
 

buletaja

Member
Jul 1, 2013
80
0
66
Ok, I reenter this discussion.

In this interview the ESRAM decision is explained by costs and the requirement to be able to do GPGPU stuff.
http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

But back to one of my uarch arguments: if there were VISC to enable some "more efficient" emulation due to more ST performance, there would still be the need for enough architectural registers of all types (GPR, FP, SIMD on PPC), or the option to switch between OoO and SMT+in-order (which makes the PRF reusable, as in the MorphCore uarch), to be able to map the ISA requirements onto the existing hardware.

But if there are such changes, we'd likely see them on the die photos when comparing the XBox One Jaguar cores with other Jaguar cores.

Because the eSRAM is a 3D IC.

The high-speed eSRAM is immutable,
targeted at the streaming DX12 concept:

the front end does the hard work,
then streams.

The addressable "eSRAM" is the slower one;
that is why the XDK describes 2 things.

The addressable eSRAM acts like a giant scratch RAM that programmers are forced to use;
in the background, the move engines prefetch the data into the 3D IC eSRAM.

Do you think MS got low yields, per the Oban rumor, because of the eSRAM alone?
That would be the dumbest thing.

MS got low yields because the eSRAM is a 3D IC on a new node.

This eSRAM slide shows that the extremely high-BW one
is immutable (no CPU access once residency starts);
D3D12 describes immutable as super high-speed access and low latency.



Also remember the X1, with its GPUMMU (currently the only system that has a GPUMMU),
also has 16 VAs.




Plus, a recent D3D12 video/slide surfaced
about a 4-core cluster + 12 GPU cores.
Core = engine = SC;
of course, only the forums say SC = CU,
when even AMD patents say SC = engine = APD = core,
and it holds > 4 CUs.

AMD shader engine = core, per Mike Mantor.



The newest D3D12 slide: 12 SC + 4 CPU cluster.
Of course, someone who doesn't like MS or the X1
will say 12 = 12 CU, LOL. Look at the 3 copy engines:
1 GPU normally has only 1 copy engine block.
===================================



And we know the PS4, from their PDF, is described as 2 SC,
with 9 CUs per SC: 2 SCs, or 2 GPU cores.
But from the host or Jaguar POV, the PS4 is 1 core,
as it is served only by an IOMMU, so there is only 1 virtual address space.
=========================================


Good Times ahead
 

buletaja

Member
Jul 1, 2013
80
0
66
From Hot Chips:
John Sell said the eSRAM can be accessed from the CPU;
from the programmer's POV,
the eSRAM is basically addressable.

This is the fast embedded SRAM, the one that has CPU access:
==================================================



The real immutable-in-operation one!!
eSRAM = enhanced SRAM with extremely high BW;
200 GB/s is not even high BW, remember.
Again, now from the XDK:
===========================


Remember, the X1 packs so much tech, just like HoloLens;
that is why to this date there is still an NDA.
 