Then you should also see that
they have been trying this kind of new paradigm since the X360,
but the PC has not caught up with the paradigm.
There was a good article back then from
http://arstechnica.com/features/2005/05/xbox360-1/
so MS needs to push the entire ecosystem into a streaming model,
which, funnily, is actually why the X360...
You should see Mike Mantor's paper.
It is no coincidence he was recently honored by AMD:
http://www.hpcwire.com/off-the-wire/amd-promotes-renowned-graphics-architect-michael-mantor-to-corporate-fellow/
http://ir.amd.com/phoenix.zhtml?c=74093&p=RssLanding&cat=news&id=2118863
he is a father of the X360...
X1 is using DX 11.x, not even the latest XDK
too bad games still do not use it ...
MS seems to have to please the IHVs too, plus some politics
The actual X1 performance, DX11.x versus PC DX11; this is from a single thread
==================================
And the max deferred contexts under the DX11 core is way beyond PC or...
you are reading it wrong, Jeff
always downplaying X1 news a bit
but adding more sauce to the PS4
when there is nil fact about Tensilica on the PS4, for example
they have not abandoned OBAN
OBAN is the eSRAM 3D IC
How can they support BC?
they have HPAPU-->
If you have 2 SoCs, the story told in the media will be
1st they said...
these are just several work resumes that hinted ....
the test chip is 3D, starting at 32nm; the final one is 22nm (fabbed at CPA)
the Jag SoC is 28nm TSMC (later on)
that ARM block is for the PSP
not related to an MCU or DSP, or Tensilica
not all of it is Tensilica based
some is FPGA etc
you see the eSRAM has a lighter color vs the surrounding area
because it is stacked
that is the reason Chipworks did not do a cross-sectional cut
when they did one for the PS4
same thing with TechInsights doing a cross...
From Hot Chips:
John Sell said the eSRAM can be accessed from the CPU
that is from the programmer's POV
the eSRAM is basically addressable
this is the fast embedded SRAM, the one that has CPU access
==================================================
the real immutable-in-operation one !!
esram = enhanced SRAM with...
because the eSRAM is a 3D IC
the high-speed eSRAM is immutable
targeted at the streaming DX12 concept
the front end does the hard work
then streams
the addressable "eSRAM" is the slower one
that is why the XDK describes 2 things
the addressable eSRAM acts like a giant scratch RAM, forcing the programmer to use...
you have to check the other page
in the XDK they say 4 GDS, that is why there is a GDS-to-GDS transfer function
so it fits
plus the XDK says the extremely high BW eSRAM has no CPU connection
but the cache-like eSRAM has CPU access, so you can bet where this thing goes
it is funny that Charlie is the one that hated...
BTW, to Jeff and others, this is the updated paper from John Sell
please don't use the Hot Chips one
as it does not even tell the true story of the X1
this is the preprint version
from John Sell, with a 2nd texture cache going to the CUs
which means there are 4 blocks
also remember the texture cache only goes to the CUs
the 768 operation...
as the days go by
people forget what PIM is
the 1st commercialized PIM is the eDRAM in the X360, in a 2D design
(from outside it is just 20-32GB/sec; from inside it is natively huge BW)
the X360 has 192 ALUs inside the embedded RAM = PIM
with big money and multiple billions in R&D
you can guess the next evolution of eDRAM...
Just for fun, this is Mike Ignatowski's patent from IBM, then Microsoft --> MULTI ISA (PPC/X86 etc)
then this is when the AMD CTO congratulated Mike on the success of AMD PIM (Fast Forward project)...
@Jeff of course PIM has to be clocked low; it is clocked the same as the eSRAM
you know why Charlie has long hinted at a low clock of 426MHz
I know Charlie doesn't like MS, and neither do most people at S|A (I don't know why)
but his hints are correct
first, PIM is designed to be multi-stacked
the CPU has to be under the SRAM or...
Oh BTW, an AMD Fast Forward PIM (processing in memory) architect has a patent at Microsoft for multi ISA
one of the PIM architects is Mike Ignatowski
PIM is also clocked low (like my speculated eSRAM speed)
PIM is also the same size as current HBM, which is also the same as the eSRAM
which is 35-40mm2
also...
I will post this
just to open your mind
this is from an Xbox SoC director's patent
before reading this I suggest you check the X1 XDK
then also check where the XDK says the move engine block sends vertices and commands to the gfx core; of course no one tries to decode those hints.
there are many big hints in the XDK...
Trinity is a mobile APU but not a low-performance one
with your track record you surely know about it
AMD listed HP-APU as datacenter oriented
Trinity is part of it
Llano is the low-performance one
then succeeded by Jaguar etc
also you should know that AMD PIM is designed using Trinity as its base
AMD...
you have to check the link yourself
http://vcew.org/CE-Vail-2014-Program.pdf
so Jeff can post a patent, but this is no proof?
and that actual keynote is a fake? LOL
something from the Trinity designer & John Sell is not proof?
check the link
"Xbox One Next gen Processor"
that is not fake !!!
we are educated...
MS holds a secret
but MS also has IHVs to think about.
AMD or MS would not want to call the X1 a next-gen game processor at a prestigious event in 2014 if the X1 were just Jaguar
the X1 has an HP-APU
AMD categorized HP-APU as only for datacenter/supercomputing usage
look at this clue
plus, how can MS enable BC from nowhere...
this is another one
Processing in Memory, which was also tested at low clock and low power
this may be more advanced but the idea is still the same
http://www.dongpingzhang.com/wordpress/wp-content/uploads/2013/06/MSPC6-Zhang.pdf
check page 2
reduce the clock (50%)
but
reduce the latency
cpu side...
And to make it all happen perfectly
AMD must do the same as NVIDIA and put the TLB on the CUs (L1), not the IOMMU
just as MSDN says about IOMMU vs GPUMMU
make the GPU act like a core; this GPU is a group of blocks
That is why Zen is supposed to be full HSA 1.0, not just HSA as a feature
this is a poster from some GPUMMU research...
As people who check AMD research will know,
plus Layla Mah's slides and others,
AMD has for years been moving to replace the CP function with a CPU-like core
you can check the paper here
http://research.cs.wisc.edu/multifacet/papers/isca14-channels.pdf
The CP is replaced by an aggregator, which is simpler...
it is supposed to be clocked at 300-600, low power, low clock
that is why it is called the Zen core
Zen will provide the scalar engine
Greenland will provide the vector engine
basically Greenland will have 2 times the CUs since it is clocked low
and AMD will try to replace many fixed functions with CUs
(make it easy to...
This is huge .....
BC/FC on X1 could be real
It means Xbox One has another Power-derived ISA somewhere
as we don't see it from Chipworks, it is probably under the eSRAM
No wonder AMD's lawsuit against its former employees
listed Power Architecture for the next-gen Xbox (at the time it was called
Xbox 720)...
So Tonga uses LZ7 compression like the one used on the Xbox One custom SoC
interesting
Plus Tonga uses a high-quality scaler, which is also present on ......
LZ7 compression is practically lossless, not lossy compression
Plus, interestingly, S|A referred to Tonga as PI
@seronx remember when you said wait the...
I mean, Tonga is 16 CUs with 128 ALUs per CU
that fits with its 256-bit memory controller
Yes, AMD could still describe it as 32 CUs with 64 ALUs each
but as compute gets more attention
it is better to have 16 CUs with 128 ALUs
but with better SIMD distribution
plus this time each 8-wide SIMD has its own LSM
just...
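A quick sanity check of the two groupings in the post above (only a sketch; the helper name `total_alus` is made up for illustration and is not from any AMD document). Both ways of describing the chip give the same ALU total, so the difference is only in how the ALUs are grouped:

```python
def total_alus(cus: int, alus_per_cu: int) -> int:
    """Total ALU count for a GPU described as `cus` compute units."""
    return cus * alus_per_cu


# 16 CUs x 128 ALUs, the grouping suggested in the post
wide = total_alus(16, 128)
# 32 CUs x 64 ALUs, the alternative description mentioned in the post
narrow = total_alus(32, 64)

print(wide, narrow, wide == narrow)
```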
you can guess why 256-bit and 32 CUs
because a leaked AMD China slide about Excavator
said they are chasing 256-bit FMA
in fact they describe 64 ALUs as 8 EX 256-bit FMA
http://diybbs.zol.com.cn/11/11_106489.html
so it means with 16 execution units per CU, like at MICRO-46
1 EX unit = 256-bit FMA
16 EX = 16...
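The lane arithmetic behind "64 ALU as 8 EX 256-bit FMA" can be sketched like this. It assumes 32-bit (single precision) lanes, which is my assumption and not something the slide is quoted as saying; the function names are hypothetical too:

```python
def fma_lanes(vector_bits: int, element_bits: int = 32) -> int:
    """Number of elements one FMA unit of `vector_bits` width handles at once."""
    return vector_bits // element_bits


def alus_per_cu(ex_units: int, vector_bits: int = 256, element_bits: int = 32) -> int:
    """ALU lanes per CU if every EX unit is one `vector_bits`-wide FMA."""
    return ex_units * fma_lanes(vector_bits, element_bits)


# 8 EX units of 256-bit FMA, assuming 32-bit lanes (assumption)
print(alus_per_cu(8))
# same assumption with 16 EX units per CU
print(alus_per_cu(16))
```

Under that assumption 8 EX units match the 64 ALU figure, and doubling the EX units per CU doubles the lanes per CU.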
My breakdown of the original PI GPU map (Wccftech or another site is the original source)
http://ascii.jp/elem/000/000/874/874779/img.html
Following the idea in the AMD HSA documents, the future is more focused on compute.
A modern GPU can use compute shaders to bypass the ROPs
but there is a problem
From GDC 2014, Avalanche Studios:
The solution is to bypass the ROPs by using only CS, as more and more games use CS only
The only problem is that current GPUs have only one render pipe
But they suggest it will change in the future...
GP compute, or just compute shaders, does not need the front end at all
all they need is the TMUs; other than that
it is just the scheduler to the CUs to the L2/memory controller
for GP compute = no need for RBE or front end, which reduces the die area by almost 50%
basically split the pipe into a dual pipe (XTX code name)
so 1536 ALU...
what I am thinking is simple:
as Mike Mantor said, the GPU is becoming more and more about compute
so in my idea the 1536-ALU Treasure Island is in fact the same area as 16 CU/1024 ALU
the idea is that for compute they don't need all the fixed function or geometry or whatever
so physically the die area of 1536 ALUs is...
Ready for DX12, an XTX
What about 16 execution units x (4 ALU 64-bit) (Carrizo FMAC 256),
compared to GCN 1.1, 1 GCN 2.0 CU = 2x the old CU
The spec at first looks like the usual 32 CUs, but the BW is ~ enough for 64 CUs
The CPU represents 2 units of F32
*) the image is from the MICRO-46 event; the PDF is available on the site...
1 F32 provides up to 32 threads
2 units of it are enough for 64 CUs
it could also be embedded into the CPU itself; anyway, that is the concept
what the GPU does best --> vector/throughput processing
1 unit means 1 cluster of cores,
Future GPUs will use, rather than 4 vec16,
what about 16 vec4? the catch...
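The numbers in this post can be checked with a small sketch (the names below are hypothetical, and treating one F32 thread as driving one CU is my reading of the post, not a documented fact):

```python
def simd_lanes(simd_count: int, simd_width: int) -> int:
    """Total lanes in a CU built from `simd_count` SIMDs of `simd_width` lanes."""
    return simd_count * simd_width


def f32_threads(units: int, threads_per_unit: int = 32) -> int:
    """Threads available from `units` F32 control processors."""
    return units * threads_per_unit


# 4 x vec16 and 16 x vec4 expose the same lane count; only the granularity differs
print(simd_lanes(4, 16), simd_lanes(16, 4))
# 2 F32 units at up to 32 threads each (assumed: one thread per CU)
print(f32_threads(2))
```

So the 16 vec4 layout keeps the same width per CU while giving the scheduler four times as many independently issued SIMDs.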
Mike Mantor's APU13 slide
The ACE transforms into a new RISC F32 (XTX design)
Future CS
Not sure why the second link was here - removed by request.
second link removed by request, hmmm, interesting
anyway, thank you for your PM
DX12 or PI or high-end Maxwell
needs the XTX design for full DX12
Next-gen GPUs will treat each CU as a thread, compared to the previous gen
Example in this image
=================
3 CPU cores control 3 CUs (XTX, one for gfx, the other for compute)
Pirate Islands will use compute cores
it will have RISC F32 or ARM, who knows
so in Pirate Islands
1536 ALUs could mean 512 ALUs from FMAC256 + 2x 512 ALUs (XTX)
Check this image
this is supposed to be for Carrizo; in fact the only next-gen console
resemblance with the philosophy is :D
6 ALU...
An evolution of this?
http://msdn.microsoft.com/en-us/library/windows/hardware/jj553428(v=vs.85).aspx
And Microsoft's own patent about QoS
http://www.google.com/patents/US20120159090
"Scalable multimedia computer system architecture with qos guarantees"
What about what Mike Mantor said, that scalar...
From the AMD PDFs:
http://developer.amd.com/wordpress/media/2012/12/AMD_Southern_Islands_Instruction_Set_Architecture.pdf
http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf
Someone posted this
hmmm.... :eek:
http://videocardz.com/46006/amd-radeon-r9-290x-specifications-confirmed-costs-600
R9 290x Important specs:
- 11 CUs
- 44 SIMDs
- 2816 ALU
hmm, this is strange and interesting at the same time
Then I saw that someone on another forum posted this:
Split into 4?
each SIMD now represents...
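Taking the three figures quoted from the linked spec list at face value (11 CUs, 44 SIMDs, 2816 ALUs), the ratios work out as in this sketch. The helper `per_unit` is hypothetical and only does exact integer division:

```python
def per_unit(total: int, units: int) -> int:
    """Exact share of `total` per unit; raises if it does not divide evenly."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} is not evenly divisible by {units}")
    return quotient


cus, simds, alus = 11, 44, 2816  # figures as quoted in the post

print("SIMDs per CU:", per_unit(simds, cus))
print("ALUs per SIMD:", per_unit(alus, simds))
print("ALUs per CU:", per_unit(alus, cus))
```

With those quoted numbers each "CU" splits into 4 SIMDs, which is consistent with the "split into 4" remark.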