Just an update... Take this with a grain of salt.
The thing was conceived in October 2016, with 22FDX coming soon and 12FDX in the works. So, where Zen is mostly a high-performance core, the thing will replace the low-power cores Bobcat/Jaguar/Puma(+)/Excavator(+).
So, if you read AMD's slides, Ryzen is Premium and anything else is Mainstream. Mainstream is thus going to consolidate the architectures of the previous-gen cores into one next-gen core.
Highly optimized ULP core. Next-gen Bulldozer skeleton with next-gen Jaguar design. It is optimized to operate anywhere from ultra-low power up to essentially high performance.
Very much speculation:
SMT4 NN Branch Predictor -> 2x 4-wide Fetch-Decode (2 cores per FET/DEC) -> 4x Dispatch(Trace/Op/L0i cache)
4x Retire/Rename(2x FP Retire/Rename) -> 4x Cores(2 ALU/2 AGU for every core) + 2x FPU(2x FMAC+1 MMX per dual-core)
2x L0d(per core) -> 1x L1d(shared between two cores) =(Module)> 2x L1d(per dual-core cluster) -> 1x L2(shared per module) =(CPU Complex)> 2x L2(per module) -> 1x L3(shared between CPU Complex).
Power and density are the priority, with speed coming for free. (Initially, Stoney Ridge clock rate + two more cores.)
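To make the layout above easier to sanity-check, here is a rough Python tally of what one module and one CPU complex would contain. This is only a sketch of my reading of the speculation: the `Module` class, its field names and `MODULES_PER_COMPLEX` are made up for illustration, and the grouping (4 cores per module, 2 modules per CPU complex) is an assumption, not anything AMD has published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical model of one speculated module (all names are my own)."""
    cores: int = 4              # "4x Cores"
    cores_per_cluster: int = 2  # two cores share a FET/DEC, an FPU and an L1d
    alus_per_core: int = 2      # "2 ALU/2 AGU for every core"
    agus_per_core: int = 2
    fmacs_per_fpu: int = 2      # "2x FMAC ... per dual-core"

    @property
    def clusters(self) -> int:
        # Number of dual-core clusters inside the module.
        return self.cores // self.cores_per_cluster

    def totals(self) -> dict:
        # Count each kind of unit in a single module.
        return {
            "fetch_decode": self.clusters,
            "fpu": self.clusters,
            "alu": self.cores * self.alus_per_core,
            "agu": self.cores * self.agus_per_core,
            "fmac": self.clusters * self.fmacs_per_fpu,
            "l0d": self.cores,     # one L0d per core
            "l1d": self.clusters,  # one L1d per dual-core cluster
            "l2": 1,               # one L2 per module
        }


# Assumption: "2x L2 (per module) -> 1x L3" means two modules per CPU complex.
MODULES_PER_COMPLEX = 2

module_totals = Module().totals()
complex_totals = {name: n * MODULES_PER_COMPLEX for name, n in module_totals.items()}

print("per module: ", module_totals)
print("per complex:", complex_totals)
```

Under those assumptions a CPU complex works out to 8 cores sharing one L3. If the real grouping is different, only the defaults in `Module` and `MODULES_PER_COMPLEX` need to change.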
My guesses for the cache sizes and latencies are:
L0d = 4 KB, 2-cycle. (5KB if 1 KB is Stack)
L0i = 1024 ops
L1d = 32 KB, 6-cycle.
L1i = 128 KB
L2 = 1 MB, 18-cycle.
L3 = 4 MB, sub-54 cycles.
(Caches can be either write-through or write-back, with an extra cycle for selection depending on access type, etc. There is also a special hybrid mode with 2 MB L2/2 MB L3: L3 gives 1 MB to L2, which makes it go from mostly-inclusive to mostly-exclusive.)
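For anyone who wants to check the arithmetic on the hybrid mode, here is a small Python sketch that adds up the guessed data-cache sizes per CPU complex. It assumes 8 cores, 4 dual-core clusters and 2 modules per complex, and it assumes "L3 gives 1 MB to L2" means each module's L2 borrows 1 MB, which is the only reading I can see where the numbers land on 2 MB L2/2 MB L3. The function name and constants are invented for this example, and L1i is left out because its sharing is not specified.

```python
# All sizes in KB; the values are the guesses from the list above.
MB = 1024
L0D_KB = 4
L1D_KB = 32
L2_KB = 1 * MB
L3_KB = 4 * MB

# Assumed topology of one CPU complex (my reading, not confirmed).
CORES = 8
CLUSTERS = 4
MODULES = 2


def complex_capacity_kb(hybrid: bool = False) -> dict:
    """Tally data-cache capacity for one CPU complex, in KB."""
    l2_per_module = L2_KB
    l3 = L3_KB
    if hybrid:
        # Assumption: L3 lends 1 MB to each module's L2.
        l2_per_module += 1 * MB
        l3 -= MODULES * 1 * MB
    return {
        "L0d_total": CORES * L0D_KB,
        "L1d_total": CLUSTERS * L1D_KB,
        "L2_per_module": l2_per_module,
        "L2_total": MODULES * l2_per_module,
        "L3": l3,
    }


normal = complex_capacity_kb()
hybrid = complex_capacity_kb(hybrid=True)

print("normal:", normal)
print("hybrid:", hybrid)
```

With these assumptions the combined L2 + L3 capacity stays at 6 MB in both modes; only the split between the levels moves.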
Also, the most defining rule is that it is a MINIMAL design change over Excavator-Zen. In large part, that is because the mainstream team @ AMD is extremely low budget currently.
FX-series, A-series, E-series, Opteron-series, etc. will be using this core for next-gen SKUs, with low-end (4C/4T) first and high-end (64C/64T) last. Everything is budget. (Zen-lite became CMT?!)