The point I'm struggling with is that from the instruction latencies/stages, Zen looks like a high frequency design... higher than BD. P4 was 20 stage and P4E a few stages more.
No it doesn't; it looks like any other middle-of-the-road x86 CPU. Maybe you need to go look at other CPUs. There is no information right now about how many stages Zen has, only the execution latencies of instructions, and execution is only a small fraction of the total pipeline. Also, the total number of stages is a maximum. The better you prefetch, predict and recover from mispredicts, the longer your front end can be, while you cut the time spent waiting on dependent instructions.
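To put rough numbers on the prediction point, here is a back-of-the-envelope sketch using the usual textbook stall model. Every figure in it (base CPI, branch fraction, mispredict rates, penalties) is made up for illustration and is not a measurement of Zen, BD or any other real core; `effective_cpi` is just a name I picked.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch-mispredict stalls are added.

    Simple textbook model: each mispredicted branch costs a fixed number of
    cycles to flush and refill the pipeline.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical shorter pipeline with a weaker predictor.
short_pipe = effective_cpi(base_cpi=0.5, branch_fraction=0.2,
                           mispredict_rate=0.05, penalty_cycles=12)

# Hypothetical longer pipeline with a better predictor.
long_pipe = effective_cpi(base_cpi=0.5, branch_fraction=0.2,
                          mispredict_rate=0.03, penalty_cycles=20)

print(f"short pipeline: {short_pipe:.3f} CPI")
print(f"long pipeline:  {long_pipe:.3f} CPI")
```

With these invented inputs the two come out at about the same CPI: dropping the mispredict rate from 5% to 3% pays for a penalty that grows from 12 to 20 cycles. That is all "the better you predict, the longer your front end can be" means in practice.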
But looking at execution latencies, it looks no different from any other 3+ GHz x86 design. Hell, given that it doesn't do FMA in a single unit like Haswell+, its FP execution latencies are significantly lower than those of recent CPUs.
Many of the published patents were worked on when the CON core was the target, which means they are just as likely to be implemented in a shorter pipeline as they are to be implemented as described.
Personally I think Zen is going to share a lot with the CON cores in terms of front end and load/store, with the execution/scheduling and the cache system seeing the big reworks. Given that, I expect a pipeline around CON core length. But that's rather irrelevant (unless it is extra short or extra long): wide, aggressive OoOE processors live and die on having the right data as close as possible to the execution units as soon as possible.