From the Chips and Cheese low-level benchmarks, Alchemist has high cache latencies and low effective VRAM bandwidth when its Xe cores are underutilized.
It's quite reminiscent of GCN, which could only show its strengths when async compute kept its CUs highly utilized.
They also noted that SIMD8 suits an iGPU-focused design but carries a lot of overhead in larger ones like a dGPU. Battlemage moves to SIMD16, which should improve not only compatibility but overall performance. Someone has claimed Druid will go to SIMD32.
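A rough way to picture the overhead argument, as a simplified model only (not how Intel's scheduler actually works): if one instruction issue covers `simd_width` lanes, a narrower SIMD needs more issues, and therefore more front-end and scheduling work, to cover the same number of work items. The function name and the 256-lane workload are hypothetical, and the model ignores branch divergence, which tends to favour narrower SIMD.

```python
import math


def issue_slots(work_items: int, simd_width: int) -> int:
    """Vector instruction issues needed to cover `work_items` lanes once.

    Simplified model: no divergence, every issue fills all lanes it can.
    """
    return math.ceil(work_items / simd_width)


# Hypothetical workload: 256 work items each executing one instruction.
for width in (8, 16, 32):
    print(f"SIMD{width}: {issue_slots(256, width)} issues")
```

Under this model SIMD8 needs twice as many issues as SIMD16 for the same work, which is the kind of per-instruction overhead that matters more as the GPU gets bigger.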
Plus there's missing hardware support for features such as Execute Indirect and Fast Clear, which have been in AMD/Nvidia designs for years but only arrive with Battlemage. Fast Clear, for example, cuts the memory traffic needed to clear render targets, an efficiency gain that simply slapping on a wider memory interface cannot deliver.
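To illustrate the general idea behind a fast clear (a generic sketch, not Intel's actual implementation): instead of writing the clear color into every pixel, the GPU stores the clear color once and flips a small per-tile metadata flag. The tile size, the 1 byte of metadata per tile, and all names below are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

TILE_BYTES = 64 * 64 * 4  # hypothetical 64x64 tile of 4-byte pixels


@dataclass
class RenderTarget:
    num_tiles: int
    clear_color: tuple = (0, 0, 0, 0)
    bytes_written: int = 0
    # Per-tile metadata: True means "this tile holds the clear color".
    tile_is_clear: list = field(init=False)

    def __post_init__(self):
        self.tile_is_clear = [False] * self.num_tiles

    def full_clear(self):
        # Conventional clear: every pixel of every tile is written to memory.
        # This model only counts the bytes; pixel data itself is not stored.
        self.tile_is_clear = [False] * self.num_tiles
        self.bytes_written += self.num_tiles * TILE_BYTES

    def fast_clear(self, color):
        # Fast clear: record the clear color once and mark every tile via
        # metadata, assuming 1 byte of metadata traffic per tile.
        self.clear_color = color
        self.tile_is_clear = [True] * self.num_tiles
        self.bytes_written += self.num_tiles


rt = RenderTarget(num_tiles=100)
rt.full_clear()
fast = RenderTarget(num_tiles=100)
fast.fast_clear((255, 0, 0, 255))
print(rt.bytes_written, fast.bytes_written)
```

In this toy model the full clear writes 1,638,400 bytes while the fast clear writes 100, which is why the feature saves bandwidth rather than needing more of it.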
Based on what the leakers are saying, though, the advancements in Battlemage almost pale in comparison to what's coming in Celestial. So third time's the charm?