The main reason the cores share an FPU is die area. Bulldozer's design lineage began in 1998.
First design, low-power, had homogeneous clusters (2 Int+FP) - 1998
Second design, low-power, had heterogeneous clusters (1 Int + 1 FP) - 1999-2002 [effectively became Bobcat]
Third design, high-performance, had homogeneous integer clusters with a single FPU (2 Int + 1 FP) - 2002-2004
Fourth design, low-power, similar to the above but built for compute density and efficiency, SMT-like behavior - 2005-2007 (Cluster-based Multithreading, Multi-threaded Compute Core)
Fifth design, high-performance, adds resources at various stages so that the combinable clusters can operate as separate, uncombinable cores, CMP-like behavior - 2007-2010 (Chip-level Multithreading, Dual-core Compute Module) [this one is closely related to Andy Glew's K10, not Charles R. Moore's Bulldozer]
The first Fusion product was Swift, which had two Stars Gen3 cores. It was followed by the quad-core Llano, which actually shipped.
AMD's design teams are not smart enough to conspire to build a weak FPU in order to sell Fusion/APUs/HSA or anything like it. Bulldozer and Fusion were not conceived together.