A hypothetical task that occupies all 16 threads in a chiplet and fills the entire 32MB L3 with shared data would benefit enormously, even more than the prior example.
The real answer is: it depends.
There are a lot of moving parts when it comes to shared L3 caches, for example:
Let's assume AMD sticks with a victim (eviction) L3. Two cache domains of 16MB each might have advantages over a unified 32MB:
1) Cumulative bandwidth: moving to an L3 shared by all cores might reduce the total bandwidth available to them, both "directly" by having fewer ports and indirectly by moving from a crossbar to whatever interconnect they use now.
2) The chance of "way/address" conflicts increases. Even though each L3 slice is "larger", way and address conflicts can now come from more cores at once; with two domains that chance was cut in half.
3) While a victim cache somewhat mitigates it, cores can still fight over capacity: say, six cores working on some read-only structure that overflows L2 getting hurt by two cores that stream to memory. It is not as bad as on client Intel, where the L3 is inclusive of L2 so the L3 sees pretty much all traffic, but it takes replacement algorithms and partitioning policies to stop those two cores from thrashing performance.
These problems are "artificial", but if a multithreaded load is sensitive to L3 size, they will show up for some workloads.