As far as I know, ECC by itself isn't slower than non-ECC. Buffered IS slower than Unbuffered, since the buffer effectively acts like an extra CAS cycle, if memory serves me correctly (that is what I recall from the DDR1 days, not sure if it remains the same). The Buffered vs Unbuffered difference was something like 2 or 3% in a worst case scenario, which is pretty much nothing. Those benchmarks were done back when the A64 FX-53 was relaunched on Socket 939, since previously the FX-51 and FX-53 were on Socket 940, using the Opteron platform, and required Buffered DDR to POST.
Also, since Buffered puts less electrical load on the Memory Controller, you can run more modules/ranks while maintaining the same Frequency. With Unbuffered, you usually have to lower the speed if you fill everything (say, dropping from 1600 to 1333 MHz after X amount of modules/ranks). At that point, Buffered becomes faster because of the Frequency difference, no questions asked.
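To put rough numbers on that tradeoff, here is a quick back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names are made up, the CAS value of 9 is a placeholder rather than a measured timing, the buffer is modeled as exactly one extra cycle (as I recall it from the DDR1 days), and "1600" and "1333" are treated as data rates in MT/s on a 64-bit channel.

```python
def cas_latency_ns(data_rate_mts, cas_cycles, extra_cycles=0):
    """Approximate CAS latency in nanoseconds.

    DDR transfers twice per clock, so the clock (MHz) is half the data rate.
    extra_cycles models the register/buffer as added whole cycles (assumption).
    """
    clock_mhz = data_rate_mts / 2
    cycle_ns = 1000 / clock_mhz
    return (cas_cycles + extra_cycles) * cycle_ns


def peak_bandwidth_gbs(data_rate_mts, bus_bytes=8):
    """Theoretical peak bandwidth of one channel in GB/s (64-bit bus = 8 bytes)."""
    return data_rate_mts * bus_bytes / 1000


# Hypothetical fully populated board: Unbuffered had to drop to 1333,
# Buffered stays at 1600 but pays one extra cycle for the buffer.
CAS = 9  # placeholder timing, assumed identical for both kits

unbuffered = (peak_bandwidth_gbs(1333), cas_latency_ns(1333, CAS))
buffered = (peak_bandwidth_gbs(1600), cas_latency_ns(1600, CAS, extra_cycles=1))

print(f"Unbuffered @1333: {unbuffered[0]:.2f} GB/s, {unbuffered[1]:.2f} ns")
print(f"Buffered   @1600: {buffered[0]:.2f} GB/s, {buffered[1]:.2f} ns")
```

Under those assumptions the Buffered setup ends up with about 20% more peak bandwidth, and the extra cycle is more than paid for by the shorter cycle time. Real modules have different timings at different speeds, so treat this as an illustration of the reasoning and not as a benchmark.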
Buffered+ECC RAM can still overclock, just that very few Server platforms will allow you to do that (that is a Supermicro Sandy Bridge-E era Motherboard with overclocking capabilities, which Supermicro calls HyperSpeed on validated Servers). Based on what usually happens when enthusiasts try to overclock Server parts like S939 Opterons or Westmere based Xeons, chances are that they are higher quality bins than consumer gaming/enthusiast parts and have a lot of headroom.
DDR2 FBDIMMs DID require heatsinks, but they were extremely power hungry. Fully Buffered DIMM tech wasn't reused for DDR3/DDR4.