Well yeah, several tests have shown that 8x Gen3, which delivers nearly the same bandwidth as 16x Gen2, shows a minimal performance drop of <1% on current enthusiast GPU generations in SLI/Crossfire compared to 16x Gen3... so we might see this drop increase in future generations (e.g. enthusiast Maxwell).
The deal is that anything from Triple-SLI onwards needs a PLX chip on <=Z97. The cards are interconnected with SLI bridges to allow CPU-unbound transfers via PCIe (or via the PLX chip), and on top of that the PLX chip allows multicast transfers, up to quadrupling the input signal in terms of bandwidth.
So whether the CPU-PCIe uplink for filling up the VRAM of the SLI/Crossfire is limited to 14x, 12x or even 8x because of an M.2 Ultra, and whether that M.2 is using 4x PCIe Gen2 or Gen3... even at 8x that still represents nearly the bandwidth of a 16x Gen2 uplink (~7.9 GB/s) to the SLI, and in fact it is probably higher (12x or 14x depending on the Gen of the M.2 Ultra). The PLX multicasts that to an output bandwidth of roughly 31.5 GB/s overall to all cards (~7.9 GB/s each), while the second (and third and fourth) GPU of the SLI/Crossfire array can still send frames to the primary card, which is where you should connect the display.
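For anyone who wants to check those bandwidth numbers, here is a minimal sketch. It assumes the commonly published link rates (Gen2: 5 GT/s per lane with 8b/10b encoding, Gen3: 8 GT/s per lane with 128b/130b encoding) and ignores packet and protocol overhead, so real throughput will be somewhat lower. The function name is made up for illustration.

```python
# Rough per-direction PCIe bandwidth from the raw link rates.
# Assumption: protocol overhead (packet headers, flow control) is ignored.
RATES = {
    2: (5.0, 8 / 10),      # Gen2: 5 GT/s per lane, 8b/10b encoding
    3: (8.0, 128 / 130),   # Gen3: 8 GT/s per lane, 128b/130b encoding
}


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a PCIe link."""
    gt_per_s, efficiency = RATES[gen]
    # GT/s * encoding efficiency = usable Gbit/s per lane; /8 converts to GB/s.
    return gt_per_s * efficiency / 8 * lanes


if __name__ == "__main__":
    for gen, lanes in [(2, 16), (3, 8), (3, 16)]:
        print(f"Gen{gen} {lanes}x: {pcie_gb_per_s(gen, lanes):.2f} GB/s")
    # Four cards, each fed the same 8x Gen3 stream by a multicasting switch.
    print(f"4 cards total: {4 * pcie_gb_per_s(3, 8):.1f} GB/s")
```

So 8x Gen3 comes out just under 16x Gen2, and four multicast copies of it add up to about 31.5 GB/s.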
The CPU filling up the VRAM is a bottleneck in almost no game... because it happens during loading times. Usually the amount of dynamic caching is so low that even PCIe 4x Gen2 could satisfy it... you can easily prove that in single-GPU PCIe performance tests.
The thing consuming the most PCIe bandwidth in gaming (mining aside) during the game itself is transferring frames to the primary card's framebuffer... and those transfers can run CPU-unbound when the cards are connected to a PLX.
It is also true that these "framebuffer transfers" increase with resolution, bit depth and framerate, but it is equally true that, for a given framerate target (like 60, 100, 120, 144+), the transfer rate each card needs to deliver its prerendered frames to the primary card decreases with the number of GPUs. To give you an example... while in Dual-SLI the secondary card needs to send 72 fps to the primary card to reach 144 fps, this drops to 36 fps per card in Quad-SLI... this means the PCIe bandwidth each card needs for delivering frames to the framebuffer decreases for the same frame target when you add another GPU... so more GPUs = less bandwidth needed per card for the same goal!
That means if your SLI really does turn out to be short on PCIe bandwidth "in practice" to reach your target framerate... simply ADD another card!
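The per-card frame rates above are just a division. A tiny sketch, assuming the work is split evenly across the cards (alternate-frame rendering style), which is my simplification and not something every game or driver profile guarantees:

```python
def fps_per_card(target_fps: float, gpus: int) -> float:
    """Frames per second each card has to render and hand over,
    assuming an even split of the target framerate across all GPUs."""
    if gpus < 1:
        raise ValueError("need at least one GPU")
    return target_fps / gpus


if __name__ == "__main__":
    for gpus in (2, 3, 4):
        print(f"{gpus} GPUs @ 144 fps target: {fps_per_card(144, gpus):.0f} fps per card")
```

Each extra card lowers what every single non-primary card has to push over its own link, even though the primary card still receives the combined stream.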
So while the performance drop might be 1 or 2%, the same applies in comparison to X79/X99, even at Quad-SLI, which runs perfectly on the Extreme9 @ 8,8,8,8 with an 8x upstream to the PLX. Also, I think X79/X99 cannot go higher than 8,8,8,8 (32x) without multiple PLX chips, and then it also sends the same bits 4 times, once on each of the Quad-SLI's 8x links.
Keep in mind that an 8x/8x SLI without a PLX chip consumes 16 CPU lanes, basically sending exactly the same bits on each set of 8 lanes... while a PLX chip does the same transfer using only 8 CPU lanes, adding a lag of about 100 nanoseconds... and also keep in mind that there is therefore no advantage in having 32 lanes available without a PLX chip! So you would need Haswell-E WITH at least one PLX to achieve a 16x effective uplink for anything higher than Dual-SLI, and you would still be stuck on 16,8,8 in triple or 8,8,8,8 in quad at the end of the tunnel if you do not have another PLX chip to enable configurations like 16,16,16,16... (64 lanes).
I guess those boards are XXL and ultra-high-priced... for what? A 1% advantage in any current-generation SLI... maybe 3 or 4% in next gen??
I don't think it's worth it.
The math (although it's technically impossible to go 4K@144Hz) says that sending 36 frames per second (i.e. Quad-SLI @144Hz), each a 4K frame @ 32 bit, consumes a send bandwidth of about 1.3 GB/s per non-primary card and a receive bandwidth of 3.9 GB/s on the primary card... that means even PCIe 8x Gen2 would practically be enough, although 8x Gen3 would be better, since the primary card sometimes also needs to receive VRAM transfers of game data the CPU pushes in from RAM.
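Here is a rough sketch of that calculation. It assumes uncompressed frames and a 4096x2160 frame size, which is my guess at what gets you to ~1.3 GB/s; with 3840x2160 you land a bit lower, at about 1.19 GB/s. The function name and defaults are made up for illustration.

```python
def frame_stream_gb_per_s(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Bandwidth in GB/s needed to ship uncompressed frames at the given rate."""
    return width * height * bytes_per_pixel * fps / 1e9


if __name__ == "__main__":
    gpus = 4
    target_fps = 144
    per_card_fps = target_fps / gpus  # 36 fps from each card

    # Assumption: 4096x2160 at 32 bit (4 bytes per pixel), no compression.
    send = frame_stream_gb_per_s(4096, 2160, 4, per_card_fps)
    receive = send * (gpus - 1)  # primary card collects from the other three

    print(f"send per non-primary card: {send:.2f} GB/s")
    print(f"receive on primary card:   {receive:.2f} GB/s")
```

Either way the primary card's receive side stays well under the ~7.9 GB/s of an 8x Gen3 link.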
This brings us to the point that a 16x Gen3 uplink is still total "overkill" for the current GPU generation, because no matter whether it's Dual, Triple or Quad, a PLX delivering an 8x Gen3 uplink gives you 99% of the performance in any possible configuration up to enthusiast Quad. Sorry guys.
On the other hand, it is true that the overclocking capabilities of the i7-4790K give you a noticeably better gaming experience (e.g. in MMORPGs) because of its massively higher single-thread performance... go and try Rift! ;p
So in fact this board with a 4790K and an enthusiast Dual-SLI like 780 Ti (or an additional dedicated PhysX card), together with a Samsung XP941 or SM951, will run fine and still give you the opportunity to go Triple or Quad without a serious performance drop, so a Triple or Quad setup on this board can still be worth it in terms of performance.
True, it might eventually come in handy for multi-display solutions like 3D Surround AND 4K at the same time, still I doubt it... by the math it should work flawlessly with at least Triple-SLI @ 3x 60Hz 4K on an 8x Gen3 uplink without any significant performance drop... but it is seriously, definitely a perfect choice for any single-display PC dedicated to gaming, e.g. with a single WQHD G-Sync @144Hz or even 4K, since that's limited to 60Hz anyway.
So IMHO there is absolutely no reason to wait and pay a lot more money for a Haswell-E platform if it's only for a single-display gaming PC. Just my 50 cents.
I'm actually waiting for this board to be delivered to replace my §!&$% MSI Z97 Gaming 7, which does not even allow me to add a dedicated PhysX card to my Dual-SLI. The XP941 is also still to be delivered... Try to get that whole thing done on a significantly higher-priced MSI XPower AC... Good luck with the M.2 10Gbps...
Edit: Just to tell you the math... 3x 4K 32-bit @60Hz requires ~2.17 GB/s of send bandwidth for the second and third GPU and a receive bandwidth of 4.34 GB/s on the primary card. 8x Gen3 still delivers ~7.9 GB/s to each card and on the uplink... so there is at least ~3.56 GB/s left on the primary card to receive uplink traffic for CPU-inbound VRAM caching while 60 fps on all 4K displays can still be maintained... which is not much indeed, but enough to keep the average game vsynced at 60 while caching lots of objects on demand, most of the time without bottlenecking down to seriously unplayable minimums on a triple 780 Ti, for example. Sure, depending on the game you can't go all ultra on triple 4K even with quad 780 Ti... but that's a problem of the GPUs not being able to generate that many frames, not of the bandwidth of the lanes delivering those frames to the primary card.
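A quick sketch of that budget, so you can plug in your own numbers. Assumptions on my side: uncompressed 4096x2160 frames at 4 bytes per pixel, three cards each rendering a third of the frames for all three displays, and the raw 8x Gen3 rate without protocol overhead. With these inputs the results land within a few percent of the rounded figures above. All names are made up for illustration.

```python
# Raw 8x Gen3 bandwidth: 8 GT/s per lane, 128b/130b encoding, 8 lanes, in GB/s.
GEN3_X8_GB_PER_S = 8.0 * (128 / 130) / 8 * 8


def surround_budget(width=4096, height=2160, bytes_per_pixel=4,
                    displays=3, refresh_hz=60, gpus=3):
    """Return (send per non-primary card, receive on primary, headroom) in GB/s.

    Assumes uncompressed frames and an even split of frames across the GPUs.
    """
    surround_frame_bytes = width * height * bytes_per_pixel * displays
    fps_per_card = refresh_hz / gpus
    send = surround_frame_bytes * fps_per_card / 1e9
    receive = send * (gpus - 1)           # primary collects from all other cards
    headroom = GEN3_X8_GB_PER_S - receive  # what is left for uploads from the CPU
    return send, receive, headroom


if __name__ == "__main__":
    send, receive, headroom = surround_budget()
    print(f"send per non-primary card: {send:.2f} GB/s")
    print(f"receive on primary card:   {receive:.2f} GB/s")
    print(f"headroom on 8x Gen3:       {headroom:.2f} GB/s")
```

The headroom stays positive by a few GB/s, which is the whole point: the link is not what limits triple 4K on this kind of setup.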
In fact, just as current SSDs cannot even half utilize M.2 Ultra... current and next-gen Quad-SLI could not even utilize a 16x Gen3 PCIe multicast uplink (when you focus only on gaming)... that is made for display and card combinations not available even on paper until 2016.
Also, to drive all that next-gen-display-thingy via monitor cables, we would need the primary GPU to have at least 3x latest DisplayPort.
That's a lot of... not available... right now to pay so much money for 32 lanes, just quad-channel RAM and a few more cores, all while being clocked lower.