In many games it may be better to use software vertex processing if you have a mid-range or better dual- or quad-core CPU, simply because these processors can now do vertex/geometry processing faster than low-end, fully programmable execution units. Intel has even issued an application/support note about it:
What applications benefit more with software vertex processing?
Fully programmable execution units, being generalists, are less efficient than refined 2nd- or 3rd-gen fixed-function processing units, which are specialists. This isn't a big deal on discrete GPUs, because you can just scale up the number of execution units and make other architectural tweaks to offset the performance/efficiency penalty, and you also get the secondary benefit of load balancing. But this is prohibitive on an IGP because of the significant added cost, complexity, and transistor count.
E.g. ATI's R200 (Radeon 8500) had approx. 60 million transistors, roughly twice as many as an entire contemporary Northbridge.
Adding 10 million transistors to a GPU is no big deal, but it's a big deal in a Northbridge. Intel jumped the shark on its GMA X3000 design in a much-needed attempt to bring its badly lagging IGP performance and features in line with NVIDIA and ATI offerings.
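To put the transistor-budget point in rough numbers, here is a quick back-of-the-envelope sketch. It only uses the approximate figures from this post: about 60 million transistors for R200, about half that for a Northbridge, and 10 million extra. The names (`R200_GPU`, `NORTHBRIDGE`, `EXTRA`, `relative_growth`) are made up for illustration, and the results are ballpark, not measured data.

```python
# Approximate figures from the post, in millions of transistors.
R200_GPU = 60                # ATI R200 (Radeon 8500), approx.
NORTHBRIDGE = R200_GPU / 2   # "roughly twice as many" implies about 30
EXTRA = 10                   # the extra 10 million transistors discussed above


def relative_growth(base, extra):
    """Return the added transistors as a fraction of the base budget."""
    return extra / base


gpu_growth = relative_growth(R200_GPU, EXTRA)
nb_growth = relative_growth(NORTHBRIDGE, EXTRA)

# Print both as rounded percentages.
print(f"GPU: +{gpu_growth:.0%}, Northbridge: +{nb_growth:.0%}")
```

On these assumptions the same 10 million transistors is about a sixth of the GPU's budget but about a third of the Northbridge's, so the relative cost is doubled on the IGP side.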
Hopefully, GMA X3500 will be much improved and actually come close to the X3000's hype.