Pretty much no CPU is going to run at its maximum potential on its own: even if a CPU has X calculation units running at Y GHz, you still have to feed the CPU useful information so it can calculate useful data.
Thus the CPU has a series of input funnels: L1 cache, L2 cache, L3 cache, sometimes an L4 cache, then RAM, then the hard drive, and the goal is to get these various caches and RAM down to the lowest latency possible. Latency is the amount of time it takes to retrieve one instruction or one piece of data. Bandwidth is the amount of instructions or data that can be retrieved all at once. Memory size is the amount of instructions or data you can keep in that level of cache, RAM, or hard drive. As a rough ballpark on a modern desktop chip, L1 costs a few cycles, L2 somewhere around 12-15, L3 somewhere around 30-50, and a trip out to RAM can cost a couple hundred cycles.
Pretty much, the faster you can get data to the CPU, the faster your real-world performance will be, because the CPU can actually take advantage of all that horsepower it has. Thus the latency of your L2 cache, and every level below it, is a big deal.
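You can actually see these latency tiers yourself with a pointer-chasing loop: because each load depends on the result of the previous one, the loop runs at the speed of memory latency rather than bandwidth. Here is a minimal sketch in C; the WORKING_SET and HOPS constants are illustrative, and clock() is a crude timer, so treat the output as a ballpark:

```c
/* Minimal pointer-chasing sketch: each load depends on the previous one,
 * so the loop time is dominated by memory latency, not bandwidth.
 * Grow WORKING_SET past each cache size and the time per hop jumps. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WORKING_SET (1 << 20)   /* 1M entries * 8 bytes = 8 MB: past L2, near L3 */
#define HOPS        (1 << 24)   /* number of dependent loads to time */

int main(void) {
    size_t *next = malloc(WORKING_SET * sizeof *next);
    /* Build one random cycle through the array (Sattolo's algorithm)
     * so the hardware prefetcher cannot guess the next address. */
    for (size_t i = 0; i < WORKING_SET; i++) next[i] = i;
    for (size_t i = WORKING_SET - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;       /* j in [0, i) */
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    size_t p = 0;
    clock_t start = clock();
    for (size_t h = 0; h < HOPS; h++)
        p = next[p];                          /* serial dependency chain */
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("%.2f ns per load (p=%zu)\n", secs * 1e9 / HOPS, p);
    free(next);
    return 0;
}
```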
There is another big deal called prefetch, where the CPU makes intuitive leaps: if I am computing this math problem, I should probably retrieve the next predicted chunk of data and move it to the closest cache possible so it is faster to get to. Think of it like baking: while I am mixing the dry ingredients, I should gather up the liquid ingredients next, followed by the mixing equipment. I do not need to get out the baking pan yet, but let's preheat the oven now, because even though I am not at the pan stage yet, it takes X minutes for the oven to warm up. It is planning out the sequence of events to reduce latency as much as possible.
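Hardware prefetchers do this on their own, but you can express the same idea in code with GCC/Clang's __builtin_prefetch hint. A rough sketch; AHEAD is an illustrative tuning knob, and for a simple linear scan like this the hardware prefetcher usually makes the hint redundant anyway:

```c
/* Rough sketch of software prefetching with GCC/Clang's __builtin_prefetch.
 * While summing element i, we ask the CPU to start pulling element
 * i + AHEAD toward the closest cache so it has (hopefully) arrived
 * by the time we need it. */
#include <stddef.h>

#define AHEAD 16   /* how far ahead to prefetch; tune per CPU */

double sum_with_prefetch(const double *data, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + AHEAD < n)
            __builtin_prefetch(&data[i + AHEAD], /*rw=*/0, /*locality=*/3);
        sum += data[i];
    }
    return sum;
}
```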
-----
Larger L2 and L3 caches help up to a point, but after a certain amount you get diminishing returns: once a program's working set already fits, extra capacity does not help, and bigger caches tend to be a bit slower and eat more die area.
Higher memory bandwidth gives very little improvement in most CPU tasks, which tend to be latency-bound, but in GPU workloads higher bandwidth makes a huge difference, since a GPU is streaming data through thousands of threads at once.
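To contrast with the pointer chase above, here is a STREAM-style triad sketch: every load is independent, so the CPU can keep many requests in flight and the ceiling becomes bandwidth instead of latency. This is exactly the access pattern GPU workloads are built around. N is illustrative and the timing is crude:

```c
/* STREAM-style "triad" sketch: a[i] = b[i] + k * c[i].
 * Unlike the pointer chase, every load is independent, so many can be
 * in flight at once and the limit is memory bandwidth, not latency. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M doubles per array = 128 MB each, well past cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_t start = clock();
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* 3 arrays of N doubles cross the memory bus: 2 reads + 1 write */
    double gb = 3.0 * N * sizeof(double) / 1e9;
    printf("%.2f GB/s (a[0]=%f)\n", gb / secs, a[0]);
    free(a); free(b); free(c);
    return 0;
}
```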
------
Many of the CPU improvements of the last few years have been realized by figuring out ways to fetch instructions and data faster: smarter prefetch engines, lower-latency caches, etc.