Well, the idea of caching is to be able to reuse data. You keep certain values in the cache and use them multiple times, since you can access them very quickly. With that in mind, an onboard texture cache is really all you need. How big of a cache, though? Well, that depends. Just to get a rough idea: if our maximum texture resolution is 2048, we want to be able to cache at least 2048 texels. Doing a little math, at 4 bytes per texel we get 2048 * 4 bytes = 8192 bytes = 8 KB. Now add some overhead, etc., and you've got a few more KB. So somewhere between 8 and 16 KB would be good. I'd throw out a number of maybe 12 KB as probably being good.
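If it helps, here is the same back-of-the-envelope math as a small Python sketch. The function name and the default of 4 bytes per texel (i.e. 32-bit texels) are just my assumptions for illustration; plug in your own maximum texture size and texel format.

```python
# Rough texture cache sizing: enough room for max_texels texels.
# Assumption: 4 bytes per texel (32-bit color); change it for other formats.
BYTES_PER_TEXEL = 4


def min_cache_bytes(max_texels, bytes_per_texel=BYTES_PER_TEXEL):
    """Return the bytes needed to hold max_texels texels, before any overhead."""
    return max_texels * bytes_per_texel


base = min_cache_bytes(2048)
print(base, "bytes =", base // 1024, "KB")
```

That is only the baseline. Tags and other overhead come on top of it, which is why I'd land somewhere in the 8 to 16 KB range.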
Embedded memory certainly would be good. Stick all your textures on the chip and you basically have one huge cache (though not as fast as an actual cache). However, this really isn't practical, simply because of cost.
I hope that answers your question. If not, let me know.