A disappointment to report, in the hope of dissuading someone else from investing in expensive hardware (good thing LLMs weren't the only reason I bought the laptop).
So my ThinkPad now has 128GB of RAM and an RTX 5000 16GB dGPU. I was hoping I would be able to run Llama 3.3 70B. It loads at a context length of 16384, consuming 71GB of system RAM and all of the VRAM. Unfortunately, the computation is not offloaded to the GPU: despite lowering the CPU thread count to 1 and offloading all 80 layers to the GPU, it stays at 0% utilization. All the processing happens on the CPU, and even when set to the maximum of 6 cores (hyper-threading is not supported by LM Studio, I guess), CPU utilization doesn't go beyond 17%. It does give a response, at the horrible speed of roughly 0.05 tokens per second or even lower.

I gave up on it and am now downloading another 8B LLM at F16 and Q8, to take advantage of speculative decoding. If I still don't get any GPU utilization, I will need to troubleshoot (maybe a driver issue?).