I think Hyper-Threading is only available on the P4 "C" and later. It is not in the Pentium D, but it is in the dual-core Pentium Extreme Edition. Regardless, I doubt HT would make a dent in the decoding effort. It presents one physical core to Windows as two logical processors, so a second thread can run on execution resources the first one leaves idle. It doesn't make a single thread any faster; it mainly lets background processes get some time while the main thread is busy. Something like that.
The video card won't do a thing if it doesn't have H.264 decode acceleration. As long as it can push that much 2D, it'll be fine. Apparently the X8xx series doesn't have the H.264 acceleration.
You may want to check your I/O bandwidth. If I put the H.264 video on a ramdisk, it plays much more smoothly than from a hard drive, even when the file on disk is contiguous ("defragged"). I have a 90 nm Winchester 3500+ at its stock 2.2 GHz, and the Da Vinci 1080p Teaser (the more intensive one) plays just fine from RAM but stutters from the hard disk. You can find your bottleneck with the Windows performance counters (Performance, under Administrative Tools).
An Athlon 64 at 2.0 GHz should make mincemeat of a P4 A/B at 2.5 GHz, and with the A64's SIMD extensions (SSE, SSE2, maybe SSE3) in play it will be an even worse beating. Even my P4C 2.6 GHz HT was beaten badly (by something like 20 FPS) in games by the A64 at 2.2 GHz. OK, there may be some rare scenario where the P4 comes out ahead, but I'd say it's unlikely. I've used both a P4C 2.6 GHz HT and an A64 3500+, and the 3500+ is faster at *everything* by a mile, except sometimes when switching programs under heavy load, presumably because it lacks HT. My A64 at its stock 2.2 GHz is a lot faster on the Step Into Liquid video than the P4C/HT overclocked to 3.2 GHz.
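If you want a rough number for the I/O side before digging through the performance counters, something like this Python sketch would do. It is only my own illustration, not anything the player or decoder does: the function names are made up, and it compares the sequential read speed of the file against the *average* data rate the video needs (file size divided by running time). Peaks in the stream can be well above the average, and a second run will mostly measure the Windows file cache rather than the disk, so treat the result as a ballpark.

```python
import os
import sys
import time


def read_throughput_mb_s(path, chunk_size=1 << 20):
    """Read the whole file sequentially and return the throughput in MB/s.

    Hypothetical helper: a repeat run is likely served from the OS file
    cache, so only the first run after a reboot reflects the disk.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)


def required_mb_s(path, duration_seconds):
    """Average data rate (MB/s) needed to play the file in real time."""
    return os.path.getsize(path) / (1024 * 1024) / duration_seconds


if __name__ == "__main__":
    # Usage: python iocheck.py <video file> <running time in seconds>
    video, seconds = sys.argv[1], float(sys.argv[2])
    print("disk delivers: %.1f MB/s" % read_throughput_mb_s(video))
    print("video needs:   %.1f MB/s on average" % required_mb_s(video, seconds))
```

If the first figure is not comfortably above the second, the disk is a plausible suspect; if it is many times higher, the stutter is more likely on the decoding side.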
Edit: Here's at least something you can try. The tool linked below will make the file contiguous, in case that helps any. Also, if you come across a buffer setting in the decoder or demuxer, set it as high as you can. That should help as well. I haven't come across such a setting yet, though.
Just drag and drop your H.264 file onto this command-line program's EXE in Explorer or on the Windows desktop. When the Contig window closes, the file will have been defragmented. Then play it and see what happens. Maybe you have postprocessing on too; that will slow it down. Good luck. I still think I/O is the main issue here. It was for me with this particular Da Vinci 1080p teaser.
http://www.sysinternals.com/Utilities/Contig.html
Edit2: I found an input buffer option. I'll report back on how well the H.264 file plays with the Haali buffer set to 8192 vs. 65536.