We certainly need some clarification here (I just don't know if I'm good enough to do that...).
Let's start with some simplifications:
1.
SMT (HT is Intel's form of SMT) lets code that is written in parallel run more efficiently. In other words, the people who wrote the code have split the program so that it can use multiple threads at the same time. Quoting from the Intel article I linked:
Thread-level parallelism, the ability to simultaneously process multiple instruction streams or threads, can dramatically improve overall performance. Each of these threads can correspond to a different part of a program and runs on one of the multiple hardware contexts available through multi-core and multithreaded designs.
It also lets threads from separate programs run at the same time (multitasking). It is not as powerful as SMP (multi-core), because all the SMT threads share one core's execution units, caches and pipeline and so compete for the same resources, whereas SMP gives each thread a whole core of its own. However, by keeping several threads in flight, SMT lets the processor fill cycles that a single thread would leave idle (for example while that thread waits on memory), so the core stays as busy as it possibly can.
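To make the "keeps the processor busy" point concrete, here is a toy cycle-counting model. It is only a sketch under loud assumptions: a hypothetical core that issues one instruction per cycle, made-up per-instruction latencies (a latency of 4 stands in for a stall such as a cache miss), and simple round-robin between threads. Real HT cores issue several instructions per cycle and share far more state, so this illustrates the shape of the argument, not real performance.

```python
# Toy model (assumption): each thread is a list of instruction latencies.
# After a thread issues an instruction with latency L, it cannot issue its
# next instruction until L cycles later.

def run_sequential(threads):
    # One core, no SMT: each thread runs to completion before the next
    # starts, so every stall cycle is wasted.
    return sum(sum(t) for t in threads)


def run_smp(threads):
    # One core per thread: threads never compete, so the slowest decides.
    return max(sum(t) for t in threads)


def run_smt(threads):
    # One single-issue core shared by all threads. Each cycle it issues one
    # instruction from the next ready thread (round-robin), so a stall in
    # one thread can be filled with another thread's work.
    n = len(threads)
    pos = [0] * n      # index of the next instruction in each thread
    ready = [0] * n    # cycle at which each thread may issue again
    finish = 0         # cycle at which the last instruction completes
    last = -1          # thread that issued most recently
    cycle = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        for k in range(1, n + 1):
            i = (last + k) % n
            if pos[i] < len(threads[i]) and ready[i] <= cycle:
                ready[i] = cycle + threads[i][pos[i]]
                pos[i] += 1
                finish = max(finish, ready[i])
                last = i
                break  # single issue: at most one instruction per cycle
        cycle += 1
    return finish


threads = [[1, 4, 1, 4], [1, 4, 1, 4]]
print(run_sequential(threads), run_smt(threads), run_smp(threads))
```

With these two identical threads the model gives 20 cycles run back to back on one core, 13 with SMT and 10 with one core per thread: SMT recovers some of the stall cycles but does not match two full cores.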
The main drawback of SMT shows up when the threads the processor has started conflict with each other (for example, when they fight over the same execution units or keep evicting each other's data from the shared cache). When that happens the processor has to stall one thread, or throw away its work and restart it or switch to another one. This occurs when the code isn't properly optimised for HT, and it is the reason that there are circumstances where HT actually slows things down. That said, this is not a common occurrence.
Therefore, SMT, like SMP, works best when the code has been optimised for it. Again, to quote from the article:
Today we rely on the software developer to express parallelism in the application, or we depend on automatic tools (compilers) to extract this parallelism. These methods are only partially successful.
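As a minimal sketch of what "the software developer expressing parallelism" means, the snippet below splits one computation into independent pieces and hands each piece to a thread. The function names and the sum-of-squares workload are hypothetical examples. Note that CPython's global interpreter lock keeps pure-Python threads from running CPU-bound work simultaneously, so this shows the structure of parallel code rather than a speedup; the same split written in C or C++ is what lets SMT or SMP hardware run the pieces at once.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    # Work done by one thread. No chunk depends on any other chunk,
    # which is what makes it safe to run them in parallel.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=2):
    # The developer, not the hardware, decides how the work is split:
    # here the input is cut into `workers` interleaved slices.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each slice runs as its own task; the partial results are combined.
        return sum(pool.map(partial_sum, chunks))


print(parallel_sum_of_squares(list(range(1000)), workers=4))
```

The result matches the single-threaded `sum(x * x for x in range(1000))`; the only thing that changed is that the program now tells the system which parts can run side by side.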
2.
"Reverse HT" (otherwise known as Speculative Threading) lets the compiler "guess" at outcomes in order to create parallelism, and keep those guesses (speculative threads, or STs) in memory. The CPU then checks those STs, keeps the results whose guesses turned out to be right, and discards the rest. So rather than running multiple parallel threads that were designed into a program from the start, RHT takes a single thread and splits it into multiple speculative threads. While this would be somewhat useful on dual core, the more cores you add the better it gets: quad- and 8-core systems would benefit massively, even for single-threaded apps (like most games) that haven't been written for multithreading.
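The guess/check/discard cycle can be simulated in software. The sketch below is an assumption-laden illustration, not how any compiler or CPU actually implements speculative threading: a single sequential loop (counting characters outside `"..."` quotes) is cut into chunks, every chunk is started in parallel from a guessed starting state ("not inside a quote"), and the results are then checked in program order. A chunk whose guess was right is kept; a chunk whose guess was wrong is discarded and re-run with the real state. All function names and the predictor are hypothetical, and under CPython's global interpreter lock the threads give no real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunk(start_in_quote, chunk):
    # The original sequential loop, applied to one chunk: count characters
    # outside "..." quotes. Returns (state at end of chunk, count).
    in_quote = start_in_quote
    count = 0
    for ch in chunk:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            count += 1
    return in_quote, count


def speculative_count(text, n_chunks=4):
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    guess = False  # assumed predictor: every chunk starts outside a quote

    # Speculative phase: run every chunk at once from the guessed state.
    with ThreadPoolExecutor() as pool:
        speculative = list(pool.map(lambda c: run_chunk(guess, c), chunks))

    # Commit phase, in program order: keep correct guesses, redo wrong ones.
    state, total, squashed = False, 0, 0
    for chunk, (end_state, count) in zip(chunks, speculative):
        if state != guess:
            # Misprediction: discard the speculative result and re-execute.
            end_state, count = run_chunk(state, chunk)
            squashed += 1
        state = end_state
        total += count
    return total, squashed


print(speculative_count('ab"cd"ef', n_chunks=4))
```

Here `speculative_count('ab"cd"ef', n_chunks=4)` returns `(4, 1)`: the answer matches the plain sequential loop, and one of the four chunks had to be thrown away and redone because it actually started inside a quote. The better the guesses, the fewer chunks get squashed, which is why the payoff grows with the number of cores available to run chunks at once.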
3.
Most important to remember is the fact that it takes BOTH hardware and software to make this work. While AMD is (IMHO) the best hardware innovator today (or at least tied with Intel), their software division isn't even a pitiful shadow of Intel's by comparison.
So unless AMD has some very hidden plans (like a secret agreement with Pathscale or Sun to write a new compiler), this rumour could at best mean that AMD will be ready with the hardware once Intel has written the software (and not before).
I somewhat disagree with dmens about it not being possible for K8L, but I respect his experience (this is his profession) and will remain on the fence about the possibility. I still think it's possible with what we have seen of K8L, but I'm not betting any money on it.