I think the best way to approach RISC vs. CISC is to look at the historical trends. Back in the '60s and '70s, high-performance processors were multi-chip designs with slow main memory (e.g., core memory). Going to memory was an extremely expensive operation, so instruction bandwidth was at a premium. As Mark R said, this pushed designers to increase performance by making instructions "do" more, thereby requiring less instruction bandwidth. This was facilitated by microprogramming, an implementation technique in which, at a high level, an instruction's execution is dictated by a small internal ROM (the control store). A microprogrammed datapath is relatively easy to design, so designers would replace commonly occurring sequences of machine instructions (programming was largely done in assembly at the time) with a single, more complex instruction.
A few things occurred in the '70s and '80s that changed the playing field. First, compiler design progressed substantially. What people began to discover was that the complex instructions and addressing modes (the ways in which an operand's address is computed) were largely useless to compilers. If the semantics of an instruction, its addressing mode, or the way it used registers didn't exactly match the high-level construct in the language, the compiler couldn't use the instruction. See Mark R's explanation of the 80/20 rule for the results of this.
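To make that concrete, here's a small C sketch of my own (not from the original discussion): an auto-increment addressing mode, of the kind VAX-class machines offered, maps perfectly onto the first loop below, but the moment the source steps by anything other than one element the mode no longer applies and the compiler has to fall back on plain loads and address arithmetic anyway.

```c
/* Illustrative sketch only: a CISC auto-increment addressing mode can fold
 * the load and the pointer bump of the first loop into a single operand,
 * because the source matches the mode exactly. The second loop strides by
 * three elements, so the mode is useless and the compiler must emit
 * ordinary loads plus explicit address arithmetic anyway. */
long sum_all(const int *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += *p++;          /* matches an "(p)+"-style auto-increment mode */
    return s;
}

long sum_every_third(const int *p, int n) {
    long s = 0;
    for (int i = 0; i < n; i++, p += 3)
        s += *p;            /* stride of 3: auto-increment doesn't apply */
    return s;
}
```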
Second, a few innovations occurred on the chip front. The invention of DRAM in 1970 allowed fast, cheap, dense main memories, so instruction bandwidth was no longer as big an issue, and perhaps complex instructions were no longer necessary. The advent of VLSI (very-large-scale integration) allowed the design of high-performance single-chip microprocessors, and by the '80s it was possible to build a high-frequency, pipelined microprocessor with an integrated cache. A few properties of CISC instruction sets made this difficult: first, the complex instructions were hard to pipeline, and second, microprogrammed designs are slow (poor instruction fetch bandwidth, and difficult to pipeline).
Things start to come together now. To ease compiler design, RISC instructions are atomic (each does one thing), orthogonal, and regular (any instruction can use any combination of registers, for the most part). Complicated addressing modes are no longer needed. These properties also allow the construction of a simple 5-stage pipeline with a one-cycle execute stage.
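As a rough illustration (again mine, assuming a generic memory-operand CISC and a MIPS-like load/store RISC), the same C statement becomes one multi-step instruction on the former and three atomic instructions on the latter, and it's the three atomic ones that drop cleanly into a 5-stage pipeline:

```c
/* Rough illustration, not taken from the original post: how one C statement
 * maps onto a memory-operand CISC encoding versus a load/store RISC one. */
int counter;

void bump(void) {
    counter = counter + 1;
    /* Generic CISC (one instruction: read memory, add, write memory):
     *     add  [counter], 1
     *
     * MIPS-like RISC (pseudo-assembly; three atomic instructions, each
     * doing one thing and each fitting one trip through the pipeline):
     *     lw   $t0, counter
     *     addi $t0, $t0, 1
     *     sw   $t0, counter
     */
}
```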
Instead of microprogramming the datapath, innovations in VLSI design allowed the instruction set to be hardwired. This, along with an instruction cache, allows the processor to sustain execution of one instruction per cycle. But because transistor resources were still limited in the '80s, the number of instructions had to be reduced (which had little impact, thanks to the 80/20 rule). In addition, to make hardwired decoding easier, instructions have a single fixed length. x86 instructions can, IIRC, be anywhere from 1 to 15 bytes long, making it difficult to determine where one instruction ends and the next begins; RISC instructions are typically all 32 bits long.
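Here's a quick sketch of why fixed-length, regular formats make hardwired decoding cheap. The field layout below is MIPS-R-type-like and is only an assumption for illustration; the point is that with every instruction exactly 32 bits and every field always in the same place, decode is just a handful of shifts and masks, and the next instruction always starts 4 bytes later.

```c
#include <stdint.h>
#include <stdio.h>

/* Sketch of decoding a fixed 32-bit, MIPS-R-type-like instruction word.
 * Assumed layout for illustration: op[31:26] rs[25:21] rt[20:16]
 * rd[15:11] shamt[10:6] funct[5:0]. The fields never move, so decoding
 * is only shifts and masks. */
typedef struct {
    uint32_t op, rs, rt, rd, shamt, funct;
} decoded;

static decoded decode(uint32_t insn) {
    decoded d;
    d.op    = (insn >> 26) & 0x3f;
    d.rs    = (insn >> 21) & 0x1f;
    d.rt    = (insn >> 16) & 0x1f;
    d.rd    = (insn >> 11) & 0x1f;
    d.shamt = (insn >>  6) & 0x1f;
    d.funct =  insn        & 0x3f;
    return d;
}

int main(void) {
    uint32_t insn = 0x012a4020;   /* R-type word encoding "add $8, $9, $10" */
    decoded d = decode(insn);
    printf("op=%u rs=%u rt=%u rd=%u funct=%u\n", d.op, d.rs, d.rt, d.rd, d.funct);
    return 0;
}
```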
So the features of RISC over CISC are:
- Simple, atomic instructions
- Fixed-length, easy-to-decode instructions (the division of the fields of an instruction format into opcode, register, and function spaces is regular)
- Easy-to-compose and orthogonal instructions
- Fewer addressing modes
- Fewer instruction formats
- Larger number of registers
Personally, I don't think that a small number of instructions is a required feature of RISC. The fact that the current PowerPC instruction set has far more instructions than the Berkeley RISC or Stanford MIPS of the '80s doesn't make it any less RISCy; the sparse instruction count of the first RISCs was just an artifact of the limited transistor resources available at the time. RISC is really an agreement between architects and compiler writers that made instructions simpler and easier to use, and that also facilitated pipelined single-chip implementations.
It's no surprise that some people joke that RISC actually stands for "Really Invented by Seymour Cray." Cray developed many of these ideas in order to make extensive use of pipelining and high clock speeds when he designed the CDC 6600 in 1964 and the Cray-1 in 1976. John Cocke credited those designs when he designed the IBM 801 in 1977, regarded as the first computer to bring all the RISC ideas together.
Of course, any discussion of RISC vs. CISC would be incomplete without a look at the present. Advances in process technology, circuit design, microarchitecture, and even manufacturing have relegated architecture (instruction set) design to a second-order effect on performance, except for (arguably) floating-point and vectorizable code. Hence, through the power of backward compatibility, today's x86 is every bit as competitive as any RISC.