Future CPUs: x86 to EPIC possibilities?

Titan

Golden Member
Oct 15, 1999
Ok, I'm not a super hardware guy; I'm a software engineer by trade. But I did take computer architecture in school and liked it; I get pipelining and scoreboarding and basic ISA stuff. So please be gentle. I'm wondering if some of the chip people who frequent here care to stretch their imaginations with me.

It seems that x86 is here to stay, at least for another decade in mass numbers, and probably much longer than that. We all know that the newer Pentiums actually translate x86 instructions into RISC-like instructions, and the core of the processing is done in a much more RISC-like manner.

With the growing trend toward massive on-die integration (dual cores, memory controllers, etc.), I'm wondering if some Itanium-style architecture could be used on a chip that translates x86 into EPIC-type instructions. With all the space used for multiple cores, memory controllers, crypto units, and other goodies that future integration will bring, it doesn't seem too hard to throw in a buttload of extra registers for EPIC-style instructions. Bear in mind that my knowledge of EPIC is very limited; basically, "use a buttload of registers and accomplish more tasks in parallel" is all I know.

I'm thinking potentially we could have a bunch of shared registers between multiple cores; this might improve parallelism or enable some EPIC-like things. It might lead to a bigger hyperthreading technology, like ultra-threading or something.

I'm just thinking that a lot of the ideas used in EPIC seem potentially compatible with future chip designs, now that we have abandoned the MHz push and are looking for more powerful chips through integration and features. Anyone have ideas about the possibilities that are clearer than my lack of expertise allows? I'm just a dreamer here, and I don't wanna hear "that's impossible" replies. This is just high-level dreaming.
 

aka1nas

Diamond Member
Aug 30, 2001
Doesn't the Itanium already have hardware x86 emulation, which performs very poorly?
 

Titan

Golden Member
Oct 15, 1999
Originally posted by: aka1nas
Doesn't the Itanium already have hardware x86 emulation, which performs very poorly?

Yes, but that's not what I'm addressing. I suspect that the dismal x86 performance of the Itanic was a result of Intel wanting to migrate people to a new architecture; it wasn't designed to be a high-powered x86 CPU. I'm wondering if, down the road, we will see x86 CPUs that effectively leverage things like multicore dies with technology and approaches from the seven years of researching Itanium. It seems that there is potential there (from a high level). Anyone have any detailed thoughts on this?
 

SirPsycho

Senior member
Jul 12, 2001
Originally posted by: tkotitan2
Ok, I'm not a super hardware guy; I'm a software engineer by trade. But I did take computer architecture in school and liked it; I get pipelining and scoreboarding and basic ISA stuff. So please be gentle. I'm wondering if some of the chip people who frequent here care to stretch their imaginations with me.

I'm a software engineer by trade also, but I haven't gotten as far as computer architecture yet. However, I'm very interested in it, and I think for the most part, I do understand a lot of the core concepts. I may not use the correct terms for them, though.

It seems that x86 is here to stay, at least for another decade in mass numbers, and probably much longer than that. We all know that the newer Pentiums actually translate x86 instructions into RISC-like instructions, and the core of the processing is done in a much more RISC-like manner.

I think that some variation of x86 will be around for a very long time. If it ever is displaced by another architecture, you'll probably have Microsoft and Sun, with their virtual machine and bytecode systems, to thank for it.

With the growing trend toward massive on-die integration (dual cores, memory controllers, etc.), I'm wondering if some Itanium-style architecture could be used on a chip that translates x86 into EPIC-type instructions. With all the space used for multiple cores, memory controllers, crypto units, and other goodies that future integration will bring, it doesn't seem too hard to throw in a buttload of extra registers for EPIC-style instructions. Bear in mind that my knowledge of EPIC is very limited; basically, "use a buttload of registers and accomplish more tasks in parallel" is all I know.

A good explanation of EPIC that I've read is at ArsTechnica: A preview of Intel's IA-64. They have a ton of articles on CPU architecture and related subjects that I have found to be very interesting reading.
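
To give a flavor of the "accomplish more tasks in parallel" part: one of EPIC's signature tricks is predication, where the compiler converts branches into computations guarded by predicate registers. Here's a conceptual sketch in C, standing in for IA-64 assembly (which I don't know well enough to write); it's just to show the shape of the idea, not how the hardware actually does it:

/* Conceptual illustration of EPIC-style predication ("if-conversion").
   Instead of branching, both candidate values are computed and a
   predicate selects the result, so the CPU never has to guess which
   way a branch will go. */

#include <stdio.h>

int max_branchy(int a, int b) {
    if (a > b)          /* a branch the CPU must predict */
        return a;
    return b;
}

int max_predicated(int a, int b) {
    int p = (a > b);            /* predicate, 0 or 1 */
    return p * a + (1 - p) * b; /* both "paths" feed a select */
}

int main(void) {
    printf("%d %d\n", max_branchy(3, 7), max_predicated(3, 7));
    return 0;
}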

I'm thinking potentially we could have a bunch of shared registers between multiple cores; this might improve parallelism or enable some EPIC-like things. It might lead to a bigger hyperthreading technology, like ultra-threading or something.

Wouldn't shared registers and multiple cores be mutually exclusive? My understanding is that the whole point of a register is to have very fast local storage that doesn't incur bus trips and high latency. Maybe something like HyperTransport could help reduce the latency involved, but I would think a better idea would be to just have a lot of extra on-chip registers that could be renamed, and perhaps come up with some sort of synchronization method. I don't know how that would work without slowing things down quite a bit, though. I'd have to ponder that for a while and see what comes to mind.
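
To make the renaming idea a little more concrete, here's a toy model in C. The table sizes and the free-list handling are completely made up, and real renamers also have to reclaim registers and recover from branch mispredictions, which I'm glossing over entirely:

/* Toy register renamer. Each architectural register (the handful the
   ISA exposes) maps to one of a larger pool of physical registers, so
   independent writes to the "same" register don't serialize. */

#include <stdio.h>

#define ARCH_REGS 8    /* registers the ISA exposes */
#define PHYS_REGS 64   /* registers the core actually has */

static int rename_table[ARCH_REGS]; /* arch reg -> physical reg */
static int next_free = ARCH_REGS;   /* naive free "list" */

/* A write to an architectural register gets a fresh physical one. */
int rename_dest(int arch_reg) {
    rename_table[arch_reg] = next_free++ % PHYS_REGS;
    return rename_table[arch_reg];
}

/* A read uses whatever physical register currently backs it. */
int rename_src(int arch_reg) {
    return rename_table[arch_reg];
}

int main(void) {
    for (int i = 0; i < ARCH_REGS; i++) rename_table[i] = i;
    /* two back-to-back writes to "reg 0" land in different places */
    printf("write 1 -> p%d\n", rename_dest(0));
    printf("write 2 -> p%d\n", rename_dest(0));
    printf("read    -> p%d\n", rename_src(0));
    return 0;
}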

I'm just thinking that a lot of the ideas used in EPIC seem potentially compatible with future chip designs, now that we have abandoned the MHz push and are looking for more powerful chips through integration and features. Anyone have ideas about the possibilities that are clearer than my lack of expertise allows? I'm just a dreamer here, and I don't wanna hear "that's impossible" replies. This is just high-level dreaming.

A lot of the ideas that are used in EPIC are used in current CPU architectures as well. SIMD instructions, like SSEn in particular, are a good example of this. Throw in lots of little execution units, give the instruction set a way to tell the CPU to use them simultaneously, and run with it. Kinda like multiple cores, but not quite as complete. Part of the problem, though, is that these instructions rely on either the compiler detecting possibilities for parallelism and emitting the correct opcodes automatically, or a lot of hand-tuning and instruction reordering by the programmer. The latter has much more potential for efficiency, but really isn't reasonable for most things. The former has the benefit of being applied more comprehensively, but to do a comparable job would probably slow down compilation so much as to not be reasonable, and you have to be sure not to accidentally change the meaning of the code by reordering instructions. It's a tough job.
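
To show what the hand-tuned route looks like in practice, here's a small C example using Intel's SSE intrinsics (the intrinsics themselves are real; the loop assumptions are simplified, and I'm using unaligned loads to keep the sketch short):

/* Scalar vs. hand-tuned SIMD: adding two float arrays. The SSE version
   handles four floats per instruction. Compile with SSE support
   (e.g. gcc -msse). */

#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

void add_scalar(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

void add_sse(const float *a, const float *b, float *out, int n) {
    /* assumes n is a multiple of 4 */
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
}

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float out[8];
    add_sse(a, b, out, 8);
    printf("%g %g\n", out[0], out[7]);  /* 9 9 */
    return 0;
}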

One benefit to the bytecode approach is that run-time analysis can be done, and optimizations can be done on the fly that a static compiler might not be able to detect, since it can't tell how often a function is going to be called due to user interaction or user data. I think to really take advantage of CPU-level parallelism, programming languages and/or compilers are going to need better ways to let the programmer indicate when things can be done in parallel, and just as importantly, when things cannot be done in parallel. Again, I'm not sure exactly how this would work, but I think it will become more important as multiple cores proliferate.
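
Something along these lines already exists in a limited form: OpenMP lets the programmer annotate a loop as safe to parallelize, and the compiler/runtime splits it across processors. A minimal sketch (compile with something like gcc -fopenmp):

/* OpenMP: the programmer asserts the iterations are independent, and
   the compiler/runtime decides how to split the loop across CPUs. */

#include <stdio.h>

void scale(float *data, int n, float factor) {
    #pragma omp parallel for  /* "no cross-iteration dependencies here" */
    for (int i = 0; i < n; i++)
        data[i] *= factor;
}

int main(void) {
    float data[1000];
    for (int i = 0; i < 1000; i++) data[i] = (float)i;
    scale(data, 1000, 2.0f);
    printf("%g\n", data[999]);  /* 1998 */
    return 0;
}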

I'm currently toying around with the idea of having a coprocessor card with a handful of FPGAs on it that you could reprogram to fit whatever task you wanted to accelerate. Algorithms with a high level of parallelism, like encryption and compression, could really be sped up quite a bit by this. I saw an article on Slashdot a while back about some people who designed a coprocessor card that offloaded gzip compression from the CPU for use by Apache, and that's exactly the sort of thing I had in mind.

I've also found a company, Charmed Labs, that makes an FPGA programming kit that plugs into a GameBoy Advance's cartridge slot and allows you to control the GBA hardware and program the FPGA chip. I don't completely understand it, but it's quite inexpensive for what it is, considering that it's less than $300 for the top-end model, including what I paid for a GBA SP at Fry's. Seems like a good learning tool to me.

Charmed Labs XPort 2.0

Wow. That was a lot more than I had intended to write. That should count as more than one post. Heh.

I'll be interested in hearing what you think of my crazy ideas.
 

SirPsycho

Senior member
Jul 12, 2001
Originally posted by: tkotitan2
Originally posted by: aka1nas
Doesn't the Itanium already have hardware x86 emulation, which performs very poorly?

Yes, but that's not what I'm addressing. I suspect that the dismal x86 performance of the Itanic was a result of Intel wanting to migrate people to a new architecture; it wasn't designed to be a high-powered x86 CPU. I'm wondering if, down the road, we will see x86 CPUs that effectively leverage things like multicore dies with technology and approaches from the seven years of researching Itanium. It seems that there is potential there (from a high level). Anyone have any detailed thoughts on this?

Well, AMD is already talking about having multicore dies by the end of next year. Intel has been talking up multicore for a while with Itanium, and recently with their Pentium line as well. They've even apparently ditched their Tejas chip design to add multicore capabilities to their current design (Prescott/Nocona?) instead. IBM and Sun also have multicore CPUs either in the works or currently available.

I don't know if this is a trickle-down of knowledge gained by developing Itanium in Intel's case, but I'm sure it didn't hurt. Obviously, it isn't going to help AMD any, but they don't seem to need much help lately. Their egos have to be pretty inflated right now, having forced Intel's hand to release x86-64-compatible processors. Good for them. Competition is a wonderful thing.
 

CTho9305

Elite Member
Jul 26, 2000
Originally posted by: tkotitan2
With the growing trend toward massive on-die integration (dual cores, memory controllers, etc.), I'm wondering if some Itanium-style architecture could be used on a chip that translates x86 into EPIC-type instructions. With all the space used for multiple cores, memory controllers, crypto units, and other goodies that future integration will bring, it doesn't seem too hard to throw in a buttload of extra registers for EPIC-style instructions. Bear in mind that my knowledge of EPIC is very limited; basically, "use a buttload of registers and accomplish more tasks in parallel" is all I know.
That's basically the idea behind a trace cache - the first time you see a block of code, you run it through an x86 decoder to produce a VLIW-like "trace" which can be fed (almost) directly to the execution units on future runs. This saves you the ~5-cycle x86 decoding penalty the next time you see this block of code, and lets you calculate dependencies up front.
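
A rough sketch of the structure, purely illustrative (the sizes and encoding here are invented, not any real chip's):

/* Illustrative trace cache: keyed by the starting address of a block
   of x86 code, it stores the already-decoded micro-ops so later
   executions skip the decoder. */

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define TRACE_SLOTS 256
#define MAX_UOPS    16

typedef struct {
    uint32_t start_pc;        /* x86 address the trace begins at */
    int      valid;
    int      n_uops;
    uint32_t uops[MAX_UOPS];  /* pre-decoded, VLIW-like micro-ops */
} Trace;

static Trace cache[TRACE_SLOTS];

/* Hit: execution bypasses the multi-cycle x86 decode stage. */
Trace *trace_lookup(uint32_t pc) {
    Trace *t = &cache[pc % TRACE_SLOTS];
    return (t->valid && t->start_pc == pc) ? t : NULL;
}

/* Miss: decode once (not shown) and remember the result. */
void trace_fill(uint32_t pc, const uint32_t *uops, int n) {
    Trace *t = &cache[pc % TRACE_SLOTS];
    t->start_pc = pc;
    t->n_uops = n > MAX_UOPS ? MAX_UOPS : n;
    memcpy(t->uops, uops, t->n_uops * sizeof(uint32_t));
    t->valid = 1;
}

int main(void) {
    uint32_t decoded[3] = {0xA1, 0xB2, 0xC3};  /* pretend decoder output */
    if (!trace_lookup(0x8000))                 /* first time: miss */
        trace_fill(0x8000, decoded, 3);
    printf("hit: %d uops\n", trace_lookup(0x8000)->n_uops);
    return 0;
}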

I'm thinking potentially we could have a bunch of shared registers between multiple cores; this might improve parallelism or enable some EPIC-like things. It might lead to a bigger hyperthreading technology, like ultra-threading or something.
You can't really do that, because nowadays it takes multiple cycles to get a signal from one side of a chip to the other, and the latency to the L1 cache is something like 3 cycles, so reading a register on a neighboring core on the same die would be no faster than just going through the cache. There are other architectural issues that would make it very difficult too.

Anyone have ideas about the possibilities that are clearer than my lack of expertise allows? I'm just a dreamer here, and I don't wanna hear "that's impossible" replies. This is just high-level dreaming.
This is one of the better Highly Technical threads in a while.
 

imgod2u

Senior member
Sep 16, 2000
I would say emulation is the way to go. There's simply no need to execute x86 directly in hardware. Intel's target for IA-32 EL is around 70% of native performance. Considering just how fast Itaniums are right now (in FP mostly, but still) and will grow to be (multi-core, multithreaded Itaniums), even 70% of that should be great.

It most likely won't beat the latest and greatest x86 processor (or maybe it will, depending on how big the performance disparity between IA-64 and x86 processors gets), but it'll be good enough for legacy applications, and performance-critical applications could just be ported to IA-64. The number of performance-critical applications is very small compared to legacy, non-performance-critical ones (Office, IE, etc.), so porting them should be manageable.

Putting x86 support in hardware would simply draw more power, produce more heat, and introduce more complications in fabrication and design. Plus, if there's an update to the x86 side (a new extension, say), with emulation you could simply update your software to support it instead of having to replace the processor.

Is it possible to have two processors, an x86 core and an IA-64 core, on the same die and have them both perform really well? Maybe. Is it practical and/or necessary? Most likely not.
 

Matthias99

Diamond Member
Oct 7, 2003
Originally posted by: SirPsycho
I'm currently toying around with the idea of having a coprocessor card with a handful of FPGAs on it that you could reprogram to fit whatever task you wanted to accelerate. Algorithms with a high level of parallelism, like encryption and compression, could really be sped up quite a bit by this. I saw an article on Slashdot a while back about some people who designed a coprocessor card that offloaded gzip compression from the CPU for use by Apache, and that's exactly the sort of thing I had in mind.

My circuit design professor said that things like this had been tried in the mainframe/supercomputing world a while back, when FPGAs were the hot new thing. It seems it wasn't worth the tradeoffs -- if you have a lot of different tasks you need to do, reprogramming the FPGAs all the time gets to be a bottleneck, and you can only flash them so many times. If you only have a handful of tasks to do (or just one), it's generally more efficient to either get more CPU power, or, if the task is common enough, to build a 'hardwired' coprocessor chip (like they're starting to do for encryption/decryption).

It is an appealing idea, though. I don't know how current FPGA speeds stack up to CPUs, but I'm guessing that it's still not really competitive enough for complex tasks, unless they're highly parallelizable.
 

SirPsycho

Senior member
Jul 12, 2001
Originally posted by: Matthias99
Originally posted by: SirPsycho
I'm currently toying around with the idea of having a coprocessor card with a handful of FPGAs on it that you could reprogram to fit whatever task you wanted to accelerate. Algorithms with a high level of parallelism, like encryption and compression, could really be sped up quite a bit by this. I saw an article on Slashdot a while back about some people who designed a coprocessor card that offloaded gzip compression from the CPU for use by Apache, and that's exactly the sort of thing I had in mind.

My circuit design professor said that things like this had been tried in the mainframe/supercomputing world a while back, when FPGAs were the hot new thing. It seems it wasn't worth the tradeoffs -- if you have a lot of different tasks you need to do, reprogramming the FPGAs all the time gets to be a bottleneck, and you can only flash them so many times. If you only have a handful of tasks to do (or just one), it's generally more efficient to either get more CPU power, or, if the task is common enough, to build a 'hardwired' coprocessor chip (like they're starting to do for encryption/decryption).

It is an appealing idea, though. I don't know how current FPGA speeds stack up to CPUs, but I'm guessing that it's still not really competitive enough for complex tasks, unless they're highly parallelizable.

Well, I'm not talking about constantly reprogramming them. I'm just saying that if you were going to be doing a lot of work in Photoshop, for instance, you could reprogram the card with an instruction set that would accelerate certain filters that you use a lot. The same idea could be applied to a lot of other domains as well... database searching, encryption/decryption like you mentioned, compiling code... any number of possibilities. Modern FPGAs are really fast, and it's not uncommon to be able to achieve a 1000% speedup if the task is highly parallelizable (is that a word? heh). According to Xilinx's website, their current FPGAs have unlimited reprogrammability.
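
As a sanity check on speedup numbers like that: Amdahl's law caps the overall win by whatever fraction of the task you can actually offload. A quick illustration in C (the fractions below are made up, not measurements):

/* Amdahl's law: overall speedup = 1 / ((1 - p) + p/s), where p is the
   fraction of the task offloaded and s is how much faster the
   offloaded part runs. */

#include <stdio.h>

double amdahl(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    /* even with the FPGA part 100x faster, a 95%-parallel task tops
       out near 1/(1 - 0.95) = 20x overall */
    printf("p=0.95, s=100: %.1fx\n", amdahl(0.95, 100.0));  /* 16.8x */
    printf("p=0.50, s=100: %.1fx\n", amdahl(0.50, 100.0));  /* 2.0x  */
    return 0;
}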

Star Bridge Systems uses FPGAs as the basis for their "hypercomputers", which are basically just a whole bunch of FPGAs with a custom programming language and tooling to reprogram the FPGAs at runtime. I seem to remember reading something that said they could be reprogrammed around 10,000 times per second, though I can't imagine you'd get much work done between reprograms at that rate. It's really interesting stuff; I wouldn't mind having one of my own to play around with.
 

CTho9305

Elite Member
Jul 26, 2000
Originally posted by: SirPsycho
Star Bridge Systems uses FPGAs as the basis for their "hypercomputers", which are basically just a whole bunch of FPGAs with a custom programming language and tooling to reprogram the FPGAs at runtime. I seem to remember reading something that said they could be reprogrammed around 10,000 times per second, though I can't imagine you'd get much work done between reprograms at that rate. It's really interesting stuff; I wouldn't mind having one of my own to play around with.

http://www.stretchinc.com/products_s5000.php
 

kpb

Senior member
Oct 18, 2001
The biggest problem I'd see with running x86 on EPIC well is that EPIC moves a lot of things off the chip and into the compiler. I.e., it doesn't really do branch prediction or out-of-order execution. The compiler has to do a lot of work to make sure the code is fed efficiently to the chip. That just seems so different from the x86 architecture that I can't see them ever getting it faster than x86 on the current micro-op architecture, given the same amount of resources. Yes, current EPIC x86 emulation is decent and getting better, but remember that these are huge, very expensive chips, and you can do better for a lot less with a P4. The architectures are just too different. On top of that, Intel hasn't shown any plans of making EPIC mainstream. I could see a new x86 "RISC mode" where you could program directly to the micro-ops, and therefore skip the decoding stage, access all the registers directly, etc., a lot sooner than I could see x86 on EPIC for a mainstream processor.
 

CTho9305

Elite Member
Jul 26, 2000
Originally posted by: kpb
The biggest problem I'd see with running x86 on EPIC well is that EPIC moves a lot of things off the chip and into the compiler. I.e., it doesn't really do branch prediction or out-of-order execution. The compiler has to do a lot of work to make sure the code is fed efficiently to the chip. That just seems so different from the x86 architecture that I can't see them ever getting it faster than x86 on the current micro-op architecture, given the same amount of resources.
I think trace caches are really similar to x86->VLIW translation.
 

Sohcan

Platinum Member
Oct 10, 1999
Originally posted by: kpb
The biggest problem I'd see with running x86 on EPIC well is that EPIC moves a lot of things off the chip and into the compiler. I.e., it doesn't really do branch prediction or out-of-order execution.

Sure we do branch prediction. Itanium 2 has a rather sophisticated two-level local history branch predictor, with a large dedicated branch history backing store, return stack buffer, and perfect loop predictor.
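
For anyone curious what "two-level local history" means structurally, here's a generic textbook sketch in C. This shows the family of predictor, not Itanium 2's actual table sizes or organization, which I'm not reproducing here:

/* Generic two-level local-history branch predictor. Each branch keeps
   its own recent outcome history (level 1); that history pattern
   indexes a table of 2-bit saturating counters (level 2). */

#include <stdio.h>
#include <stdint.h>

#define BRANCHES  1024              /* local history table entries */
#define HIST_BITS 8                 /* bits of per-branch history  */
#define PATTERNS  (1 << HIST_BITS)

static uint8_t history[BRANCHES];   /* level 1: per-branch history */
static uint8_t counters[PATTERNS];  /* level 2: 2-bit counters     */

int predict(uint32_t pc) {
    return counters[history[pc % BRANCHES]] >= 2;  /* 2,3 => taken */
}

void update(uint32_t pc, int taken) {
    uint8_t *h = &history[pc % BRANCHES];
    uint8_t *c = &counters[*h];
    if (taken  && *c < 3) (*c)++;   /* saturate at 3 */
    if (!taken && *c > 0) (*c)--;   /* saturate at 0 */
    *h = (uint8_t)(((*h << 1) | (taken != 0)) & (PATTERNS - 1));
}

int main(void) {
    /* a strictly alternating branch defeats a lone 2-bit counter,
       but per-branch history learns it perfectly */
    for (int i = 0; i < 100; i++) update(0x400, i & 1);
    printf("next predicted taken? %d\n", predict(0x400));  /* 0: correct */
    return 0;
}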
 