Discussion General CPU µArch Research Thread


Tuna-Fish

Golden Member
Mar 4, 2011
1,616
2,375
136
To me, the current RAM capacity limits seem to be due to the form factors and density of the chips.

They are not. The capacity limits are caused by cost. Platforms where cost is not an issue have no such problem: you can buy 256GB DIMMs today, and probably 1TB DIMMs later this year. They just cost more per GB than more common configurations, and no one even attempts to sell them to consumers because there would be no sales.
 
Reactions: Thunder 57

soresu

Diamond Member
Dec 19, 2014
3,688
3,025
136
To me, the current RAM capacity limits seem to be due to the form factors and density of the chips.
Form factors especially.

A single server RAM module can give you as much as 512 GB.

Even consumer grade DIMMs can go up to 48 GB, and yet such capacities are very rare on consumer gfx cards.
 

soresu

Diamond Member
Dec 19, 2014
3,688
3,025
136
No one has figured out how to build multiple layers of DRAM without requiring exposures per layer; without that, it's not worth the cost.
Does that include the various capacitorless DRAM designs I've seen in research papers for the last several years?
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,616
2,375
136
There are many interesting designs that work at small scale and with research tools. None so far is manufacturable with usable yield at usably small feature sizes.

Maybe eventually one of them will work, and suddenly we get DRAM scaling back.
 
Reactions: Tlh97 and soresu

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
Since I haven't seen an explanation of the upcoming process nodes in one place, I ended up writing a paper about it for my Master's a few years ago. This thread seems like a place where there might be some interest in it. In case anyone wants to read my terrible attempt at explaining the current plans for future process nodes (as of 2022 at least), you can find it here: https://docs.google.com/document/d/...ouid=115899381745516829681&rtpof=true&sd=true
EDIT: I realize now that I didn't give permission for others to view the document. This should fix that: https://docs.google.com/document/d/...ouid=115899381745516829681&rtpof=true&sd=true
 
Jul 27, 2020
23,462
16,510
146
Since I haven't seen an explanation of the upcoming process nodes in one place, I ended up writing a paper about it for my Master's a few years ago.
It's not bad. Have you written many other papers like this? If I had to nitpick, I guess I would've liked it to be three times longer (as someone who had to write for a class, I know the horrified look on your face after reading that!).
 

Martimus

Diamond Member
Apr 24, 2007
4,490
157
106
It's not bad. Have you written many other papers like this? If I had to nitpick, I guess I would've liked it to be three times longer (as someone who had to write for a class, I know the horrified look on your face after reading that!).

Thanks! I easily could have written more, but there was a lot to get through so I tried to just get the basics of each technology. I ended up with 38 references by the time I was done.

It was really interesting learning about all the new technologies that are available to keep increasing performance. The research I did really changed my mind about the ability to continue Moore's law: before I started writing this, I thought it might be getting close to hitting a wall.

I didn't write any other papers on the subject though, so I don't have anything else for you. The book from the first reference is a good one if you want to learn about the beginning of the microprocessor: Understanding Moore's Law: Four Decades of Innovation.
 
Reactions: igor_kavinski

soresu

Diamond Member
Dec 19, 2014
3,688
3,025
136
Another research paper that seems to target even further perf/watt optimisation of the Forward Slice Core µArch:

Sustainable High-Performance Instruction Selection for Superscalar Processors

ABSTRACT
Sustainability is a grand societal challenge, which requires our urgent attention given the significant and growing contribution of electronic devices to global warming. The environmental footprint of an electronic device comprises of two major contributors: (1) the embodied footprint due to raw material extraction, manufacturing, assembly, end-of-life-processing, and (2) the operational footprint due to device use during its lifetime. Sustainable hardware design hence requires a holistic approach that encompasses the entire lifetime of an electronic device.

In this paper, we demonstrate how to leverage conventional performance-power-area (PPA) analysis towards sustainable hardware design by investigating the sustainability-performance tradeoff of a non-trivial hardware circuitry, namely the dynamic instruction selection logic in superscalar processors. We assess five previously proposed complexity-effective and power-efficient instruction selection approaches compared to conventional out-of-order (OoO) selection, namely Casino, Load Slice Core (LSC), Forward Slice Core (FSC), Delay-and-Bypass (DnB) and Freeway. We find that Casino, FSC and OoO are Pareto-optimal, optimally balancing the environmental footprint against performance; in contrast, LSC, DnB and Freeway are suboptimal. In addition, based on these insights, we further improve FSC’s environmental footprint and propose FSC++ as a compelling sustainable design point: hardware synthesis to a 7 nm technology node and cycle-accurate FPGA simulation of complete SPEC CPU2017 benchmarks show that FSC++ reduces the environmental footprint by around 40% while degrading performance by only 1.7% compared to an OoO baseline.

Paper link.
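For anyone unsure what "Pareto-optimal" means in the abstract above, here is a minimal sketch in Python. The design names and numbers are invented purely for illustration and are not taken from the paper; the only assumption carried over is that lower environmental footprint and higher performance are both better. A design is on the Pareto front if no other design is at least as good on both axes and strictly better on one.

```python
from typing import NamedTuple


class Design(NamedTuple):
    name: str
    footprint: float    # relative environmental footprint, lower is better
    performance: float  # relative performance, higher is better


def dominates(a: Design, b: Design) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.footprint <= b.footprint and a.performance >= b.performance
    strictly_better = a.footprint < b.footprint or a.performance > b.performance
    return no_worse and strictly_better


def pareto_front(designs: list[Design]) -> list[Design]:
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Hypothetical design points, NOT the paper's measurements.
designs = [
    Design("A", footprint=1.00, performance=1.00),
    Design("B", footprint=0.60, performance=0.98),
    Design("C", footprint=0.70, performance=0.90),  # worse than B on both axes
    Design("D", footprint=0.40, performance=0.80),
]

front_names = [d.name for d in pareto_front(designs)]
print(front_names)
```

In this made-up set, C drops out because B beats it on both footprint and performance, while A, B and D each represent a different tradeoff that nothing else strictly improves on.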
 
Reactions: igor_kavinski

soresu

Diamond Member
Dec 19, 2014
3,688
3,025
136
This one is more about reducing human R&D time in CPU design:

Automated CPU Design by Learning from Input-Output Examples

Abstract
Designing a central processing unit (CPU) requires intensive manual work of talented experts to implement the circuit logic from design specifications. Although considerable progress has been made in electronic design automation (EDA) to relieve human efforts, all existing tools require hand-crafted formal program codes (e.g., Verilog, Chisel, or C) as the input. To automate the CPU design without human programming, we are motivated to learn the CPU design from only input-output (IO) examples, which are generated from test cases of design specification. The key challenge is that the learned CPU design should have almost zero tolerance for inaccuracy, which makes well-known approximate algorithms such as neural networks ineffective.

We propose a new AI approach to generate the CPU design in the form of a large-scale Boolean function, from only external IO examples instead of formal program code. This approach employs a novel graph structure called Binary Speculative Diagram (BSD) to approximate the CPU-scale Boolean function accurately. We propose an efficient BSD expansion method based on Boolean Distance, a new metric to quantitatively measure the structural similarity between Boolean functions, gradually increasing the design accuracy up to 100%. Our approach generates an industrial-scale RISC-V CPU design within 5 hours, reducing the design cycle by about 1000× without human involvement. The taped-out chip, Enlightenment-1, the world’s first CPU designed by AI, successfully runs the Linux operating system and performs comparably against the human-design Intel 80486SX CPU.

Our approach even autonomously discovers human knowledge of the von Neumann architecture.


Paper link.
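To make the "input-output examples" and "accuracy up to 100%" wording in the abstract above a bit more concrete, here is a toy sketch in Python. It is not the paper's BSD method or its Boolean Distance metric; it only shows the general idea of scoring a candidate Boolean function against IO examples. The full adder and the deliberately flawed candidate are my own hypothetical stand-ins.

```python
from itertools import product


def reference_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Specification used to generate IO examples: returns (sum bit, carry bit)."""
    total = a + b + cin
    return (total & 1, total >> 1)


def candidate(a: int, b: int, cin: int) -> tuple[int, int]:
    """Hypothetical imperfect design: the carry ignores the carry-in."""
    return (a ^ b ^ cin, a & b)


def accuracy(fn, examples) -> float:
    """Fraction of IO examples on which fn reproduces the expected outputs exactly."""
    hits = sum(1 for inputs, expected in examples if fn(*inputs) == expected)
    return hits / len(examples)


# All 8 input combinations of a 1-bit full adder, paired with the expected outputs.
examples = [(bits, reference_full_adder(*bits)) for bits in product((0, 1), repeat=3)]

candidate_accuracy = accuracy(candidate, examples)
reference_accuracy = accuracy(reference_full_adder, examples)
print(candidate_accuracy, reference_accuracy)
```

The flawed candidate gets 6 of the 8 cases right, which would be useless for a real CPU. That is the "almost zero tolerance for inaccuracy" point: anything short of matching every example is a broken design.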
 

soresu

Diamond Member
Dec 19, 2014
3,688
3,025
136
No one is forcing them to keep them as sticks. They could theoretically make it a cube of RAM made up of multiple layers of PCB and DRAM chips. Put small spaces in between and use a fan to keep everything running cool. The larger the RAM size, the larger the cube. What would be the drawbacks to that approach?
Air-based cooling will never be viable for stacked logic/memory of any significantly improved density.

Look up the boundary layer effect and how much it limits thermal dissipation from heatsinks for a good idea of how much worse it would be to cram dozens of silicon layers with only a small volume between them for cooling.

Far more advanced (and likely more expensive to implement) methods do exist, such as DARPA ICEcool¹, thermal transistors and thermal diodes, among others.

¹ Hollow thermal vias for microfluidic liquid cooling through the whole vertical height of the chip.
 