A split NAND/DRAM bus

sm625

Diamond Member
May 6, 2011
8,172
137
106
A typical x86 CPU has a 128-bit memory controller. A typical SSD has 8, 10, 12, or 16 channels spanning anywhere from 64 to 256 bits (or more?). For simplicity's sake, let's take an SSD design that is also 128 bits.

So doesn't it make sense to combine these two very complicated and costly buses? Take an 8Gbit DRAM die, stack a 64Gbit flash die and a multiplexer on top of it, and package them all together into one hybrid flash/DRAM chip. Take 8 or 16 of those and you've got a bus.

Then go into the CPU and smarten up the memory controller. Have it do DRAM and flash operations over the same shared bus. Call the new protocol FDDR. DRAM bandwidth would take a hit, but imo there is currently plenty to spare. SSD access latency would improve dramatically: a tenfold improvement is easily attainable, and fiftyfold is possible.

There is no reason this cannot be done. Combining the two types of memory into one physical package is an absolute necessity anyway, to prepare us for the next generation of nonvolatile memory. So it only makes sense to combine the two buses now.

The added cost to a CPU and motherboard would be negligible in terms of transistors and routing.

The added cost to a DRAM chip is a bit tougher to estimate. But at worst it would only be the cost of a typical DDR3 chip plus the cost of an MLC NAND chip plus a few dollars on top of that. So you would be paying roughly $120 for two 4GB/64GB (DDR3/flash) memory sticks.
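A rough software sketch of what the "smartened up" memory controller's decode step might look like; the address boundaries, names, and mux-select signal here are all invented for illustration, not a real FDDR spec.

[code]
/* Hypothetical sketch only: per transaction, does the shared "FDDR" bus cycle
 * target the DRAM die or the NAND die behind the package-level multiplexer?
 * All boundaries and names are made up for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DRAM_BASE   0x0000000000ULL          /* 16GB of DRAM (2 x 8GB sticks) */
#define DRAM_SIZE   (16ULL << 30)
#define FLASH_BASE  0x1000000000ULL          /* flash mapped above the DRAM   */
#define FLASH_SIZE  (128ULL << 30)           /* 2 x 64GB of NAND              */

enum fddr_target { TARGET_DRAM, TARGET_FLASH, TARGET_NONE };

/* Decode a physical address into a routing decision: mux_select_flash = false
 * steers the shared bus cycle to the DRAM die, true steers it to the NAND die
 * in the same package. */
static enum fddr_target fddr_decode(uint64_t phys, bool *mux_select_flash)
{
    if (phys < DRAM_BASE + DRAM_SIZE) {
        *mux_select_flash = false;
        return TARGET_DRAM;
    }
    if (phys >= FLASH_BASE && phys < FLASH_BASE + FLASH_SIZE) {
        *mux_select_flash = true;   /* flash cycles are slower; the controller */
        return TARGET_FLASH;        /* has to yield the bus to DRAM in between */
    }
    return TARGET_NONE;
}

int main(void)
{
    bool to_flash;
    uint64_t probes[] = { 0x2000ULL, FLASH_BASE + 0x2000ULL };
    for (int i = 0; i < 2; i++) {
        enum fddr_target t = fddr_decode(probes[i], &to_flash);
        printf("addr %#llx -> %s\n", (unsigned long long)probes[i],
               t == TARGET_DRAM ? "DRAM die" :
               t == TARGET_FLASH ? "NAND die" : "unmapped");
    }
    return 0;
}
[/code]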
 
Last edited:
May 11, 2008
20,040
1,287
126
A combined DRAM/flash bus is not that handy.

The write latency of flash is enormous compared to DRAM, and it would bring the PC to a grinding halt because of the shared bus.

I posted this once long ago in this section of the AT forum.
It is not original, it is just an extension of how it was done in the old days.
In the old days the CPU itself would load the data; later on, dedicated controllers took care of data retrieval from mass storage.
Now, as we shift more and more to flash and, in the near future, memristor memory, it makes sense to integrate a dedicated storage controller onto the CPU. Of course some specialized chip may still be needed, or even integrated in the memory chip, but in the end it all comes down to integrating as much on the CPU die as possible. It is a lot easier to make a 2048-bit-wide bus on a die than it is between two physically separated chips.

Today the package-on-package (PoP) technique is becoming more and more popular.

Here a DRAM is stacked directly on top of the CPU or SoC die. Some ARM chips from Texas Instruments have this; look for example at the SoC (system on a chip) for the BeagleBoard. That is a lot of connections between two physical dies. But this technique is limited by the maximum dissipation of the SoC; thermal stress issues, I assume.

But for a CPU that gets pretty hot (100W TDP), PoP is not really an option as far as I know. Too many thermal hotspots that could cause failure.

I personally am waiting for the SATA bus to disappear and for flash drives to get a full PCIe Gen 2 connection, connected directly to the memory controller / crossbar switch of the CPU.

We already have the PCIe controller and the memory controller on the CPU, and the GPU is slowly becoming integrated. A dedicated storage controller, acting partly as an MMU (memory management unit) but optimized for mass storage, is all that is missing. Then 64-bit address ranges instantly become a standard feature to use: a full linear address range extending over mass storage such as SSD or HDD. Today this is also sort of the case, but the operating system takes care of it in software with a file system, and then we have a storage controller, all with different buses and protocols. By unifying mass storage in such a way (through dedicated hardware and software) that it is really addressable the same way RAM is, we would see quite a speedup.

People who work with embedded microcontrollers such as the ARM7TDMI are already familiar with the concept of having the flash and the RAM in one linear memory range. No bank switching or other sidestepping to gain access to memory; just pointers into address ranges.
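For example, on a small ARM7TDMI-class part the flash and the SRAM are just two ranges in one address space, and code touches both with ordinary pointers. A bare-metal sketch; the base addresses are illustrative and vary per chip:

[code]
/* Minimal sketch of the flat-address-space view on an ARM7TDMI-class MCU.
 * The base addresses are illustrative only; the point is that code reads
 * flash and RAM with plain pointer dereferences, no driver stack. */
#include <stdint.h>

#define FLASH_BASE  0x00000000UL   /* on-chip flash, readable in place */
#define SRAM_BASE   0x40000000UL   /* on-chip SRAM                     */

uint32_t read_calibration_word(void)
{
    /* A constant stored in flash is read exactly like a RAM variable. */
    const volatile uint32_t *cal =
        (const volatile uint32_t *)(FLASH_BASE + 0x1FC);
    return *cal;
}

void copy_table_to_ram(volatile uint32_t *dst, uint32_t words)
{
    const volatile uint32_t *src =
        (const volatile uint32_t *)(FLASH_BASE + 0x2000);
    for (uint32_t i = 0; i < words; i++)
        dst[i] = src[i];           /* same load/store instructions either way */
}
[/code]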

Of course, mass storage will always have somewhat higher latency. And that is why a dedicated on-CPU storage controller coupled to the main MMU is much better. While waiting for the storage controller to retrieve the data, the CPU cores can of course switch to other threads that do not have to wait. Again, this asks for tight integration of the OS software with the on-board CPU hardware.

The PC still has all this I/O with all these layers while they are no longer really needed. Compatibility is the only reason.
 
Last edited:

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
I am not all that sure there would be any benefit to this. Other than some minor latency, the PCI-E bus is being used for what it was designed for. I expect that SATA might be forced to grow more, or we will end up with some sort of "PCI-E to the SRAM" interface at some point.

Don't underestimate the value of compatibility. Many more companies have gone bankrupt or fallen by the wayside by trying to do it only "their way". Itanium and HP come to mind.
 
May 11, 2008
20,040
1,287
126
I am not all that sure there would be any benefit to this. Other than some minor latency, the PCI-E bus is being used for what it was designed for. I expect that SATA might be forced to grow more, or we will end up with some sort of "PCI-E to the SRAM" interface at some point.

Don't underestimate the value of compatibility. Many more companies have gone bankrupt or fallen by the wayside by trying to do it only "their way". Itanium and HP come to mind.

The problem is acceptance and commercial politics.
A hypothetical situation:
Suppose that Intel or AMD worked together with Microsoft.
AMD and Intel would start making flash chips (again).
AMD and Intel work together with Microsoft and come up with an integrated storage controller, the "SCU", on the CPU and tightly coupled to it.
The new OS, Windows 9, would have full support for it and would even keep backward compatibility for the old filesystem APIs next to a newer API that takes full advantage of the gained speed.
This would piss off a lot of manufacturers of flash, flash controllers, and any parts designed around them.
That would mean a lot of lawsuits. Patent fencing...
A lot of companies would not be willing to support it and would undermine it wherever their managers could.

What we have seen with the Android OS is that compatibility becomes an issue when a whole economic ecosystem has developed around a certain technology.

Of course, Itanium was such a different beast that it is a whole other issue.
What I have written in the previous post maintains compatibility at the CPU instruction level. Third-party software stays compatible; it uses the OS API to access hardware. Only the OS and the hardware get a radical overhaul where it is needed.

With USB 3.0, SATA will not really be necessary. And when SATA becomes even faster, a mass storage device such as an SSD might just as well be connected directly to the CPU through PCIe. That might need shorter cables (less than 4 inches) or even just a PCIe card... but hey, is that really an issue?
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
The problem is acceptance and commercial politics.
A hypothetical situation:
Suppose that Intel or AMD worked together with Microsoft.
AMD and Intel would start making flash chips (again).
AMD and Intel work together with Microsoft and come up with an integrated storage controller, the "SCU", on the CPU and tightly coupled to it.
The new OS, Windows 9, would have full support for it and would even keep backward compatibility for the old filesystem APIs next to a newer API that takes full advantage of the gained speed.
This would piss off a lot of manufacturers of flash, flash controllers, and any parts designed around them.
That would mean a lot of lawsuits. Patent fencing...
A lot of companies would not be willing to support it and would undermine it wherever their managers could.

What we have seen with the Android OS is that compatibility becomes an issue when a whole economic ecosystem has developed around a certain technology.

Of course, Itanium was such a different beast that it is a whole other issue.
What I have written in the previous post maintains compatibility at the CPU instruction level. Third-party software stays compatible; it uses the OS API to access hardware. Only the OS and the hardware get a radical overhaul where it is needed.

With USB 3.0, SATA will not really be necessary. And when SATA becomes even faster, a mass storage device such as an SSD might just as well be connected directly to the CPU through PCIe. That might need shorter cables (less than 4 inches) or even just a PCIe card... but hey, is that really an issue?

The real issue is cost:benefit. In the Android market there is value in having a combined system like this, as it reduces the overall cost of the device, and there was (at the time) value in a unified approach [fewer parts]. Also, SATA still outperforms USB3 in all real-world tests, so I wouldn't consider it a threat. They are designed for two different uses: USB is a device interconnect that needs to share the bus, while SATA is a direct point-to-point bus. This makes SATA's protocols "simpler" and typically lower latency.

At this exact point PCI-E can outrun the SSD drives, so it doesn't hurt to be there. Even the PCI-E SSD boards still emulate SATA or SAS for compatibility. As much as people say spinning rust is dead, I expect it to have a good decade+ because of its capacity lead.

The other main issue with placing storage on the CPU is how diverse storage is at the moment, and the general utilization. Enterprise is still a larger market than home for Intel. On-chip SATA or "flash RAM on the CPU bus" is not likely to be a huge feature, and there are many different storage types. There are datacenters with rows of racks with hundreds of servers that have zero local storage. On-chip storage becomes an extra cost when you are using Fibre Channel, iSCSI, pick your poison.

/shrug
 
May 11, 2008
20,040
1,287
126
The real issue is cost:benefit. In the Android market there is value in having a combined system like this, as it reduces the overall cost of the device, and there was (at the time) value in a unified approach [fewer parts]. Also, SATA still outperforms USB3 in all real-world tests, so I wouldn't consider it a threat. They are designed for two different uses: USB is a device interconnect that needs to share the bus, while SATA is a direct point-to-point bus. This makes SATA's protocols "simpler" and typically lower latency.

Cost:benefit is a very important factor, but not the only one.
The main reason for highly integrated SoCs in phones is not only the price. It is mainly power consumption: with an SoC where everything is on the die itself, it is much easier to implement power-saving techniques. The features that Android (and iPhone) mobiles currently offer can never be reached with separate chips and a lot of discrete components; too much power consumption and too much PCB real estate. You would have been calling with a 10-inch tablet held to your ear.

SATA may outperform USB3, but that is indeed not strange, and of course SATA has lower latency. That still does not change the fact that SSD mass storage is now fast enough that it can be directly coupled to the CPU through another PCIe channel and a smart storage controller unit: the SCU.


At this exact point PCI-E can outrun the SSD drives, so it doesn't hurt to be there. Even the PCI-E SSD boards still emulate SATA or SAS for compatibility.

A PCIe SSD board doing emulation: no benefit from a direct point-to-point connection, and software emulation layers for the sake of compatibility.
That is exactly what I am wondering about... I am sure a lot can be gained from tighter integration between software and hardware.
A typical application is not aware of where the data comes from, unless it uses another API call to determine what kind of storage it is. The only difference is an API to go through the file system versus just pointing somewhere in memory after being granted access rights for that part of memory. And the OS is the man in the middle.
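A minimal sketch of the two access styles contrasted here, using plain POSIX calls (the file name is a placeholder and error handling is trimmed): going through the file-system API call by call, versus mapping once and then treating the data as ordinary memory.

[code]
/* Two ways for an application to reach the same bytes: read() through the
 * file-system API, or mmap() once and then use a pointer. "data.bin" is a
 * placeholder file name. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Style 1: explicit API call per access; the OS copies the data for us. */
    char header[16];
    if (read(fd, header, sizeof header) < 0) { perror("read"); return 1; }

    /* Style 2: map once; the file becomes an address range, each access is
     * just a load, and the OS pages data in behind the scenes. */
    const char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte via read(): %d, via pointer: %d\n", header[0], p[0]);

    munmap((void *)p, st.st_size);
    close(fd);
    return 0;
}
[/code]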

As much as people say spinning rust is dead, I expect it to have a good decade+ because of its capacity lead.

The other main issue with placing storage on the CPU is how diverse storage is at the moment, and the general utilization. Enterprise is still a larger market than home for Intel. On-chip SATA or "flash RAM on the CPU bus" is not likely to be a huge feature, and there are many different storage types. There are datacenters with rows of racks with hundreds of servers that have zero local storage. On-chip storage becomes an extra cost when you are using Fibre Channel, iSCSI, pick your poison.

/shrug

I do not think storage itself will be placed on the CPU in a typical desktop system for some time; just too much heat dissipation. But almost all the "intelligent" hardware between the CPU and the mass storage memory could be integrated directly onto the CPU. This may add another few watts, which can be managed with power saving.
A lot of desktop/mobile PC features are not useful in the server market.
And a lot of server features are not useful in the desktop/mobile PC market.
For the server market it may be less relevant, but I was never writing about that market; I was thinking primarily about the desktop/mobile PC market.
Although in time the server market would benefit from it as well, as more concurrently running but virtualized systems (such as multiple servers on one box) by definition need more bandwidth.
I do think that just as a graphics card can have an x2 or an x16 PCIe connection, SSD mass storage could be implemented the same way: more lanes in parallel to hide the latency by providing more "burst" bandwidth. Add an intelligent OS and hardware to do proper prefetching and requesting...
I think it would not be much different from NCQ for HDD drives.
 
Last edited:

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
I am not following why you consider emulating SATA an issue. Not emulating a known standard jacks the cost:benefit way up. Moving the SSD to the CPU would also have minimal effect and would move us towards either "the CPU manufacturer dictates what is supported" or a much larger on-die "SCU", as you call it, taking up additional die space.

The SCU would require some sort of protocol (i.e. the ATA standards) to communicate at the OS level. You would either use the existing ATA spec or build your own and hope that the software people implement it. The SCU would still attach to the CPU bus. The existing IGPs present themselves via the on-chip PCI-E bus, and the SCU would likely do the same, since current SSD tech does not even exceed the limit of a single-link PCI-e 3.0 connection. At that point the on-chip SCU is still on the die but attached via the PCI-e bus.

In reality you might save a few nanoseconds of latency if the SCU were given a dedicated link. However, nanoseconds are not the issue when we have SSD controllers whose latency is still measured in the ms range.

This would all go out the window if we actually had [long term] storage that was approaching or at the speed of memory. At this point we don't have storage that comes close to the 25GB/s that DDR3 can push.

Another not fully fleshed out thought: SSD tech is NAND attached to a controller. I suspect you see far more latency in NAND chip operation and controller operation than you see from the overhead of the SATA protocols.

When we actually have tech that produces a need for an on-board storage controller, you might see movement that way. At this point, for home use, there isn't a need for it. When we improve NAND performance by a factor of 10 we might just start to see a single PCI-E channel having issues coping. Even at a factor of 10 (somewhere around 6-8 GB/s), RAM will still be "laughing" at storage performance.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
NAND has at least the potential to be just as fast as DRAM as far as reads are concerned. HLNAND2 is 1.6GB/s, which is not bad for one chip. Slap 16 of these onto a couple of DIMMs (16 x 1.6GB/s = 25.6GB/s) and you're right up there with a typical dual-channel DDR3-1600 setup.

If you buffer the writes, you can actually run a computer on SLC NAND with no RAM at all, except for your buffer. If Windows had any optimizations for systems without DRAM, then such a computer would actually run faster than anything else out there. Of course we all know that Microsoft is going to be totally and completely blindsided by this, and as a result they will lose billions more.

But anyway, this thread was supposed to be about sharing the rather complicated dual-channel DRAM bus: packaging a NAND die with a DRAM die and using a multiplexer to route bus transactions to the correct die. So the SSD controller must be part of the memory controller, because that is the bus where all memory operations would take place.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
NAND has at least the potential to be just as fast as DRAM as far as reads are concerned. HLNAND2 is 1.6GB/s, which is not bad for one chip. Slap 16 of these onto a couple of DIMMs (16 x 1.6GB/s = 25.6GB/s) and you're right up there with a typical dual-channel DDR3-1600 setup.

If you buffer the writes, you can actually run a computer on SLC NAND with no RAM at all, except for your buffer. If Windows had any optimizations for systems without DRAM, then such a computer would actually run faster than anything else out there. Of course we all know that Microsoft is going to be totally and completely blindsided by this, and as a result they will lose billions more.

But anyway, this thread was supposed to be about sharing the rather complicated dual-channel DRAM bus: packaging a NAND die with a DRAM die and using a multiplexer to route bus transactions to the correct die. So the SSD controller must be part of the memory controller, because that is the bus where all memory operations would take place.

Sticking to the multiplexing point, I wouldn't imagine the current implementation would work unless you limited it to this HLNAND2. The spec sheet on the site says HLNAND2 has a JEDEC interface. The main issue is that the fastest they sell is DDR-800 (the 1.6GB/s is full duplex and per device according to the spec page, not per chip), which has been dropped. If they built it to DDR3 and could get the performance up to the 25+GB/s that DDR3 does now, and had some way to tell the OS to handle this, it could work, but you would have contention for the existing bus. It would likely require a hardware solution that was "aware" of this.

I am not sure where you want to go with this.

Also, I highly doubt MS will be "blindsided" by this: a) they have done this with phones before, and b) they are not going to bother building support for hardware that doesn't exist.
 

Juncar

Member
Jul 5, 2009
130
0
76
Attaching SSD and RAM to the same bus would increase the parasitic capacitance on the bus. This will slow it down and will require some changes that'll increase the cost.

Combining the two controllers together might require more space since you'll need some kind of a hardware unit to arbitrate the bus usage. Since both RAM and SSD will be accessed often with important bits, I think it'll have to be a fairly smart unit.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Attaching SSD and RAM to the same bus would increase the parasitic capacitance on the bus. This will slow it down and will require some changes that'll increase the cost.

It won't add any more capacitance than a DRAM bus populated with 2 DIMMs per channel. We would simply be limited to one DIMM per channel, worst case. But even that I doubt; see below.

Combining the two controllers together might require more space since you'll need some kind of a hardware unit to arbitrate the bus usage. Since both RAM and SSD will be accessed often with important bits, I think it'll have to be a fairly smart unit.

The multiplexer is the hardware unit you are referring to. The mux will send the correct signals to the correct chip, either the NAND or the DRAM. It is possible that this mux can also cut down on the bus loading, since the bus side of the mux counts as only one capacitive load even though it serves two (or more) dies in the same package.
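A toy software model of that package-level mux, purely illustrative: the select signal, storage sizes, and names are invented. The point is that one set of package pins fans out internally to two dies, so the channel sees a single load.

[code]
/* Toy model of the package-level mux: the shared command/address/data pins
 * present one load to the channel, and an internal select (driven by the
 * memory controller, e.g. an extra chip-select - invented here) steers each
 * transaction to the DRAM die or the NAND die inside the package. */
#include <stdint.h>
#include <stdio.h>

struct hybrid_package {
    uint8_t dram[1024];    /* stand-ins for the two dies' storage */
    uint8_t nand[1024];
};

/* One read transaction as seen at the package pins. */
static uint8_t package_read(const struct hybrid_package *pkg,
                            int select_nand, uint32_t addr)
{
    /* The mux adds a small propagation delay, not an extra electrical load
     * per die on the channel. */
    return select_nand ? pkg->nand[addr % 1024] : pkg->dram[addr % 1024];
}

int main(void)
{
    struct hybrid_package pkg = { .dram = {42}, .nand = {7} };
    printf("DRAM[0]=%u NAND[0]=%u\n",
           package_read(&pkg, 0, 0), package_read(&pkg, 1, 0));
    return 0;
}
[/code]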
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
It won't add any more capacitance than a DRAM bus populated with 2 DIMMs per channel. We would simply be limited to one DIMM per channel, worst case. But even that I doubt; see below.



The multiplexer is the hardware unit you are referring to. The mux will send the correct signals to the correct chip, either the NAND or the DRAM. It is possible that this mux can also cut down on the bus loading, since the bus side of the mux counts as only one capacitive load even though it serves two (or more) dies in the same package.

This is the approach that was used with FB-DIMMs. The AMB (advanced memory buffer) ended up adding heat and latency as the trade-off for supporting more RAM, though. I have a server that actually pulled 45 more watts from the wall per 4 DIMMs because of the AMB. Multiplexers would be doing similar work. I would expect that this has improved in the 7 or so years since that style of DIMM appeared and then disappeared, though.
 
May 11, 2008
20,040
1,287
126
I am not following why you consider emulating SATA an issue. Not emulating a known standard jacks the cost:benefit way up. Moving the SSD to the CPU would also have minimal effect and would move us towards either "the CPU manufacturer dictates what is supported" or a much larger on-die "SCU", as you call it, taking up additional die space.

The SCU would require some sort of protocol (i.e. the ATA standards) to communicate at the OS level. You would either use the existing ATA spec or build your own and hope that the software people implement it. The SCU would still attach to the CPU bus. The existing IGPs present themselves via the on-chip PCI-E bus, and the SCU would likely do the same, since current SSD tech does not even exceed the limit of a single-link PCI-e 3.0 connection. At that point the on-chip SCU is still on the die but attached via the PCI-e bus.

In reality you might save a few nanoseconds of latency if the SCU were given a dedicated link. However, nanoseconds are not the issue when we have SSD controllers whose latency is still measured in the ms range.

This would all go out the window if we actually had [long term] storage that was approaching or at the speed of memory. At this point we don't have storage that comes close to the 25GB/s that DDR3 can push.

Another not fully fleshed out thought: SSD tech is NAND attached to a controller. I suspect you see far more latency in NAND chip operation and controller operation than you see from the overhead of the SATA protocols.

When we actually have tech that produces a need for an on-board storage controller, you might see movement that way. At this point, for home use, there isn't a need for it. When we improve NAND performance by a factor of 10 we might just start to see a single PCI-E channel having issues coping. Even at a factor of 10 (somewhere around 6-8 GB/s), RAM will still be "laughing" at storage performance.

Here are my thoughts about it:

From an application's point of view, it just has API calls to request data, a.k.a. files. The OS provides these files by using drivers to access the hardware.
These drivers take care of retrieving or writing the data.
The SSD or HDD just gets the "address" of where a file is located and retrieves it.
Of course, during boot-up there is some configuration of the PCIe controller and devices, and configuration of the SATA controllers and devices.
When the flash is directly connected to the CPU through PCIe with an optimized storage controller, the OS does not have to let the drivers do all these extra actions. Since flash can handle so many I/O operations, the OS writers are going to make more and more use of it when possible, if they do not already have an algorithm that pushes the hardware to its limits.

What I am wondering about is that we are now reaching a point where data access from mass storage, such as SSDs or soon memristor technology, becomes fast enough that it can influence thread-switching policies. Meaning here that it becomes worthwhile to minimize the latency as much as possible.

Now, when accessing an HDD again, that on-CPU storage controller (imagine it having prefetch abilities, just as the CPU cores have branch prediction and HDDs have NCQ) can really shine. Because a few milliseconds ago the CPU wanted data from RAM, then from the SSD, and now from the HDD.
The OS is fully aware of what data is retrieved. The OS will in this way be able (in a lot of situations, not all) to predict whether the HDD will be accessed, and it can request, through CPU instructions, that the storage controller start sending out data requests to the HDD in advance, minimizing the typical latency of mechanical drives. This can only be done with tight integration of the mass storage controller directly onto the CPU. If mass storage can be seen just as pointers to an address, it is much easier to develop an algorithm, running as part of the OS, that can do data prediction. When there are all these layers in between, there will always be a bottleneck. We are at a point where emulation is not needed, because from an application's point of view it does not care where the data is stored or retrieved from. Only the OS is interested in that. And that is the only compatibility that is important: that a current application can, within the boundaries of common sense, still run on a future-generation operating system.
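Today's closest off-the-shelf analogue of that kind of OS/application-driven prefetch is probably an advisory hint such as POSIX posix_fadvise(). A sketch, with a placeholder file name, of telling the kernel to start pulling a range off the slow device before it is actually read:

[code]
/* Hint-driven prefetch with standard POSIX calls: tell the kernel a range
 * will be needed soon so the request can be queued to the drive early and
 * the access latency overlaps with other work. "big.dat" is a placeholder. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("big.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Hint: the next 1MB starting at offset 0 will be wanted shortly.
     * The kernel can start reading it into the page cache now.        */
    posix_fadvise(fd, 0, 1 << 20, POSIX_FADV_WILLNEED);

    /* ... do other work here; by the time we read(), the data may
     *     already be in memory instead of still on the disk ...       */
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);
    printf("read %zd bytes\n", n);

    close(fd);
    return 0;
}
[/code]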

I do not grasp why you want SATA compatibility. It is just hardware; the OS takes care of that. That is what the HAL (hardware abstraction layer) was once invented for.
 
Last edited:
May 11, 2008
20,040
1,287
126
I do have to point out that when trying to shave off every nanosecond to microsecond of latency, a full-blown HAL implementation might nullify all the benefits gained as proposed. So a HAL (hardware abstraction layer) might not be all that handy when you really want raw performance. We have seen that with the game consoles, the original Xbox being the best example (if I remember correctly).
But again, the only compatibility that really matters is that the application runs as intended on its execution environment: the operating system.

http://en.wikipedia.org/wiki/Hardware_abstraction
 
May 11, 2008
20,040
1,287
126
It took me a while to formulate the words:

A program that wants to access data on an SSD, HDD, or other storage device should not have to call all these API functions, which call upon layer after layer, just to request data that in the end is addressed simply as memory: LBA (logical block addressing)...

http://en.wikipedia.org/wiki/Logical_block_addressing

The application should just see an enormous address range of data after it has called an API function requesting a certain data file, or just a memory range, plus the permissions to read, modify/write, or write directly if empty. Of course this can only be done if the master boot record principle is no longer used.
If you plug in another SSD or HDD, the linear addressing just maps it above the existing used address space (which is a lot with 64 bits: 2^64 = 16 exbibytes). It would be as if one were plugging in another memory module. Of course there will be gaps in the address space, because if one has installed 8GB of memory and later wants 16GB, the new memory should not land in the address range where, for example, the SSD is located.
This would greatly simplify how data is accessed, and I am sure a significant speedup could be reached when done correctly. The only change in the system is that the first SSD module must contain the "file system", or pointer-to-data lookup table. What is used most could be cached in RAM; everything else stays on the SSD. Even when accessing an HDD, this would improve access, together with a form of OS-induced NCQ (Native Command Queuing).

http://en.wikipedia.org/wiki/Native_Command_Queuing
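A sketch of what such a flat 64-bit map might look like; every boundary and name here is invented purely for illustration (installed RAM, a gap left for RAM growth, then each storage device mapped higher up).

[code]
/* Hypothetical flat physical address map in the spirit of the post: RAM,
 * room for more RAM, then each storage device mapped above it. All the
 * boundaries are made up for illustration. */
#include <stdint.h>
#include <stdio.h>

struct region { const char *name; uint64_t base; uint64_t size; };

static const struct region address_map[] = {
    { "DRAM (installed)",    0x0000000000ULL,  8ULL << 30 },  /* 0   .. 8GB   */
    { "DRAM (room to grow)", 0x0200000000ULL,  8ULL << 30 },  /* 8GB .. 16GB  */
    { "SSD 0",               0x1000000000ULL, 64ULL << 30 },  /* from 64GB    */
    { "SSD 1 (hot-plugged)", 0x2000000000ULL, 64ULL << 30 },  /* from 128GB   */
};

int main(void)
{
    for (unsigned i = 0; i < sizeof address_map / sizeof address_map[0]; i++)
        printf("%-22s %#014llx .. %#014llx\n", address_map[i].name,
               (unsigned long long)address_map[i].base,
               (unsigned long long)(address_map[i].base +
                                    address_map[i].size - 1));
    return 0;
}
[/code]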

I am wondering whether shifting proven techniques from external controllers directly onto the CPU die (for more fine-grained control by the OS) could further reduce latency and access times. I think it can, with a slight overhaul of how mass storage is seen: not as a separate storage entity, but as part of the memory.
 
Last edited:

sm625

Diamond Member
May 6, 2011
8,172
137
106
There is no HAL in the memory controller. There is no SATA. There is no PCIe. There is no file system. I'm talking about a memory controller. You're going in the totally wrong direction.
 

MrDudeMan

Lifer
Jan 15, 2001
15,069
92
91
USB is a device interconnect that needs to share the bus, while SATA is a direct point-to-point bus. This makes SATA's protocols "simpler" and typically lower latency.

USB is not a bus electrically, only logically. SATA is subject to the same routing penalties and packetized communication.
 
May 11, 2008
20,040
1,287
126
There is no HAL in the memory controller. There is no SATA. There is no PCIe. There is no file system. I'm talking about a memory controller. You're going in the totally wrong direction.

I understand exactly what you mean. But until mature memristor technology is available (HP & Hynix are going to start supplying memristor-based memory chips next year) in capacities of several GB, I think the better way to go is to have separate memory buses for flash memory and for DRAM. But all of this may no longer be interesting starting next year.
If the Mayan calendar ending must be proof of anything ending, it might turn out to be the large-scale use of flash memory and DRAM technology, and perhaps HDD technology as well.

http://www.eetimes.com/electronics-news/4229171/HP-Hynix-to-launch-memristor-memory-2013


SEVILLE, Spain – The 'memristor' two-terminal non-volatile memory technology, in development at Hewlett Packard Co. since 2008, is on track to be in the market and taking share from flash memory within 18 months, according to Stan Williams, senior fellow at HP Labs.
"We have a lot of big plans for it and we're working with Hynix Semiconductor to launch a replacement for flash in the summer of 2013 and also to address the solid-state drive market," Williams told the audience of the International Electronics Forum, being held here.
A spokesperson for HP added that there is no definitive memristor product roadmap as yet, but confirmed that "HP has a goal to see memristor products by the end of 2013."
Williams said that the memristor metrics being achieved, in terms of energy to change a bit, read, write time, retention and endurance, were so compelling that flash replacement was effectively a done deal. "So in 2014/2015 we'll be going after DRAM and after that the SRAM market," Williams said indicating his confidence that the memristor would quickly become a universal memory.
Williams declined to discuss in detail the process technology, memory capacity or memory-effect material that Hewlett Packard and Hynix are working with. "We're running hundreds of wafers through a Hynix full-size fab. We're very happy with it." But Williams did disclose that the first commercial memory would be a multi-layer device.
When challenged over the cost of the technology, which would be the barrier to competing against the high-volume flash memory market, Williams said: "On a price per bit basis we could be an order of magnitude lower cost once you get the NRE [non-recurring expense] out of the way."
The memristor, named after the combination of memory and resistor, was originally a theoretical two-terminal device for which the electrical behavior was derived by Leon Chua in 1971. However, in 2008 researchers from HP published a paper in Nature that tied the hysteretic I-V characteristics of two-terminal titanium oxide devices to the memristor prediction of Chua. "What we found is that moving a few atoms a fraction of a nanometer can change the resistance by three orders of magnitude," said Williams. "In fact many nanodevices have inherent memristive behavior," he said.
HP has amassed some 500 patents around the memristor over the last three years. He also acknowledged that phase-change memory (PCM), Resistive RAM (RRAM) and other two-terminal memory devices are all memristor-type devices. Williams acknowledged that many other companies are working on metal-oxide resistive RAMs. He said that Samsung now has a bigger research team working on the technology than does HP.

Williams touted the cross-point nature of the memristor memory switch or resistive RAM device as a memory capacity advantage over flash memory. "Whatever the best in flash memory is, we'll be able to double that."

Williams compared HP's resistive RAM technology against flash and claimed to meet or exceed the performance of flash memory in all categories. Read times are less than 10 nanoseconds and write/erase times are about 0.1-ns. HP is still accumulating endurance cycle data at 10^12 cycles and the retention times are measured in years, he said.
One of the best things about the memristor memory is that it is a simple structure made using materials that are already common in the world's wafer fabs, making CMOS-compatible devices relatively straightforward, he said.
This creates the prospect of adding dense non-volatile memory as an extra layer on top of logic circuitry. "We could offer 2-Gbytes of memory per core on the processor chip. Putting non-volatile memory on top of the logic chip will buy us twenty years of Moore's Law," said Williams.
Further out Williams said the memristor could be used for computation under a scheme called "implication logic" in a fraction of the area taken up in CMOS by Boolean logic. In addition a memristor device is a good analog of the synapse in brain function.
In conclusion Williams stressed that HP would not be getting into the semiconductor components business but would seek to commercialize and then license the technology to all comers.
 
Last edited:

sm625

Diamond Member
May 6, 2011
8,172
137
106
http://www.eweek.com/c/a/IT-Infrast...hase-Change-Memory-for-Mobile-Devices-520348/

From the article:

The new processors (wafer pictured at left) feature 1G-bit PCM plus 512M-bit LPDDR2 (Low Power Double Data Rate memory, also known as Mobile DDR, or MDDR) in a multichip package.
They are already making chips with DRAM and nonvolatile memory in the same package. This one has PCM. It's at 45nm. Because it is PCM, write speeds should more closely match read speeds, thus simplifying the job of the memory controller.
 
May 11, 2008
20,040
1,287
126
http://www.eweek.com/c/a/IT-Infrast...hase-Change-Memory-for-Mobile-Devices-520348/

From the article:

They are already making chips with DRAM and nonvolatile memory in the same package. This one has PCM. It's at 45nm. Because it is PCM, write speeds should more closely match read speeds, thus simplifying the job of the memory controller.


From the article :
In addition, the design-optimizing shared interface between LPDDR2 and PCM is fully compliant with JEDEC (Joint Electron Devices Engineering Council) industry standards, Micron said.

Indeed it does, and I agree. I do wonder whether, in the near future, the DRAM will disappear, to be replaced by memristor tech. The PCM (phase change memory, also a memristor-class technology) would then be a drop-in replacement. Being on the same memory interface already will make the transition easier. Since it can be accessed as fast as DRAM, this is not an issue. Flash is considerably slower and would thus halt the CPU, starving the cores of instructions and data, when put on the same memory interface as DRAM.
 
May 11, 2008
20,040
1,287
126
The only way your flash can be on the same bus as the DRAM is to do interleaved access. Between every flash data access where data is retrieved or written, a few DRAM accesses can be slotted in. When the DRAM does a refresh cycle, a flash access can be performed. This asks for a lot more expensive hardware, because the bus has to turn around from read to write and vice versa more often. Also, the chip select lines would be switching a lot faster and less predictably. The memory controller must also be modified internally to be able to buffer for flash access times compared to DRAM access times.

Flash works with memory pages and thus also has a way of bursting data, just as DDR DRAM does. Only with flash this can be quite large (up to a few KB), while with DDR3 DRAM, if I am correct, a burst is 64 bytes = 8 x 64-bit words. So while flash memory is being accessed, the DRAM is not accessible for some time. That is why it is better to have separate flash and DRAM buses, especially when thinking of DMA accesses. DMA accesses and CPU accesses are concurrent memory accesses. A lot of PC hardware does not access memory by interrupting the CPU and asking for data; the hardware addresses memory directly through DMA (direct memory access). This mechanism relieves the CPU from having to do simple block transfers of data from or to memory, to or from dedicated hardware.
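A toy arbiter illustrating the interleaving idea, under the assumption that DRAM always has priority and that flash page bursts are chopped into DDR-sized slices; the queue contents and slice size are made up.

[code]
/* Toy shared-bus arbiter: DRAM requests always win, and a long flash page
 * transfer is chopped into small slices so the bus can turn around for DRAM
 * (or a refresh) in between. Purely a software sketch. */
#include <stdio.h>

#define FLASH_SLICE_BYTES 64   /* slice a multi-KB flash page into DDR-sized bursts */

struct bus_state {
    int dram_pending;          /* queued DRAM requests              */
    int flash_bytes_pending;   /* remainder of a flash page to move */
};

/* One bus slot: grant DRAM if anything is queued, otherwise move one slice
 * of the flash page. Returns a label for what used the bus this slot. */
static const char *arbitrate(struct bus_state *s)
{
    if (s->dram_pending > 0) {
        s->dram_pending--;
        return "DRAM burst";
    }
    if (s->flash_bytes_pending > 0) {
        s->flash_bytes_pending -= FLASH_SLICE_BYTES;
        if (s->flash_bytes_pending < 0)
            s->flash_bytes_pending = 0;
        return "flash slice";
    }
    return "idle";
}

int main(void)
{
    struct bus_state s = { .dram_pending = 3, .flash_bytes_pending = 4096 };
    for (int slot = 0; slot < 10; slot++)
        printf("slot %2d: %s\n", slot, arbitrate(&s));
    return 0;
}
[/code]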

I do wonder how the Micron memory is going to influence the speed of the system.

For the interested :
http://en.wikipedia.org/wiki/Direct_memory_access
 
Last edited:

sm625

Diamond Member
May 6, 2011
8,172
137
106
Flash is written in pages, but the timing is not critical. A page write can be interrupted any time DRAM needs to be accessed. DRAM should have priority in pretty much every instance.
 
May 11, 2008
20,040
1,287
126
Flash is written in pages, but the timing is not critical. A page write can be interrupted any time DRAM needs to be accessed. DRAM should have priority in pretty much every instance.

Then the question, of course, is how fast the output buffers of the NAND flash can be turned off. That is probably not an issue, because the flash could be given the same interface the DDR3 DRAM has. But even then, when flash is accessed, the entire page must first be read into RAM on board the flash die. Then, from that RAM, the required address or addresses (for example a burst of 8 addresses) must be selected and the data must be transmitted through the flash data buffers over the memory bus.
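For reference, this is roughly what a raw page read looks like on a conventional ONFI-style NAND die; the low-level helper functions are hypothetical stand-ins for whatever the controller hardware provides, but the two-phase nature (page-register load during tR, then data output) is the point.

[code]
/* Sketch of a raw NAND page read (ONFI-style command sequence): first the
 * die copies a whole page into its internal page register (tR), and only
 * then can the wanted bytes be clocked out over the bus. The helpers
 * (nand_cmd, nand_addr, ...) are hypothetical controller primitives. */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical controller primitives, assumed to exist elsewhere. */
void    nand_cmd(uint8_t cmd);
void    nand_addr(uint8_t addr_byte);
void    nand_wait_ready(void);          /* poll R/B# or READ STATUS (0x70) */
uint8_t nand_data_in(void);

void nand_read_page(uint32_t row, uint16_t col, uint8_t *buf, size_t len)
{
    nand_cmd(0x00);                     /* READ PAGE, first cycle           */
    nand_addr(col & 0xFF);              /* 2 column address cycles          */
    nand_addr((col >> 8) & 0xFF);
    nand_addr(row & 0xFF);              /* 3 row address cycles             */
    nand_addr((row >> 8) & 0xFF);
    nand_addr((row >> 16) & 0xFF);
    nand_cmd(0x30);                     /* READ PAGE, confirm cycle         */

    nand_wait_ready();                  /* tR: whole page -> page register  */

    for (size_t i = 0; i < len; i++)    /* now burst the wanted bytes out   */
        buf[i] = nand_data_in();
}
[/code]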
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
I do not grasp why you want SATA compatibility. It is just hardware; the OS takes care of that. That is what the HAL (hardware abstraction layer) was once invented for.

"ATA" includes a software interface. It is the main reason SATA works still today. "ATA" can be implemented on any number hardware solutions. Examples include FATA (fiber based ATA), AoE (ATA over Ethernet) etc. ATA has already been implemented in RAM [RAM disks etc.]

My biggest thought at the moment is, why take a complex and timing sensitive bus like the RAM bus and make it even more complex by mapping storage on to it with multiplexers and the like. At this time there isn't any NAND that can keep up with 25GB/s of the Intel Memory controller and the Jedec DDR3 standards. Also there is no reason why RAM addresses couldn't be mapped out to other devices no different than PCI-E I/O. If there is a value to support storage via the CPU, I would expect Intel to add another bus to the CPU that is dedicated to that persistant storage chip or just add in extra PCI-e lanes for it. PCI-E 3.0 x16 is already rated to 16GB/s, so it is suited for this.
 

SamurAchzar

Platinum Member
Feb 15, 2006
2,422
3
76
A typical x86 CPU has a 128-bit memory controller. A typical SSD has 8, 10, 12, or 16 channels spanning anywhere from 64 to 256 bits (or more?). For simplicity's sake, let's take an SSD design that is also 128 bits.

So doesn't it make sense to combine these two very complicated and costly buses? Take an 8Gbit DRAM die, stack a 64Gbit flash die and a multiplexer on top of it, and package them all together into one hybrid flash/DRAM chip. Take 8 or 16 of those and you've got a bus.

Then go into the CPU and smarten up the memory controller. Have it do DRAM and flash operations over the same shared bus. Call the new protocol FDDR. DRAM bandwidth would take a hit, but imo there is currently plenty to spare. SSD access latency would improve dramatically: a tenfold improvement is easily attainable, and fiftyfold is possible.

There is no reason this cannot be done. Combining the two types of memory into one physical package is an absolute necessity anyway, to prepare us for the next generation of nonvolatile memory. So it only makes sense to combine the two buses now.

The added cost to a CPU and motherboard would be negligible in terms of transistors and routing.

The added cost to a DRAM chip is a bit tougher to estimate. But at worst it would only be the cost of a typical DDR3 chip plus the cost of an MLC NAND chip plus a few dollars on top of that. So you would be paying roughly $120 for two 4GB/64GB (DDR3/flash) memory sticks.

1. Most of the latency in SSDs is not due to the interface but to the internal NAND addressing schemes. Unlike technologies such as NOR flash, NAND flash is not directly bus-mapped but operates with page-level access. Furthermore, newer NAND types (MLC and beyond) complicate this further with very involved data access schemes that are handled in the controller itself.
Read about Anobit's stuff, for example (the company was recently purchased by Apple): they run DSP and high-end error correction algorithms in order to efficiently and reliably store data on MLC, which encodes more than one bit per flash cell by using multiple voltage levels.

2. Voltage levels are different between the DDR and NAND chips.

3. Many have already circumvented the bandwidth caps imposed by SATA by using a PCI-E based SSD. Since PCI-E is simply mapped into system memory, it's logically pretty close to what you describe (but physically different, of course).

4. You can't mix DDR and NAND at the interface level because of the differing wait states. DDR wait states are fixed (e.g. set as CAS/RAS latencies) and there's no WAIT/BUSY line; NAND is the other way around (the interface timing is fixed, but the write cycle may nevertheless take an arbitrary time during which the host waits).
This is inherently incompatible with the way the CPU memory controller works, even if you put in some translator that linearly mapped the NAND into the system address space.
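A small sketch of the difference described in point 4, assuming hypothetical controller access helpers: the DDR side has a fixed, programmed latency, while the NAND side must be polled (ONFI READ STATUS, 0x70) until the ready bit comes up.

[code]
/* DDR latency is a constant the controller programs once; NAND operation
 * time is open-ended, so the host polls a status register or a busy line.
 * nand_cmd_and_read_status() is a hypothetical stand-in for controller
 * hardware access. */
#include <stdbool.h>
#include <stdint.h>

uint8_t nand_cmd_and_read_status(uint8_t cmd);   /* assumed to exist */

#define ONFI_CMD_READ_STATUS 0x70
#define ONFI_SR_READY        (1u << 6)

/* DDR side: the read completes after a fixed, known number of cycles. */
enum { DDR3_CAS_LATENCY_CYCLES = 11 };

/* NAND side: spin until the die reports ready, however long that takes. */
bool nand_wait_until_ready(unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t sr = nand_cmd_and_read_status(ONFI_CMD_READ_STATUS);
        if (sr & ONFI_SR_READY)
            return true;              /* operation done, time was arbitrary */
    }
    return false;                     /* timed out; the host just had to wait */
}
[/code]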

Overall there's no reason to do this being that PCI-E already accomplishes what you're describing and is a much better fit for the purpose.
 