With all the interest in CXL lately it’s past time for The Memory Guy blog to weigh in on this new technology. Although I plan to write a little tutorial soon, this post will start with the idea that the reader already has some knowledge of CXL and can appreciate the two conundrums I am about to discuss.
CXL’s magic is mainly that it can add memory to systems in a way that doesn’t bog down the processor with significant capacitive loading or make it burn considerable power by adding memory channels. It does this by allowing the CPU to communicate over a highly-streamlined PCIe channel with DRAM that is managed by its own controller. With the addition of a CXL switch, multiple processors can access the same memory, allowing memory to be allocated to one processor or another the same way that other resources, like processors and storage, are flexibly allocated. In other words, CXL disaggregates memory.
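To make "disaggregated" a little more concrete: under Linux, CXL-attached memory is generally exposed to software as a CPU-less NUMA node, so ordinary NUMA APIs can already reach it. Below is a minimal sketch using libnuma; the assumption that the CXL memory shows up as node 1, and the buffer size, are placeholders for illustration, not anything a real system guarantees.

```c
/* Minimal sketch: place a buffer on a CXL-backed NUMA node.
 * Assumes the CXL memory is exposed as NUMA node 1 (system-dependent).
 * Build with: gcc cxl_alloc.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 1                      /* assumed node ID for the CXL memory */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    size_t size = 1UL << 30;            /* 1 GiB, arbitrary example size */
    void *buf = numa_alloc_onnode(size, CXL_NODE);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", CXL_NODE);
        return 1;
    }

    memset(buf, 0, size);               /* touch the pages so they are actually placed */
    printf("1 GiB placed on NUMA node %d\n", CXL_NODE);

    numa_free(buf, size);
    return 0;
}
```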
So what are those two conundrums?
1) Will More or Less Memory Sell with CXL?
One benefit of this new technology is that CXL-attached memory presents a minuscule capacitive load to the CPU, since it’s on the other end of a CXL channel, which is a point-to-point connection. A CXL DRAM module has a controller that talks to the CXL channel on one side and to the DRAM on the other. If a single CXL module includes a large number of DRAM chips, then that controller is tasked with driving all of those chips and all of their capacitive loading, and the processor doesn’t have to worry about it.
DRAM makers see this as a potential boon, since a system built around CXL can support far more memory than can be attached directly to a processor. Micron’s March 2023 white paper, Micron’s Perspective on Impact of CXL on DRAM Bit Growth Rate, says: “CXL will help sustain a higher rate of DRAM bit growth than we would see without it.” While that’s a relatively mild statement, in conversations Micron and its competitors Samsung and SK hynix describe CXL as a kind of rocket booster for DRAM sales, one that could put a knee in the industry’s gigabyte growth curve.
But the end users who are most interested in CXL are the hyperscale data centers, and they have a different perspective. In October 2022, Microsoft and Google published a research paper titled Pond: CXL-Based Memory Pooling Systems for Cloud Platforms. The paper says: “Our analysis shows that we can reduce DRAM needs by 7% with a Pond pool spanning 16 sockets, which corresponds to hundreds of millions of dollars for a large cloud provider.” A more detailed sentence tells us that: “Our results showed that Pond can reduce the amount of needed DRAM by 7% with a pool size of 16 sockets and assuming CXL increases latency by 222%. This translates into an overall reduction of 3.5% in cloud server cost.”
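Taken together, those Pond figures are worth a quick sanity check: a 222% latency increase means a CXL access takes a bit more than three times as long as a local one, and if trimming DRAM by 7% saves 3.5% of server cost, then DRAM must represent roughly half of the server cost in their model. The toy calculation below just spells that arithmetic out; the 100 ns local-latency baseline is my own assumption, not a number from the paper.

```c
/* Back-of-the-envelope check on the Pond paper's figures.
 * The 100 ns local-DRAM latency is an assumed baseline, not from the paper.
 */
#include <stdio.h>

int main(void)
{
    double local_latency_ns = 100.0;       /* assumed local DRAM latency        */
    double added_latency_pct = 222.0;      /* Pond: CXL adds 222% latency       */
    double cxl_latency_ns = local_latency_ns * (1.0 + added_latency_pct / 100.0);

    double dram_savings_pct = 7.0;         /* Pond: 7% less DRAM needed         */
    double server_cost_savings_pct = 3.5;  /* Pond: 3.5% lower server cost      */
    double implied_dram_cost_share = server_cost_savings_pct / dram_savings_pct;

    printf("CXL access latency: %.0f ns (%.1fx local)\n",
           cxl_latency_ns, cxl_latency_ns / local_latency_ns);
    printf("Implied DRAM share of server cost: %.0f%%\n",
           implied_dram_cost_share * 100.0);
    return 0;
}
```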
How can we reconcile these two conflicting opinions? Will CXL increase or decrease DRAM sales? This is the first conundrum.
2) Where Does CXL Fit in the Memory/Storage Hierarchy?
Here’s The Memory Guy’s second conundrum. It has to do with how CXL DRAM fits into the memory/storage hierarchy:
- It’s slower than direct-attached DRAM because it sits behind a controller, up to two switches, and a network or channel
- It’s more expensive than direct-attached DRAM because it requires its own controller
A CXL module is like a DIMM, except that there’s a CXL controller between the DRAM and the CXL channel. This controller adds latency to DRAM accesses and adds cost to the module.
A good way to understand the importance of this is to look at the Objective Analysis memory/storage hierarchy chart that we show in a number of our presentations. It’s described in detail in an SSD Guy blog post, so I’ll just show it here without explaining it:
A memory or storage technology fits into this chart by lying on the same diagonal as the others: anything new must be cheaper than the next-faster layer and faster than the next-cheaper layer. Each of the orbs in the chart fits that rule very nicely.
Where does CXL fit into this? Since it’s mainly for DRAM in the near term, it will slow down the DRAM layer while increasing its cost. This is represented by the arrows in the rendition below:
To put it succinctly, DRAM that is behind a CXL controller moves away from the diagonal, so it doesn’t fit into the memory/storage hierarchy.
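One way to picture the diagonal rule is as a simple pairwise test: a candidate layer belongs in the hierarchy only if it is both cheaper than the next-faster layer and faster than the next-cheaper layer. The sketch below encodes that test with made-up price and latency numbers; the point is the shape of the comparison, not the specific figures.

```c
/* Illustrative test of the memory/storage-hierarchy "diagonal" rule.
 * All prices and latencies below are made-up placeholders, chosen only
 * to show how DRAM behind a CXL controller can fail the fit test.
 */
#include <stdbool.h>
#include <stdio.h>

struct layer {
    const char *name;
    double price_per_gb;   /* higher = more expensive */
    double latency_ns;     /* higher = slower         */
};

/* A candidate fits between two existing layers if it is cheaper than the
 * faster neighbor and faster than the cheaper neighbor. */
static bool fits_between(struct layer faster, struct layer candidate, struct layer cheaper)
{
    return candidate.price_per_gb < faster.price_per_gb &&
           candidate.latency_ns   < cheaper.latency_ns;
}

int main(void)
{
    struct layer dram = { "direct-attached DRAM", 4.0, 100.0 };   /* placeholder figures */
    struct layer ssd  = { "NVMe SSD",             0.1, 80000.0 };
    struct layer cxl  = { "CXL-attached DRAM",    5.0, 300.0 };   /* pricier AND slower than DRAM */

    printf("%s fits between DRAM and SSD: %s\n", cxl.name,
           fits_between(dram, cxl, ssd) ? "yes" : "no");
    return 0;
}
```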
Astute readers will have already noticed that the Pond paper quoted above states that CXL added 222% to DRAM latency, making accesses more than three times as slow. (Admittedly, this number came from a test setup that could very well be slower than tomorrow’s CXL-based systems will be.) Nonetheless, a CXL DRAM module’s controller does add a significant amount of latency to CXL-attached DRAM. The more sophisticated systems defined in CXL 2.0 route the signals through a switch, adding further latency, and the fabric support of CXL 3.0 (below) routes them through two switches, adding even more.
But CXL-attached DRAM can have higher bandwidth than direct-attached DDR5 DRAM, because a large number of DRAM chips can be placed behind the controller. CXL supporters argue that the added bandwidth more than offsets the increased latency.
Here we run into the challenge of deciding which matters more: latency or bandwidth. The answer depends on the application program running on the host. Bandwidth-starved programs, those with more predictable data access patterns, will benefit from CXL’s high bandwidth, while programs with more random access patterns will suffer from CXL’s added latency.
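A crude way to see that crossover is to model each access as a fixed latency plus a transfer time: for small random reads the latency term dominates, while for big streamed transfers the bandwidth term does. The sketch below uses assumed, round numbers for both channels purely to show the shape of the tradeoff; none of the figures are measurements.

```c
/* Toy model of latency vs. bandwidth for local and CXL-attached DRAM.
 * All parameters are assumptions chosen for illustration, not measurements.
 */
#include <stdio.h>

/* Time for one access = fixed latency + bytes / bandwidth */
static double access_ns(double latency_ns, double gbps, double bytes)
{
    return latency_ns + bytes / gbps;   /* GB/s == bytes/ns, so the units line up */
}

int main(void)
{
    /* assumed figures: local DDR5 channel vs. a wider CXL module */
    double local_lat = 100.0, local_bw = 40.0;   /* ns, GB/s */
    double cxl_lat   = 300.0, cxl_bw   = 120.0;  /* ns, GB/s */

    double sizes[] = { 64.0, 4096.0, 1048576.0 };    /* cache line, page, 1 MiB */

    for (int i = 0; i < 3; i++) {
        double b = sizes[i];
        printf("%8.0f bytes: local %8.0f ns, CXL %8.0f ns\n",
               b, access_ns(local_lat, local_bw, b), access_ns(cxl_lat, cxl_bw, b));
    }
    return 0;
}
```

With these assumed numbers, a 64-byte random read is about three times slower over CXL, while a 1 MiB streamed transfer is about three times faster, which is exactly the tension described above.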
Some readers may have noticed that this is the same issue the HDD world wrestled with decades ago. In the 1980s there were strong arguments over whether sequential transfer rate or seek latency mattered more. To resolve the question a new measurement was devised, called IOPS, which mirrored the behavior of a typical application program to measure a blend of the two. Perhaps memory needs a similar measure.
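Purely as a thought experiment, a memory-side analog of IOPS might weight latency-bound and bandwidth-bound accesses by how often a representative workload issues each, and report one blended rate. The sketch below does that with the same assumed channel parameters as above; the 70/30 access mix is arbitrary and only meant to show the form such a metric could take.

```c
/* Hypothetical blended memory metric, loosely analogous to IOPS:
 * weight small (latency-bound) and large (bandwidth-bound) accesses
 * by an assumed workload mix and report one accesses-per-second figure.
 * Every number here is an assumption for illustration only.
 */
#include <stdio.h>

static double access_ns(double latency_ns, double gbps, double bytes)
{
    return latency_ns + bytes / gbps;             /* GB/s == bytes/ns */
}

static double blended_aps(double lat_ns, double bw_gbps)
{
    double small = access_ns(lat_ns, bw_gbps, 64.0);      /* cache-line read */
    double large = access_ns(lat_ns, bw_gbps, 1048576.0); /* 1 MiB streamed  */
    double mix_small = 0.7, mix_large = 0.3;              /* arbitrary 70/30 mix */
    double avg_ns = mix_small * small + mix_large * large;
    return 1e9 / avg_ns;                                  /* accesses per second */
}

int main(void)
{
    printf("local DRAM: %.0f blended accesses/s\n", blended_aps(100.0, 40.0));
    printf("CXL DRAM:   %.0f blended accesses/s\n", blended_aps(300.0, 120.0));
    return 0;
}
```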
In the meantime, CXL-attached DRAM may boost the performance of some applications while reducing that of others. This will become apparent as CXL is more widely deployed.
I’m writing a report on CXL for release by year-end. One of my biggest difficulties is deciding what stance to take on these two conundrums so that the forecast lives up to our very high standards. I’d love to hear what others think, so please reach out through the comments section below or by contacting me directly if you have something to share.
Please also let us know if you would like some help setting your own company’s course for a CXL-attached future. Objective Analysis is known for its deep understanding of technical and market issues. We would love to explore ways that we can help your company put together a winning strategy.
This is a good short-term analysis. In the longer run CXL allows non-DRAM memory to compete for that bulk memory appliance. Direct-attached channels on the CPUs using DDR are very hostile to new technology, since the detailed state machine of DRAM is baked into the channel operation. Anything that is not DRAM, having a different state machine, carries the huge handicap of needing to emulate an alien protocol if it tries to compete for the local channels. New technologies simply languish in the labs with no path to commercial use.
However, on CXL no one knows whether you are DRAM. It presents clean transactional semantics for load and store operations, along with supplemental information about coherency and privacy. This creates a new market opportunity where different forms of memory can compete on the same clean load-store semantics. It will take a few years, but that is the true revolution CXL.mem may deliver: bigger, lower-power, denser, and cheaper main memory.
Tanj, you are right, of course! I suspect there’s nobody better to comment on this than you.
It might take a long time for those new memories to come about, but when they do CXL will be there waiting for them, allowing them to be a drop-in substitute for DRAM, with no questions asked.
Thanks for chiming in.
Jim