At the International Solid State Circuits Conference (ISSCC) last week a new “Last Level Cache” was introduced by a DRAM company called “Piecemakers Technology,” working with Taiwan’s ITRI and Intel.
The chip was designed with a focus on latency, rather than bandwidth. This is unusual for a DRAM.
Presenter Tah-Kang Joseph Ting explained that, although successive generations of DDR interfaces have increased DRAM sequential bandwidth by a couple of orders of magnitude, latency has been stuck at 30ns, and it hasn’t improved with the WideIO interface, the new TSV-based High Bandwidth Memory (HBM), or the Hybrid Memory Cube (HMC). Furthermore, there’s a much larger latency gap between the processor’s internal Level 3 cache and the system DRAM than there is between any adjacent cache levels. The researchers decided to design a product to fill this gap.
Many readers may be familiar with my bandwidth vs. cost chart that the Memory Guy has used to introduce SSDs and 3D XPoint memory. The gap that needs filling is indicated by the large red arrow that appears in this post’s graphic, which is an updated rendition of this graph.
Piecemakers’ new High-Bandwidth Low-Latency (HBLL) DRAM is an eight-channel design, with each 72-bit channel accessing 32 banks of RAM. This high bank count allows a lot of interleaving to support faster access. One reason its latency is low is that much of the complexity of the DDR interface has been stripped away — the chip uses an SRAM, rather than a DRAM, interface. The combination of these gives the part a 17ns latency and a random bandwidth 75% higher than HBM and about ten times that of WideIO and LPDDR4.
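The interleaving benefit is easy to see with a toy address mapping. The sketch below is a minimal illustration under an assumed mapping (channel bits low, bank bits next); the paper doesn’t describe the part’s actual address decoding:

```python
# Toy address-to-bank mapping for an 8-channel, 32-banks-per-channel part
# like Piecemakers' HBLL DRAM. The real decoding isn't public; this sketch
# (channel bits low, bank bits next -- an assumption) just shows why a high
# bank count lets consecutive requests overlap instead of queuing.

CHANNELS = 8
BANKS = 32  # banks per channel, 256 banks in total

def map_address(addr: int) -> tuple[int, int]:
    """Spread consecutive addresses across channels first, then banks."""
    channel = addr % CHANNELS
    bank = (addr // CHANNELS) % BANKS
    return channel, bank

# A burst of 16 consecutive requests lands on 16 distinct banks, so each
# bank's ~30ns core cycle time is hidden behind the other banks' accesses.
targets = [map_address(a) for a in range(16)]
assert len(set(targets)) == 16
```

With 256 independent banks, a random-access stream rarely revisits a busy bank, which is what lets the part sustain high random bandwidth despite each bank’s slow core.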
Although this would seem to require a huge die area, the design shares address decoders across bank groups, so the die-area penalty is only about 10%. Each bank has local latches that allow the upstream circuitry to move on to the next address without waiting.
So what has all this to do with Intel? Intel was named as a co-author of the paper because the company provided funding for the project. From this it appears that Intel is not only interested in filling the speed gap between DRAM and SSDs with 3D XPoint memory, but it also wants to fill the red-arrow gap with Piecemakers’ HBLL or something like it.
Of course, we won’t know this for certain until Intel announces its plans, but the Piecemakers development certainly appears to indicate that this is Intel’s intent.
5 thoughts on “Is Intel Adding Yet Another Memory Layer?”
I asked in a DRAM-focused session at the Intel Developer Forum a few years ago why none of the vendors seemed to care about latency. At the time they were still hyper-focused on capacity and were starting to look at bandwidth, and DDR1 was the flavor du jour.
They laughed at me. Literally.
Even then we knew that our network workloads were suffering from memory latency. We had plenty of bandwidth, and adding more cache was already well into diminishing returns. We needed latency improvements and would have been very willing to pay for them in reduced capacity.
I cannot understand an engineering industry that is solely driven by buzzwords and not clear measurement followed by thinking. It makes no sense.
Alan, it certainly took some time for anyone to recognize that your comment was important!
I agree that there are lots of places where people simply don’t measure the problem they are trying to solve. The rise and fall of Fusion-IO attests to that, as does the fact that SSD endurance (Drive Writes per Day or Terabytes Written) kept climbing until 2015, then started to ease back.
It seems pretty normal for designers to solve a problem by over-specifying their needs in the problem area, after which they cost-reduce until the solution is optimized.
They could save a lot of money by measuring their needs BEFORE putting a fix in place.
In management school I was taught that most companies focus on the things that are easy to measure rather than on the things that make the most difference. It looks like that’s what we have here.
Thanks for the comment!
I agree that with the advent of fast, dense non-volatile memories like 3DXP, we can expect the focus to shift to latency first and then performance for all workloads, especially as non-volatile memory migrates ever closer to the compute node (or perhaps even onto the processor itself). Assuming this happens, it will drive major changes in the way software is written to take maximal advantage of this new capability and that, in turn, is going to drive new benchmark metrics. It’s going to be fun to watch as this all unfolds.
There was a time when memory capacity was more important than latency. For database transaction processing, that time has passed and latency is more important even though few people realize it.
I am advocating single processor (socket) systems over blind adherence to the 2-socket practice because the single-socket system has 30% lower avg. mem latency than a 2-socket system mostly due to the high latency of remote node memory accesses. The other contribution is that even local node memory access on the 2-socket system has higher latency than single socket memory access because of the remote node L3 cache check.
A single-socket system with 20 cores is roughly equivalent in performance to a 2-socket system with 14 cores per processor. Note that software licensing is typically per-core and far more expensive than the hardware.
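Joe’s 30% figure falls out of a simple weighted average. The latencies below are illustrative assumptions, not measurements from any particular system:

```python
# Back-of-envelope check of the ~30% average-latency claim for single-socket
# vs. 2-socket servers. All numbers are assumptions for illustration.

local_1s  = 80.0    # ns: single-socket local DRAM access
local_2s  = 90.0    # ns: 2-socket local access (remote L3 check adds latency)
remote_2s = 140.0   # ns: 2-socket remote-node access over the socket link

remote_fraction = 0.5  # with memory interleaved across nodes, half go remote
avg_2s = (1 - remote_fraction) * local_2s + remote_fraction * remote_2s

print(avg_2s)                 # 115.0 ns average on the 2-socket box
print(1 - local_1s / avg_2s)  # ~0.30: the single socket averages ~30% lower
```

The actual ratio depends on the workload’s remote-access fraction and the platform’s snoop behavior, but the shape of the argument is the same.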
Many people routinely configure 512GB to 1TB of memory in their database servers. I am speculating that perhaps 16-32GB of super-low-latency memory (this could be SRAM, HBLL, or RL-DRAM), combined with the current complement of conventional DRAM, would reduce average memory latency significantly. A 2X reduction in latency would enable a single processor to match the performance of current 4-way systems. Given that the Intel Xeon SP 28-core processor is $10K each, $30K is the budget to achieve a 2X performance gain with low-latency memory. So even SRAM to the tune of $1,000/GB could be very attractive.
I have written a detailed analysis of this approach here: http://www.qdpma.com/ServerSystems/ServerSystems.html
Joe, that’s an interesting insight.
Given that you’re a SQL Server consultant you have insights beyond mine – I focus only on memory.
I always assume that the processor choice has already been made, and then wonder how to provide the best cost/performance memory system for that processor. You take a broader (and much better) approach!
Thanks for the very valuable comment!