What Memory Will Intel’s Purley Platform Use?

(Figure: part of an Intel Purley slide)

There has been quite a lot of interest over the past few days about the apparently inadvertent disclosure by Intel of its server platform roadmap.  Detailed coverage in The Platform showed a couple of slides with key memory information for the upcoming Purley server platform, which will support the Xeon “Skylake” processor family.  (A review of this post on 7/13/17 revealed that The Platform’s website has disappeared.  The above link and the next one no longer work.)

One slide, titled “Purley: Biggest Platform Advancement Since Nehalem,” includes this post’s graphic, which describes a memory with “Up to 4x the capacity & lower cost than DRAM, and 500x faster than NAND.”

The Memory Guy puzzled a bit about what this might be.  The only memory chip technology today with a cost structure lower than that of DRAM is NAND flash, and there is unlikely to be any technology within the leaked roadmap’s 2015-2017 time span that will change that.  MRAM, ReRAM, PCM, FRAM, and other technologies can’t beat DRAM’s cost, and will probably take close to a decade to get to that point.

Since that’s the case, what is this mystery memory?  If we think of memory systems rather than memory chips, we can come up with one very plausible answer.  Intel may be very obliquely referring to Diablo’s Memory Channel Storage, or some similar approach.

Diablo’s approach, which places NAND flash on the memory bus, provides more memory on a single DIMM than standard DRAM can, at a lower cost per gigabyte than DRAM, and its architecture supports higher bandwidth than is available through most NAND flash interfaces.

Although Diablo’s current customers include only SanDisk, with its ULLtraDIMM, and IBM, which sells the SanDisk product rebranded as the eXFlash DIMM, Diablo has repeatedly asserted that it is working with other vendors to increase the adoption of its technology.  The Purley slides appear to indicate that Intel has decided to promote the approach.

This would be quite an accomplishment for Diablo, but it fits with a trend that Intel has been supporting for a number of years: to bring flash into the platform in order to unleash the processor’s capabilities.

You can safely bet that I will be watching the Purley platform closely for further disclosures of Intel’s memory plans!  As I learn more I will share what I can.


30 thoughts on “What Memory Will Intel’s Purley Platform Use?”

    1. Now, for some “Back of the Envelope” math.

      There are six DDR4 channels per CPU. Purley supports only up to 2 DIMMs per channel. That’s 12 DIMMs.

      Purley can only support 1.5TB of DRAM without Apache Pass. That’s 128GB per DIMM, which is a lot, but Samsung said last October that it was ready to build DIMMs that large.

      If you remove half of those DIMMs and replace them with 800GB Apache Pass modules, you get to 5,568GB of memory. It’s fair to round that up to 6TB.

      800GB is twice the capacity of the larger of the two ULLtraDIMMs currently shipped by Diablo customers SanDisk & IBM.

      It all fits together.
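
      For anyone who wants to check the arithmetic, here is the same calculation as a short sketch; the per-module capacities are the assumptions stated above, not confirmed Intel specifications:

      ```python
      # Reproducing the back-of-the-envelope arithmetic above.  The 128GB DRAM
      # DIMM and 800GB Apache Pass capacities are the commenter's assumptions,
      # not confirmed Intel specifications.
      channels_per_cpu = 6                                  # DDR4 channels per Purley CPU
      dimms_per_channel = 2
      dimm_slots = channels_per_cpu * dimms_per_channel     # 12 slots

      dram_dimm_gb = 128                                    # largest assumed DRAM DIMM
      apache_pass_gb = 800                                  # assumed Apache Pass module

      all_dram_gb = dimm_slots * dram_dimm_gb               # 1,536GB, about 1.5TB
      mixed_gb = 6 * dram_dimm_gb + 6 * apache_pass_gb      # 5,568GB, roughly 6TB
      print(all_dram_gb, mixed_gb)                          # 1536 5568
      ```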

      1. Thanks, Rob, for finding the correct link.

        I would edit it in my comment above, but can’t figure out how to do that.

        I notice, too, that the links to “The Platform” in the first two paragraphs of the blog post no longer work. Guess that publication has failed.

        Jim

  1. This is almost certainly STT memory that will come out of the Intel/Micron partnership. Micron mentions that “new memory B” will be volume-capable in 2017.

    1. Sorry, Derek, but I can’t see how something that hasn’t even sampled yet can attain a lower price than DRAM in only two years.

      NAND flash took about a decade to get there. Several other technologies have tried and failed.

  2. I’m just basing it on what Micron and Intel are saying. Will it be cheaper than DRAM in the first gen? Maybe not. But according to this slide from Micron, http://www.reram-forum.com/wp-content/uploads/2015/04/New-Memory-300×220.gif , the performance-focused memory is supposed to be faster than NAND, cheaper than DRAM, and persistent. If STT lends itself to being built in 3D like NAND, but at an even bigger node since it is only 4x the density of DRAM, it is potentially possible that it could be produced at lower cost out of the gate.

    1. Derek, you could be right, but I am pretty jaded after having seen a lot of other hopeful memory technologies fail to displace the entrenched competition.

      Note, too, that I added a couple of comments below the post that point to the likelihood that the “Apache Pass” memory is an 800GB module, which is likely to be the next-larger capacity for SanDisk’s ULLtraDIMMs (or IBM’s eXFlash DIMMs), since today they ship in 200GB and 400GB capacities.

      But a difference of opinion makes anything more fun to watch. Let’s keep an eye on this and see who was right and who was wrong.

  3. DRAM is already faster than NAND.

    So putting NAND in memory modules buys you faster storage and more capacity, because you are bypassing PCIe.

    SSD modules already have DRAM for cache, so what happens to the SSD controller?

    Sounds like it is only good for Big Data processing applications.

    1. Interestingly enough, NAND on the memory bus is good for computing of all sorts, not only Big Data.

      Objective Analysis ran a series of PC benchmarks some time back that found that, after the first gigabyte or two of DRAM was installed, you got a greater performance boost by adding a dollar’s worth of NAND or SSD than if you added a dollar’s worth of DRAM.

      1. – fetch from L1 cache memory: 0.5 nanosec
         – fetch from L2 cache memory: 7 nanosec
         – fetch from main memory: 100 nanosec
         – SSD random read: 150,000 nanosec
         – read 1 MB sequentially from SSD: 1,000,000 nanosec

        These figures are a few years old, but a modern OS is designed to use as much memory as is available.

        If this is what you want, then take HMC control logic and stick it in all the SSDs you can fit; then you get the best of both worlds. That should take minimal space as well.
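
        To put the figures listed above in perspective, here is a rough sketch of the ratios; these are ballpark values carried over from the list, not measurements from any particular system:

        ```python
        # Rough latency ratios, using the approximate figures listed above.
        latencies_ns = {
            "L1 cache": 0.5,
            "L2 cache": 7,
            "main memory (DRAM)": 100,
            "SSD random read": 150_000,
        }
        dram_ns = latencies_ns["main memory (DRAM)"]
        for name, ns in latencies_ns.items():
            print(f"{name}: {ns / dram_ns:g}x DRAM latency")
        # An SSD random read is roughly 1,500x slower than DRAM, which is why
        # NAND on the memory bus competes with storage, not with DRAM itself.
        ```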

        1. I’m not sure how the 1MB sequential read ties in with the other numbers, but those numbers are still about right.

          You didn’t include HDD, which takes about 1,000 times as long as an SSD.

          Your HMC proposal is intriguing, but I think that Intel plans to use technology that is already shipping.

      2. It sounds interesting. But it is hard to believe that just 1GB–2GB of DRAM backed by a large NAND capacity in a PC-class system greatly boosts application performance all the time. It must depend on the working set size of the workload, mustn’t it? Jim, can you tell me more about the results from Objective Analysis? I am wondering how they drew the conclusion, with which system configuration and which workloads. Does the report also have a cost-effectiveness analysis model to justify the argument?

        1. Jjay, Thanks for the thorough questions.

          Your questions are well-founded. The answers to them are all in the report.

          My next post explains this.

          Jim

        2. Jjay,

          The report presents the benchmarks’ findings in two ways: performance as a function of DRAM and NAND flash size, and performance as a function of cost.

          The cost analysis basically says: “For a combined memory/storage cost of $X you can get this much performance with a flash-heavy approach, and that much performance with a DRAM-heavy approach.” Of course, the report puts real numbers around this, but I didn’t include them here.

          In all cases, though, for a fixed-cost system, performance increases as you decrease DRAM and increase NAND once the DRAM reaches a certain relatively small size.
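
          Purely as an illustration of the shape of that conclusion, here is a toy model of a fixed-budget split between DRAM and NAND; the prices and the performance curve are invented for the sketch and are not figures from the Objective Analysis report:

          ```python
          # Toy model of a fixed-budget DRAM/NAND split.  All numbers below are
          # invented for illustration; they are NOT data from the report.
          BUDGET = 200.0         # hypothetical combined memory/storage budget, $
          DRAM_PER_GB = 8.0      # hypothetical $/GB
          NAND_PER_GB = 0.5      # hypothetical $/GB

          def toy_performance(dram_gb, nand_gb):
              # DRAM helps until a small working set fits; beyond that, added
              # fast NAND capacity provides most of the benefit (diminishing returns).
              dram_benefit = min(dram_gb, 2.0) / 2.0
              nand_benefit = nand_gb / (nand_gb + 64.0)
              return dram_benefit * (1.0 + nand_benefit)

          for dram_gb in (1, 2, 4, 8, 16):
              nand_gb = (BUDGET - dram_gb * DRAM_PER_GB) / NAND_PER_GB
              print(dram_gb, int(nand_gb), round(toy_performance(dram_gb, nand_gb), 2))
          # For this fixed budget the score peaks near the small-DRAM end of the
          # range, echoing the report's qualitative finding.
          ```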

  4. Thank you for sharing an interesting article. It would be interesting to understand the load/store access latency of ‘Apache Pass’. Does it make sense to introduce something slower (think random access!) than DRAM on the memory bus? What % of time does a processor spend waiting for data to load from DRAM? Does Diablo’s approach solve these issues?

    1. The Diablo module appears to be adjustable to some extent, since IBM specifies a tighter write latency specification (3.3µs) than does SanDisk (5µs).

      All of SanDisk’s specifications appear at http://www.sandisk.com/enterprise/ulltradimm-ssd/

      As for Apache Pass, I know far less. We will just have to wait and see what Intel tells us once they are ready to announce.

  5. Harish’s question is on point: Much has been made of the I/O performance benefits of putting (relatively slow) nonvolatile media on the fast, fixed-latency memory bus. I suspect Intel’s architects get a good chuckle out of this.

    For any workload that doesn’t fit in L3, modern server CPUs are badly memory-bound. This excellent paper: http://www.usc.edu/dept/ee/scip/assets/001/56439.pdf shows that for TPC-E, a Xeon experiences an L3 miss rate of only 1.5%, but nonetheless spends 19 out of every 20 instruction slots stalled waiting for DRAM! The limit on the number of cores in a modern server CPU is simply that each additional core adds computational capacity but uses memory bandwidth and so starves all the others. At some point the marginal return is negative. Intermixing multi-microsecond accesses with DRAM traffic can only make this worse. It will doubtless result in great I/O benchmark scores, but for many workloads will kill the overall performance of the machine, and Intel clearly knows this.
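
    As a rough sanity check of that 19-out-of-20 figure, the arithmetic below uses round numbers of my own (clock speed, DRAM latency, and roughly one memory reference per instruction), not values taken from the cited paper:

    ```python
    # Back-of-the-envelope check of the "19 of 20 slots stalled" claim.
    # The constants are round-number assumptions, not the paper's data.
    clock_ghz = 3.0
    dram_latency_ns = 100.0
    l3_miss_rate = 0.015        # misses per reference, ~1 reference/instruction

    miss_penalty_cycles = dram_latency_ns * clock_ghz      # ~300 cycles
    stall_per_instr = l3_miss_rate * miss_penalty_cycles   # ~4.5 cycles
    work_per_instr = 0.25                                  # ideal 4-wide core

    stalled = stall_per_instr / (stall_per_instr + work_per_instr)
    print(f"{stalled:.0%} of instruction slots stalled")   # ~95%, i.e. ~19 of 20
    ```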

    If the load/store latency of Apache Pass technology is fixed (like DRAM) and relatively fast, then sharing the DRAM bus may make very good sense. If not, expect Intel to either enhance PCIe, attach it to QPI, or add some other port to access it without disrupting memory traffic. The key, as Harish implies, is that the latency requirement is less about making I/O fast than it is about staying out of the way of DRAM traffic.

    1. Joel,

      Good point. The idea of adding NAND to the DRAM bus is to cut the latency of going to NAND through an HDD interface. Some have tried to do this by stalling the DRAM bus altogether to wait for the NAND access, but the Diablo approach (as I understand it) is to DMA the NAND data into DRAM once a page fault has been detected.

      As you point out, either of these will reduce DRAM bandwidth to the CPU, and this must be traded off against the alternative, which may hang up other parts of the process. There is no doubt that there will be certain applications for which a NAND DIMM will be a great choice, and others for which it won’t be useful. We’ll learn more about which is which over the next few years.

    2. Joel,

      Good insights. Since my position on Apache Pass is nothing more than a guess right now, I will put off further elaboration until Intel discloses more.

  6. Hey guys, I came across this post after reading a rather lengthy and seemingly well-researched article on Seeking Alpha. I know very little about memory, so I would greatly appreciate your input. The author claims that Intel could use Micron’s PCM memory in order to achieve what you guys are discussing above. If you have a chance, please take a look at the article and let me know whether you think it makes sense, is possible but a long shot, or is just pure nonsense. Thanks!

    http://seekingalpha.com/article/3253655-intel-and-micron-the-purple-swan

    1. Mr. Moe,

      Thanks for the comment and the link.

      I was apprehensive that it was on Seeking Alpha, and it unfortunately met my expectations. Most of what I read on that site consists of lengthy discourses steeped in conspiracy theories covering misunderstood technologies and markets using overly-prosaic language laced with spelling and grammatical errors.

      In brief, I don’t buy it. If you’re looking for investment guidance, I would suggest that you look to other sources.

  7. Apache Pass on a client machine can improve the user experience compared to PCIe/SSD: faster boot times, and no more sleep mode; instead, instant-on from zero-power, zero-time hibernate, with supercap RAM backup. Nothing is faster than DRAM, but Jim’s premise is about price/performance, not raw performance.

  8. I look forward to your take on the just announced XPoint memory. I believe it answers the questions in your recent article about Intel’s memory plans?

    1. DS, I posted something just a few hours ago here: https://TheMemoryGuy.com/micronintel-3d-xpoint-raises-more-questions-than-answers/

      It’s hard to tell whether or not 3D XPoint has anything to do with Purley & Apache Pass.

      My guess is that this new chip is too new to play a big role in any pending platform. It usually takes at least 5 years to bring any new semiconductor technology to the point that it meets its cost goal. Most of the time it takes far longer.

      My prediction is based on the cautious assumption that NAND will be abundant and cheap, but that 3D XPoint may take quite some time to actually become cheaper than DRAM.

      Jim

  9. I just wondered what you thought of RRAM as touted by Crossbar. It looks more cost-effective, less power-hungry, and can be packed at higher density than any of its competition. I can see some drawbacks, but every system has at least one drawback (some more).

    1. Michael, Thanks for the comment.

      Crossbar isn’t alone in having a good technology. There are lots of very good alternative technologies out there, all poised to change computing as we know it.

      The problem for any of them is getting into volume production. Until they reach volume they will be too expensive to make a difference, and if they are too expensive then they won’t get into volume production.

      I like to say that “The road to hell is paved with technologically superior products.” Technology is not the issue here. The issue is cost.
