Using ECC to Reduce Power

CMU Most DRAM Refreshes Unnecessary

A couple of papers at last week’s ISSCC (the IEEE International Solid-State Circuits Conference) caught The Memory Guy’s attention.  Both SK hynix and Samsung showed low-power DRAM designs in which the refresh rate of the DRAM was reduced in order to cut power consumption, with ECC applied to correct the resulting bit errors.

Although I had not heard of this approach before, I have recently learned that researchers at Carnegie Mellon University and my alma mater Georgia Tech presented the idea in a paper delivered at another IEEE conference in 2015: The International Conference on Dependable Systems and Networks.

Here’s the basic concept: a DRAM that is merely holding data spends most of its power performing refresh cycles, the behavior that earned it the “Dynamic” part of its name: Dynamic Random-Access Memory.  This use of the word “Dynamic” is a euphemism.  In reality the bits are constantly decaying, but that doesn’t sound as nice.

When the technology was developed in the early 1970s, DRAM manufacturers offered to provide their customers with really inexpensive RAM bits (compared to SRAM) as long as the customer was content to refresh the entire array every 64 milliseconds.  This means that every DRAM bit has to be read, evaluated, and pushed back to its original state, a 1 or a 0, every 64 milliseconds.  This is a very manageable approach, but it consumes a lot of power.
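To put rough numbers on that, here is a quick back-of-the-envelope sketch.  The row count and per-row refresh energy below are illustrative assumptions of mine, not figures from any datasheet or from the ISSCC papers; the point is only to show how the refresh burden scales with the refresh interval.

```python
# Back-of-the-envelope refresh arithmetic.  ROWS and E_PER_ROW_NJ are
# illustrative assumptions, not values from any specific DRAM datasheet.

ROWS         = 65_536    # assumed number of rows in the DRAM array
T_REFRESH_S  = 0.064     # every row must be refreshed within 64 ms
E_PER_ROW_NJ = 1.0       # assumed energy to refresh one row, in nanojoules

refreshes_per_second = ROWS / T_REFRESH_S
refresh_power_mw = refreshes_per_second * E_PER_ROW_NJ * 1e-9 * 1e3

print(f"{refreshes_per_second:,.0f} row refreshes per second")
print(f"~{refresh_power_mw:.2f} mW spent on refresh alone")

# Stretching the interval from 64 ms to 256 ms cuts the refresh work,
# and the refresh power, by a factor of four.
print(f"at a 256 ms interval: ~{refresh_power_mw / 4:.2f} mW")
```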

Certain forward-thinking researchers recently looked at this situation and asked: “What if I don’t?”  What if they didn’t refresh every single bit as often as the specification required?  Would the entire contents of the DRAM be lost if the refresh interval was stretched to, say, 65 milliseconds?

In a word: “No.”

The top graphic in this post is CMU professor Onur Mutlu‘s visualization of the way DRAMs really work.  The tiny blip in the upper left corner represents the share of cells that must be refreshed every 64-128 milliseconds; these are the leakiest cells on a DRAM chip.  Below that is a column representing more forgiving cells that can go 128-256 milliseconds between refreshes; these are weaker than most cells, but not as bad as the 64-128 millisecond cells, which are the worst of the bunch.  The vast majority of the cells fall into the >256 millisecond category.  That is to say, almost all DRAM cells can be left alone for 256 milliseconds or longer without being refreshed, and their data will still be good.
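This distribution is exactly what a retention-aware refresh scheme exploits: bin the rows by measured retention time and refresh each bin only as often as it needs, rather than refreshing everything at the worst-case 64 milliseconds.  Here is a minimal sketch of that bookkeeping; the row counts are toy numbers I made up to mimic the shape of the distribution, not data from the CMU work.

```python
# Sketch of retention-aware (binned) refresh.  The row counts are made-up
# toy numbers chosen only to mimic the shape of the retention distribution.

rows_by_bin = {
    0.064: 1_000,        # weakest rows: must be refreshed every 64 ms
    0.128: 30_000,       # middling rows: 128 ms is enough
    0.256: 8_000_000,    # the vast majority: 256 ms (or more) is fine
}

def refreshes_per_second(bins):
    """Total row refreshes issued per second when each bin is refreshed
    at its own interval."""
    return sum(count / interval for interval, count in bins.items())

baseline = sum(rows_by_bin.values()) / 0.064   # everything at worst-case 64 ms
binned   = refreshes_per_second(rows_by_bin)

print(f"worst-case refresh: {baseline:,.0f} row refreshes/s")
print(f"binned refresh    : {binned:,.0f} row refreshes/s")
print(f"refresh work saved: {1 - binned / baseline:.0%}")
```

With these toy numbers the binned scheme does roughly a quarter of the refresh work of the worst-case approach, which is the flavor of savings the papers are after.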

Another way of looking at this is shown below.  In this chart, from the CMU paper RAIDR: Retention-Aware Intelligent DRAM Refresh, the refresh interval is plotted on a logarithmic X-axis against the probability of bit errors on a logarithmic Y-axis.  The chart on the left covers the range from 10 milliseconds to 10,000 seconds (about 3 hours).  The chart on the right is a subset of this chart, the part circled on the left, ranging from 10 milliseconds to 1 second.

CMU DRAM Refresh Rate vs Bit Failures

These charts show us that there would be about 30 bit errors in a 32GB DRAM array if the refresh interval was stretched to 128 milliseconds, and the number of bit errors would climb to about 1,000 if the interval was stretched to 256 milliseconds.  These are manageable numbers.  Eventually the number would be large enough that the required ECC would drive costs prohibitively high.  The trick, then, is to refresh often enough that the error rate can be corrected economically.
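To see why a few hundred scattered errors are manageable, here is a quick sanity check.  It assumes the array is protected by a single-error-correcting code over 64-bit words and that the retention failures land at independent, random positions; both assumptions are mine, made only to keep the arithmetic concrete.

```python
import math

# How likely is it that two random bit errors fall in the same 64-bit ECC
# word (which a single-error-correcting code could not fix)?  Uses a
# birthday-problem approximation over an assumed 32 GB array.

ARRAY_BITS = 32 * 8 * 2**30      # 32 GB expressed in bits
WORD_BITS  = 64                  # assumed ECC word size
WORDS      = ARRAY_BITS // WORD_BITS

def p_uncorrectable(n_errors):
    """Probability that at least one ECC word receives two or more of the
    n randomly placed errors."""
    return 1.0 - math.exp(-n_errors * (n_errors - 1) / (2 * WORDS))

for n in (30, 1_000):
    print(f"{n:>5} errors -> ~{p_uncorrectable(n):.5%} chance of an uncorrectable word")
```

Even at 1,000 errors the odds that any two of them share an ECC word are tiny, which is what makes the slower refresh safe to correct.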

In the ISSCC papers, SK hynix stretched the refresh period to 256ms and used ECC to cut self-refresh current by 75%, to less than 100µA.  Samsung stretched its refresh period to 384ms and used a different ECC approach, combined with a number of other techniques, to reduce standby power by 66%, to 0.15mW.  Those are pretty impressive power savings!

How economical is it to correct these bits?  The amount of logic required varies as a function of the number of bits that need to be corrected.  The more errors you correct, the more logic you will need.  Quite fortunately, that logic is constantly undergoing Moore’s Law cost reductions – what may have been too costly yesterday is very economical today.
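As a rough illustration of that scaling, the sketch below uses a textbook bound for binary BCH codes: a code of length 2^m - 1 that corrects t errors needs at most m*t check bits, with t = 1 being the classic Hamming case.  Check-bit count is only a proxy for decoder logic, and this is not the specific ECC used in the ISSCC parts, but it shows how the overhead grows with the number of errors you want to correct.

```python
# Textbook bound for binary BCH codes: a code of length 2**m - 1 that
# corrects t errors needs at most m * t check bits.  This is a generic
# illustration, not the ECC actually used in the SK hynix or Samsung parts.

def bch_check_bits(m, t):
    """Upper bound on check bits for a t-error-correcting BCH code of
    length 2**m - 1."""
    return m * t

m = 7   # codeword length 2**7 - 1 = 127 bits
for t in (1, 2, 4, 8):
    print(f"correct {t} error(s): <= {bch_check_bits(m, t)} check bits per 127-bit codeword")
```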

When discussing NAND flash, I have been known to quip that ECC has been advancing at such a rapid rate that future controllers will be able to pull valid data out of non-functioning NAND flash chips.  In light of this work I may have to add DRAMs to that comment, and postulate that future ECC may be able to keep anyone from ever having to refresh DRAM chips at all.

Seriously, that’s not going to happen, but it is interesting that ECC can be used to reduce refresh rates, and therefore power consumption.  The stronger the ECC, the slower the refresh rate can be.  The slower the refresh rate is, the lower the power consumption will be.  That’s pretty amazing!

9 thoughts on “Using ECC to Reduce Power”

  1. It’s an interesting concept, but not new. With the increases in VRT (Variable Retention Time) bits that come with advanced process nodes, this sort of solution may be required regardless of potential power savings. DRAM array architecture is at the limits of what can be done with the fabrication process. The new memory types can’t get here soon enough!

    1. Thanks Michael.

      Although ECC on DRAMs may not be new (one reader told me about a 2007 paper on a similar concept) it may finally have become economical.

      The use of ECC technology for power savings also appears to be relatively new.

      Whether or not it’s new, it’s still interesting. Adding complexity to reduce power consumption never seems intuitive.

      Jim

        1. ECC to save power dates back (at least) to the 32MB CellularRAM/PSRAM that was used in the Motorola RAZR phone. Before smartphones came along, standby power was everything and a major selling point, so saving refresh cycles during self-refresh was a huge win. The 1st version of the PSRAM had ECC just for this purpose; later, when the phone lost its panache, they dropped the ECC in order to cost-reduce it, and standby life became immaterial.

          1. 2004 RAZR specifications:
             Modes: CDMA 850 / CDMA 1900
             Weight: 3.49 oz (99 g)
             Dimensions: 3.90″ x 2.10″ x 0.60″ (99 x 53 x 15 mm)
             Form Factor: Clamshell, Internal Antenna
             Battery Life: Talk: 3.33 hours (200 minutes); Standby: 215 hours (9 days)
             Battery Type: LiIon 740 mAh

          1. Well THAT’S interesting! I hadn’t heard of the approach until only recently.

            Now you have me wondering just how long people have been doing this!

            I welcome any others to give us similar, but earlier, examples.

            Jim

  2. As Michael Sporer already mentioned, VRT effects are increasing with the advance of DRAM manufacturing processes. This being said, I see ECC as a requirement for any application that is expected to run stably, period.
    The idea of using ECC to reduce power by saving refresh cycles is a nice side benefit that some people might want to use anyhow when they have on-chip ECC in their DRAM, but I don’t think power consumption is anywhere near as important as system stability.

    In 2014 we ran an experiment with Intelligent Memory 2Gb DDR2 components. We took a handful of parts into the Advantest testers and set the chamber to 95°C at first. The refresh interval was set to 128ms and we ran a complete test pass over all memory cells of the parts with a few patterns. This takes about 10 minutes.
    All DUTs survived the test without any memory cell flipping.
    Next, the temperature was increased to 105°C. Within the 10 minutes, only one of the devices showed 2 single-bit errors, while all the others still did not show any problem.
    At 115°C all devices under test had between 5 and 120 transient single-bit errors. And at 125°C there were between 200 and 6000 bit errors.
    What is notable is that ALL bit fails found in the 2Gbit memory array were “single-bit” fails. None of them were wordline, bitline, multi-bit, or bit-pair fails. That means all errors would have been correctable by ECC!

    6000 bit flips sounds like a lot, but let’s be realistic: 1 Gigabit is about 1 billion bits. Using a standard 64/72 Hamming ECC means those 1 billion bits are split into roughly 16 million fields of 64/72 bits, and in each of these fields one bit flip can be corrected.
    In short: up to 16 million errors could be ECC-corrected in parallel. Put another way, only when the error rate approaches something like 16 million bit flips does the probability of an uncorrectable double-bit error become too high.
    But the 6000 bit flips found in the experiment at 125°C and a 128ms refresh interval are “peanuts” for a 64/72 ECC algorithm.
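    A quick back-of-the-envelope check of that arithmetic, using only the figures quoted above (this is just a restatement of those numbers in code, not additional data):

```python
# Back-of-the-envelope check of the 64/72 ECC arithmetic above, using the
# figures quoted in the comment: 1 Gbit of data, 64-bit ECC fields, and the
# 6000 bit flips seen in the 125 degree C experiment.

DATA_BITS  = 2**30          # 1 Gbit of data
FIELD_BITS = 64             # each 72-bit codeword protects 64 data bits
FIELDS     = DATA_BITS // FIELD_BITS

BIT_FLIPS  = 6_000          # worst case observed at 125 degrees C

print(f"ECC fields per gigabit         : {FIELDS:,}")   # ~16.8 million
print(f"correctable flips (1 per field): {FIELDS:,}")
print(f"share of fields hit by 6000    : {BIT_FLIPS / FIELDS:.4%}")
```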

    This experiment was not very representative, though. The test time was much too short to locate VRT effects or to see any degradation issues or soft errors induced by natural radiation. These issues alone are enough reason to use ECC in any application, even if a test reveals no errors. You can be 100% sure that there will be SEUs (single event upsets) flipping data bits in any DRAM of any brand if you run it for some time, even at room temperature and with a fast refresh rate.
    But we learn from the experiment that ECC can help achieve very safe operation not only in normal industrial environments, but even at very high temperatures!

    As you may know, Intelligent Memory has a whole line of DRAM components with integrated on-chip ECC. Technologies include DDR1, DDR2, DDR3 and LPDDR. Just now they also released SDRAM with up to 512Mb in x16 and x32 with ECC.

    But there is another method to improve the retention time of the memory cells: Cell-Twinning, also sometimes called Dual Cell. Intelligent Memory calls it XR, for eXtra Robustness.
    Two memory cells are used together to store the charge for one data bit. As a result, the retention time goes sky-high, to 2000ms (yes, two seconds) or more. Intelligent Memory combines XR with ECC, meaning you could safely set your refresh interval to something around 2000ms or even 5000ms. If bits flip, ECC will correct them! THIS will save power 😉

    Regards,
    Thorsten

    1. Thanks, Thorsten.

      Your reply is lengthy, but it’s well worth reading because you provide a lot of really good data. Because of that I am OK with the little plug for your company, which I usually try to avoid.

      Your company’s focus is on larger systems, so power is much less of a concern to you than bit errors. The SK hynix and Samsung parts were aimed at smartphones where power consumption is a very high priority.

      In the early 1990s IBM briefed me on its new ECC DRAM modules. The representative explained the need for ECC very well by saying: “When a bit error crashes a server it can cause millions of dollars of business loss, but when a bit error causes a PC to crash it does little more than improve the user’s vocabulary.”

      Best,

      Jim

  3. We’re working on embedded system deep learning (DL) models. I would like to ask Michael and Thorsten a couple of questions, as they seem to be key personnel at companies leading the way towards lower energy external memory (MoSys and Intelligent Memory).

    Publications such as http://www.eng.utah.edu/~cs7810/pres/14-7810-02.pdf (slide 8) give figures around 4 pJ/bit for proposed serial memory (HMC) and 40 pJ/bit for LPDDR. DL model compression literature, for example https://arxiv.org/pdf/1510.00149.pdf (page 2), likewise give figures around 20 pJ/bit for external DRAM.

    Do such figures typically include the energy required on both the CPU side and in the DRAM itself, i.e. both sides of the “transaction”? Do you guys know of some links that give a breakout of energy use by both sides?

    DL models are memory intensive. There are various model compression methods being researched, but many of these are mathematically complex and not “biologically plausible”, for example Huffman coding of neuron weights. My feeling is that AI applications will become a driving force for ultra-low-power external memory with very large capacity. High performance / bandwidth is not crucial; for example, an embedded AI system SoC might have two banks of DRAM, one for CPU purposes and one for convolutional neural net (CNN) purposes, with the latter consuming 10x or less power but also being 10x or more slower. If there are actual serial DRAM devices (including beta parts) that we can test, please let me know. For example, we could hook one up to the SRIO interface on the CPUs and SoCs we’re working with, or insert an FPGA with a SERDES conversion.
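    To put those pJ/bit figures in context, here is a small back-of-the-envelope example. The 50 MB model size and the assumption of one full pass over the weights per inference are illustrative only; the pJ/bit values are the ones cited above.

```python
# Rough energy cost of streaming a model's weights from external memory once
# per inference.  The 50 MB model size and one-pass-per-inference assumption
# are illustrative; the pJ/bit figures are the ones cited in the comment.

MODEL_BYTES = 50 * 1024**2          # assumed model size: 50 MB of weights
MODEL_BITS  = MODEL_BYTES * 8

for name, pj_per_bit in (("HMC-style serial", 4), ("generic DRAM", 20), ("LPDDR", 40)):
    energy_mj = MODEL_BITS * pj_per_bit * 1e-12 * 1e3
    print(f"{name:>16}: ~{energy_mj:.1f} mJ per full pass over the weights")
```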

    Thanks, Jeff

    1. Jeff,

      Pretty interesting stuff you’re working on.

      Let me try to put the two other commenters in touch with you.

      Jim
