What really happens in NAND flash during an MLC, TLC, or QLC write? Although there are lots of websites that explain that multilevel cells store four, or eight, or sixteen different voltage levels on a cell (for MLC, TLC, or QLC), they don’t spell out the process of putting those voltage levels onto the bit cell.
Fortunately, Vic Ye, Manager, NAND Flash Characterization at Yeestor Microelectronics Co., Ltd. in Shenzhen, China, presented the programming process in a series of short videos at the Flash Memory Summit last August. The Memory Guy was fortunate enough to attend his presentation. Yeestor is a fabless semiconductor company that designs flash storage controllers for SSDs (PCIe & SATA) and flash cards (SD, UFS, eMMC, etc.).
Mr. Ye later gave me permission to share his videos, and these are the foundation of this post. They’re brief (13 seconds to 1:10) so they won’t take much time to review. The videos were a part of his slide presentation titled “A Graphical Journey into 3D NAND Program Operations,” which can be downloaded from The Flash Memory Summit website by clicking the presentation title above and entering your e-mail address.
A multilevel flash bit cell has a number of voltage levels separated by thresholds, as illustrated in the graphic above. If you click on the graphic a larger version will appear and you can see the legend off to the right-hand side. The legend is pretty small, but it calls out levels L0-L7. In one of the videos the thresholds for a TLC chip are about 38 millivolts apart. A cell with a voltage below zero volts is read as L0. If it’s between zero and 38 millivolts it will be read as L1, and a cell with a voltage between 38 millivolts and 76 millivolts will be read as L2. If it’s between 76 millivolts and 114 millivolts it is read as L3, and so on. The question is: “How do you get these voltages onto the cell?”
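For readers who think in code, the read rule just described can be sketched in a few lines of Python. This is only an illustration built on the roughly 38-millivolt spacing quoted above. The names VREAD_MV and read_level are mine, not Yeestor's, and the way a voltage that lands exactly on a threshold is treated is an assumption.

```python
from bisect import bisect_right

# Read thresholds in millivolts, using the roughly 38 mV spacing quoted in the text.
# Seven thresholds separate the eight TLC levels L0-L7.
STEP_MV = 38
VREAD_MV = [i * STEP_MV for i in range(7)]  # 0, 38, 76, 114, 152, 190, 228

def read_level(vt_mv):
    """Return the level index (0 for L0 ... 7 for L7) for a threshold voltage."""
    # bisect_right counts how many read thresholds sit at or below vt_mv.
    # Assumption: a cell exactly on a threshold is read as the higher level.
    return bisect_right(VREAD_MV, vt_mv)

print(read_level(-20))  # below zero volts
print(read_level(50))   # between 38 and 76 mV
print(read_level(100))  # between 76 and 114 mV
```

With the read side pinned down, the remaining problem is the one posed above: getting each cell's voltage into the right window in the first place.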
The most direct approach is to program a cell with a series of pulses that will increase the cell’s voltage by an amount smaller than the difference between these thresholds. It’s kind of like tapping gently on the side of an object on a table to make it slide into a very precise position. After every pulse the state of the bit cell is read. If it has not reached a point between the correct two thresholds then the bit is hit with more pulses until the proper voltage is reached.
This is called “One-Shot Programming” and is illustrated in the first video: TLC One-Shot
The video shows two graphs with two representations of the same data. The top one uses linear Y-axis values and the bottom one is logarithmic. Pick one that you like and compare it to the same graph in the other videos.
The horizontal X-axis in these graphs is the threshold voltage of the bit cell. This is the voltage that determines what is programmed into the cell (L0, L1, L2, etc.). The vertical Y-axis shows the number of bits that are at a certain voltage. In a perfect world each of the bell curves in this chart would be infinitely narrow since all bits would be programmed to precisely the right threshold voltage. In the real world there are bits that are slightly higher and bits that are slightly lower than intended, making up a bell curve.
Note that each level is given a different color: L0 is green, L1 is yellow, etc. There are also narrow spikes that the legend tells us represent “Vread.” These spikes are only used to illustrate the limits between the levels. Using the thresholds quoted earlier, every bit that is supposed to be programmed to L2 (for example) must lie between the spikes at 38 and 76 millivolts.
At the beginning of the video all of the cells are erased to a threshold voltage below zero. As they are programmed they start to migrate to the right. The bits that are meant to be programmed to the L1 state reach a spot between zero and 38 millivolts and then no more pulses are applied, so the yellow L1 bell curve represents the bits that were programmed to L1. All of the other colored curves continue to move right, with each one stopping once all of its bits have reached the correct voltage level.
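The pulse-then-check loop behind this migration can be sketched as follows. This is a toy model, not Yeestor's algorithm: the 10-millivolt pulse size and the -60-millivolt erased voltage are numbers I made up for illustration, and the level boundaries are the ones given earlier in the post (L1 starts at zero volts, with 38 millivolts between thresholds).

```python
STEP_MV = 38      # spacing between read thresholds, from the text
PULSE_MV = 10     # hypothetical shift per pulse, smaller than STEP_MV
ERASED_MV = -60   # hypothetical threshold voltage of an erased cell

def verify_mv(level):
    """Lower edge of a level's voltage window (L1 starts at 0 mV)."""
    return (level - 1) * STEP_MV

def one_shot_program(target_level, start_mv=ERASED_MV, pulse_mv=PULSE_MV):
    """Pulse one cell until a verify read shows it has reached its target window."""
    vt = start_mv
    pulses = 0
    if target_level == 0:
        return vt, pulses                 # L0 cells are simply left erased
    while vt < verify_mv(target_level):   # verify read before each new pulse
        vt += pulse_mv                    # one gentle program pulse
        pulses += 1
    return vt, pulses

for level in range(8):
    vt, pulses = one_shot_program(level)
    print(f"L{level}: {pulses} pulses, final threshold {vt} mV")
```

Because each pulse moves the cell by less than the gap between thresholds, a cell that has just passed its verify point cannot have jumped past the next threshold. Real cells do not all respond identically to a pulse, which is one reason the video shows bell curves rather than single spikes; passing a different pulse_mv per cell is a crude way to mimic that.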
There are other approaches to programming the multiple levels on TLC flash. While the first video illustrates One-Shot programming, Yeestor’s second video represents a type of Two-Pass programming algorithm called 4-8, since it pre-programs bits into 4 levels (including the Erased state) before moving them to their exact positions. The pre-programming job can be accomplished using a stronger program pulse since it’s less precise.
In this video you can see that the bits that will eventually be programmed to L2 and L3 are initially programmed to a point between L1 and L2.
Another Two-Pass approach called 2-8, which appears in this video, differs slightly: it pre-programs the bits that will eventually be programmed to L4-L7 to a level between L3 and L4. The lower bits are then programmed to levels L1-L3 and the pre-programmed bits are moved into their proper positions for L4-L7.
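Here is a rough sketch of the 2-8 idea in Python, under stated assumptions. The pulse sizes, the erased voltage, and the first-pass verify point are all invented for illustration, and the level boundaries are the 38-millivolt windows quoted earlier. The only parts taken from the presentation are that cells destined for L4-L7 are first pushed to a spot near the L3/L4 boundary with a stronger pulse, and that a second pass then places every cell precisely.

```python
STEP_MV = 38           # spacing between read thresholds, from the text
ERASED_MV = -60        # hypothetical threshold voltage of an erased cell
COARSE_PULSE_MV = 25   # hypothetical stronger, less precise first-pass pulse
FINE_PULSE_MV = 8      # hypothetical gentler second-pass pulse
PRE_VERIFY_MV = 100    # hypothetical first-pass verify point near the L3/L4 boundary

def pulse_until(vt, verify, pulse):
    """Apply pulses of a fixed size until the cell reaches the verify voltage."""
    while vt < verify:
        vt += pulse
    return vt

def two_pass_2_8(target_level):
    """Return the cell voltage after pass one and after pass two."""
    vt = ERASED_MV
    # Pass one: only cells destined for L4-L7 are pre-programmed.
    if target_level >= 4:
        vt = pulse_until(vt, PRE_VERIFY_MV, COARSE_PULSE_MV)
    after_pass_one = vt
    # Pass two: every cell except L0 is moved into its final window.
    if target_level >= 1:
        vt = pulse_until(vt, (target_level - 1) * STEP_MV, FINE_PULSE_MV)
    return after_pass_one, vt

for level in range(8):
    first, final = two_pass_2_8(level)
    print(f"L{level}: {first} mV after pass one, {final} mV after pass two")
```

The coarse pulse can afford to be large because it only has to land somewhere short of the top of the L4 window; the fine pass does the precise positioning.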
For QLC cells, since there are more levels, there’s a Two-Pass approach called 8-16, which is similar to the 4-8 TLC programming algorithm. This is illustrated in Yeestor’s fourth video.
Mr. Ye tells us that Two-Pass programming is much more complex and slower than the One-Shot approach. Yet, he points out, a Two-Pass approach is used for the Intel/Micron floating gate 3D NAND flash while the One-Shot algorithm is the way that other companies program their charge trap 3D NAND. Why is there a difference?
It seems that the floating gate flash can be moved into tighter bit distributions with a Two-Pass approach, which Ye’s presentation illustrates with this comparison. Note that the bell curves overlap the Vread spikes and adjacent bell curves in the top One-Shot chart, while the curves in the bottom Two-Pass chart don’t overlap.
Yeestor identified a shortcoming with the Two-Pass approach that Ye illustrated with another set of curves.
Certain bits end up residing in a different bit’s range with this approach. Careful examination shows that these erroneous bits are inverted from where they should be, as is illustrated below:
It’s easy to see that the colors at the base of each arrow match the colors at their tips.
Mr. Ye calls this an HRE (High Reliability Error) and says that it is the key shortcoming of the Two-Pass approach, and is especially harmful for LDPC soft bit decode. He explains that the error stems from the fact that the second pass of the programming cycle is based upon a read of the result of the first pass. This introduces some bit errors that are expanded by the second-pass program operation.
The solution is to re-input the first-pass data during the second pass instead of reading the contents of the pre-programmed cells. Ye showed that this approach eliminates the problem.
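A toy model makes the difference between the two second-pass strategies easier to see. Everything here is a simplification of my own: it assumes a 2-8 style scheme in which the first pass records a single coarse bit per cell, it numbers levels linearly rather than with whatever coding a real chip uses, and it ignores the physical detail that programming can only raise a cell's voltage. The function names are hypothetical.

```python
def first_pass_bit(target_level):
    """1 if the cell is pre-programmed in pass one (targets L4-L7), else 0."""
    return 1 if target_level >= 4 else 0

def second_pass(coarse_bit, target_level):
    """Final level built from the coarse bit plus the rest of the target data."""
    return 4 * coarse_bit + (target_level % 4)

def program_from_read(targets, misread_cells):
    """Second pass trusts a read of pass one; misread cells get a flipped bit."""
    final = []
    for i, target in enumerate(targets):
        bit = first_pass_bit(target)
        if i in misread_cells:
            bit ^= 1                      # read error on the pre-programmed state
        final.append(second_pass(bit, target))
    return final

def program_from_reinput(targets):
    """Second pass is given the first-pass data again, so no read is needed."""
    return [second_pass(first_pass_bit(t), t) for t in targets]

targets = [0, 1, 2, 3, 4, 5, 6, 7]
print(program_from_read(targets, {2, 5}))   # cells 2 and 5 end up four levels away
print(program_from_reinput(targets))        # matches the targets
```

In this model a single misread after the first pass is not a near miss: the second pass deliberately programs the cell into a distant level, so it sits squarely inside the wrong distribution. Re-inputting the first-pass data removes the read from the loop, which is why the error disappears.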
I found the Yeestor presentation to be very interesting, and thank Mr. Ye for sharing his videos with my readers. He has offered to answer any readers’ questions directly. Click HERE to send him an e-mail.