This week’s Hot Chips conference featured a concept called “Processing in Memory” (PIM) that has been around for a long time but hasn’t yet found its way into mainstream computing. One presenter said that his firm, a French company called UPMEM, hopes to change that.
What is PIM all about? It’s an approach to improving processing speed by taking advantage of the extraordinary amount of bandwidth available within any memory chip.
The arrays inside a memory chip are pretty square: A word line selects a large number of bits (tens or hundreds of thousands) which all become active at once, each on its own bit line. Then these myriad bits slowly take turns getting onto the I/O pins.
High-Bandwidth Memory (HBM) and the Hybrid Memory Cube (HMC) try to get past this bottleneck by stacking special DRAM chips and running data buses with thousands of bits vertically down to a logic chip on the bottom that can drive as many as 1,024 I/O pins with great speed. That’s a help, but it’s still much slower than the internal bandwidth of any of the DRAM chips in the stack.
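To put rough numbers on that gap, here is a small back-of-the-envelope sketch. The row size and the 8-pin commodity chip are assumptions chosen for illustration, not figures from any datasheet. Only the 1,024-pin figure comes from the text above, and treating all 1,024 pins as serving a single row flatters the stacked approach.

```python
# Illustrative numbers only; none are taken from a specific DRAM datasheet.
ROW_BITS = 65_536        # assumed number of bits activated by one word line
STANDARD_IO_PINS = 8     # assumed data pins on a commodity DRAM chip
STACKED_IO_PINS = 1_024  # the wide interface mentioned above


def transfers_to_drain_row(row_bits: int, io_pins: int) -> int:
    """Number of pin-wide transfers needed to move one full row off the chip."""
    return -(-row_bits // io_pins)  # ceiling division


if __name__ == "__main__":
    for name, pins in [("commodity chip", STANDARD_IO_PINS),
                       ("stacked DRAM", STACKED_IO_PINS)]:
        count = transfers_to_drain_row(ROW_BITS, pins)
        print(f"{name}: {count} transfers to move one row")
```

Under these assumptions a row that becomes available all at once inside the chip takes thousands of turns to leave a commodity part, and still dozens of turns even with the very wide interface.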
The concept behind PIM is to build processors right into the DRAM chip and tie them directly to all of those internal bit lines to harness the phenomenal internal bandwidth that a memory chip has to offer. This is not a new idea. I first heard of this concept in the 1980s when an inventor approached my then-employer, IDT, with the hopes that we would put a processor into one of our 4Kbit SRAMs! Even in those days a PIM architecture would have dramatically accelerated graphics processing, which was this inventor’s goal.
The Memory Guy also published a blog post almost eight years ago about a company named Venray that was trying to convince various DRAM makers to build a DRAM-based PIM chip. A recent conversation with Venray indicated that they have finally made progress in this direction and will soon have some good news to share.
Back to UPMEM: The company’s name seems to have stemmed from a fusion of the shorthand term for microprocessor “µP” with an abbreviated form of the word “Memory”, but with a “U” being substituted for the “µ” in “µP”. Some readers may never have seen the term “µP” since it has largely been replaced by “MPU” in recent decades.
I asked UPMEM’s management about the differences between their approach and two alternative architectures: the TOMI from Venray, mentioned above, and the Automata processor that Micron recently spun out into a new company called Natural Intelligence Semiconductor. (The Memory Guy also published a post on the Automata processor nearly six years ago, in 2013. Did I mention that this idea has been around a long time?) They explained that while all three approaches harness the enormous internal bandwidth of a DRAM chip, the processor architectures are very different, each with a different goal:
- Automata is a programmable sea of gates that is extremely powerful, but that same flexibility makes it very challenging to program. It was designed to tackle extremely complex problems, including NP-hard ones, which are among the most difficult in computing.
- TOMI inserts a very powerful processor into the DRAM with the goal of allowing the user to replace an expensive server processor with a much more modest and less costly CPU by offloading the bulk of the work to the TOMI chips.
- UPMEM combines a modest RISC processor with DRAM aiming at accelerating certain very specific tasks that can be offloaded to the DIMM to reduce the server processor’s load while reducing a lot of the traffic on the memory channel.
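The traffic reduction described in the last bullet can be illustrated with a toy model. This is a sketch under stated assumptions, not UPMEM's programming interface: the function names, the `Traffic` record, and the 8-byte result size are all hypothetical, and the "in-memory" path is simply simulated in ordinary Python. The task is counting how often one byte value appears in data spread across several memory chunks.

```python
from dataclasses import dataclass
from typing import List

RESULT_BYTES = 8  # assumed size of one partial count sent back to the host


@dataclass
class Traffic:
    bytes_over_channel: int  # data that had to cross the memory channel
    matches: int             # final answer of the count


def host_side_count(chunks: List[bytes], target: int) -> Traffic:
    """Host CPU pulls every byte across the memory channel, then counts."""
    moved = sum(len(chunk) for chunk in chunks)
    matches = sum(chunk.count(bytes([target])) for chunk in chunks)
    return Traffic(moved, matches)


def in_memory_count(chunks: List[bytes], target: int) -> Traffic:
    """Each chunk is counted where it is stored (simulated here);
    only the small partial counts cross the memory channel."""
    partials = [chunk.count(bytes([target])) for chunk in chunks]
    return Traffic(RESULT_BYTES * len(partials), sum(partials))


if __name__ == "__main__":
    # Four 1,024-byte chunks, each holding the values 0..255 repeated.
    data = [bytes(i % 256 for i in range(1024)) for _ in range(4)]
    print("host side:", host_side_count(data, target=7))
    print("in memory:", in_memory_count(data, target=7))
```

Both paths return the same answer, but in this model the host-side version moves every data byte across the channel while the in-memory version moves only one small result per chunk.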
Automata is typically inserted into a server on a PCIe add-in card. UPMEM uses standard-format DIMMs (as shown in this post’s graphic) to support higher bandwidth between the memory and the server processor. When the PIM’s processor is not being used, these DIMMs behave as standard DRAM.
All three solutions are either currently available (Automata) or are very close to becoming available (TOMI and UPMEM), and all three appear to provide an excellent way to accelerate specific problems in the data center while significantly reducing overall costs.
The timing for these companies is good: During a DRAM shortage (like the one we had in 2018) mainstream DRAM makers are unwilling to devote even a small portion of their production capacity to a risky product since they can already sell more standard DRAM than they can produce. In an oversupply like today’s, DRAM makers are much more willing to try something new to make good use of idle capacity. Meanwhile the hyperscale data centers have abundant cash and are extremely interested in testing new concepts that are alternatives to conventional computer architecture. This is how AI has recently risen to prominence.
Those who would like to know more about PIM technology might want to visit UPMEM’s “Use Cases” page with its three white papers, the Venray “Strategy Papers” page, which links to 79 works, or the Natural Intelligence “Research” page, with its 31 links to research work that has been performed with their chips. Readers who would like to gain a feel for the business and economic factors that will make these chips succeed or fail should contact Objective Analysis to learn how we can help them make the right business decisions regarding this rising technology.