On January 22, Processor-In-Memory (PIM) maker UPMEM announced what the company claims are “the first silicon-based PIM benchmarks.” These benchmarks indicate that a Xeon server equipped with UPMEM’s PIM DIMMs can perform eleven times as many five-word string searches through 128GB of DRAM in a given amount of time as the Xeon processor can on its own. The company tells us that this provides significant energy savings: the server consumes only one sixth the energy of a standard system. By using algorithms that have been optimized for parallel processing, UPMEM claims to be able to process these searches up to 35 times as quickly as a conventional system.
Furthermore, the same system with an UPMEM PIM is said to sequence a genome ten times as fast as the system that uses standard DRAM, once again using one sixth as much energy.
For those unfamiliar with the approach, PIM builds a small CPU into a standard DRAM chip. Not only does this permit multiple CPUs to simultaneously process data within a system, but it also allows the data to remain within the DRAM chip rather than moving across the relatively slow memory bus. All of this is great, but there’s one more big benefit: The internal data buses on the chip can be enormous! Data within a DRAM chip typically travels over buses that are thousands of bits wide, so the amount of available bandwidth is huge, and this can vastly accelerate data processing even if the computing power of the internal CPU is small.
A PIM approach also frees up the main processor (the Xeon in this case) from performing much of the data processing. Its role shifts to one of management: It tells the PIMs what to do, then waits until they are done. In theory, a much less powerful CPU could replace the Xeon without impacting the overall system performance.
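The division of labor described above can be sketched in plain Python. This is only an illustrative simulation of the offload pattern, not UPMEM’s actual SDK (which is C-based); `pim_search` and `host_search` are hypothetical names, and threads stand in for PIM units that would run truly in parallel in hardware.

```python
# Illustrative simulation of the PIM offload pattern -- NOT UPMEM's SDK.
# Each worker stands in for one PIM-equipped DIMM searching its own
# shard of data; the host CPU only dispatches work and sums the results.
from concurrent.futures import ThreadPoolExecutor

def pim_search(shard, needle):
    """Runs 'inside' one PIM unit: count matching records in its local shard."""
    return sum(needle in record for record in shard)

def host_search(shards, needle):
    """The host's role is management: fan the query out, then aggregate."""
    with ThreadPoolExecutor() as pool:
        counts = pool.map(pim_search, shards, [needle] * len(shards))
    return sum(counts)

if __name__ == "__main__":
    shards = [["alpha beta", "gamma"], ["beta delta", "beta"]]
    print(host_search(shards, "beta"))  # prints 3
```

Because each shard is searched where it resides, the data itself never crosses the memory bus; only the tiny per-shard counts do.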
…but wait! There’s more! Since a system can add PIM capacity one DIMM at a time, the overall system’s processing power can scale linearly with the number of DIMMs it uses. If a single DIMM gives a certain performance level, then two DIMMs will double that performance, and hundreds of DIMMs will provide hundreds of times the performance. This is much easier to work with than conventional computing architectures, whose performance tends to level off asymptotically as more hardware is added.
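A quick back-of-the-envelope comparison makes the scaling argument concrete. The numbers below are hypothetical, chosen only to illustrate the contrast: ideal PIM throughput grows linearly with DIMM count because each DIMM works on its own data, while a conventional system’s speedup is bounded by its serial fraction (Amdahl’s Law).

```python
# Hypothetical numbers, purely to illustrate the scaling argument.
def pim_speedup(dimms):
    """Ideal PIM case: each DIMM searches only its own data, so
    throughput scales linearly with the number of DIMMs."""
    return float(dimms)

def amdahl_speedup(units, parallel_fraction=0.95):
    """Conventional case: speedup is capped by the serial fraction
    of the workload (Amdahl's Law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / units)

for n in (1, 2, 100):
    print(n, pim_speedup(n), round(amdahl_speedup(n), 1))
```

With a 95%-parallel workload, 100 conventional units deliver only about a 17x speedup, while 100 PIM DIMMs, each searching its own memory, would ideally deliver 100x.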
Readers of The SSD Guy blog may remember seeing a similar approach in a 2017 post that explained in situ processing within an SSD, a technique that is growing in popularity and has been renamed “Computational Storage”.
There Is a Downside
After reading the first half of this post you may have decided that The Memory Guy is hopelessly exuberant about this technology. If that’s your opinion then be ready for a change.
The PIM concept has been tossed around for decades, and has been more than a simple academic pursuit, having once been commercialized by Micron Technology. It’s interesting that this approach has suddenly gained renewed attention. It’s also intriguing that it hasn’t already caught on.
UPMEM is one of at least three companies in this field. Micron spun off its PIM effort as an independent company now known as Natural Intelligence Semiconductor, which has been selling PIM chips since 2016. Another PIM developer is Venray, which I blogged about way back in 2011.
PIM is a fundamentally different approach to data management, and, unfortunately, that gets in the way of widespread acceptance. Programmers and engineers get hired based on their understanding of standard products. Most new products are based on existing designs, a process that, when formalized, is commonly called “Code Re-Use,” but in many other instances code and designs are re-used informally. This is the basis for the overwhelming popularity of the x86 and Arm architectures. A technology that doesn’t have a wide re-use base must struggle for adoption, and all three PIM processors lack widespread use.
In order for PIM to reach broad acceptance, a lot of the underpinnings of computing will need to change, in particular the structure of software. Today’s software wasn’t written to harness the power of this kind of chip, so that must be addressed. This is a chicken-and-egg problem: Software providers can’t justify the effort of supporting niche hardware, and the hardware won’t be adopted unless it gains a large software base. A similar issue arose when Intel and AMD stopped increasing clock speeds and shifted to multi-core, multi-threaded designs: Software had to be rewritten to take advantage of the new architectures, a development effort that took several years.
There is something else that stands in the way of hardware adoption. Since this product looks more like a DRAM than a processor, purchasers will expect its cost to be similar to that of a commodity DRAM. Although PIM chips could be sold at DRAM prices, doing so would forfeit their inherent value: a PIM’s worth lies in its ability to process data more efficiently than a far more costly CPU.
If the chip must be priced close to the price of a DRAM, then it encounters another significant hurdle: A big reason why DRAM is so cheap is that it is produced in extraordinary volumes (roughly 15 billion chips per year). Any chip manufactured at a smaller scale must necessarily cost more to produce, and this limits a PIM’s ability to be priced close to DRAM.
In brief, while the technological specifications of PIM are very compelling, important business challenges must be overcome if it is to gain widespread acceptance.
My company, Objective Analysis, specializes in helping semiconductor producers and users, along with others involved in chips, to develop strategies for success. This post illustrates the kind of understanding we can use to develop winning strategies. Please contact us to discuss the ways we can help your company to achieve or exceed its business goals.