Finally! A Real Virtual Memory Chip

A big red letter X over a box marked MMU

At long last a chip has been developed to simplify the use of virtual memory, a computing architecture that has been in use since the 1960s.  The Memory Guy was given a sneak peek at this new approach by none other than Olly Mugwump, a highly respected computer architect and longstanding member and fellow of several important technical societies.  The new design promises to accelerate memory accesses while reducing the die size, cost, and power dissipation of processor chips.

At the heart of this new invention is a DDR5 DRAM much like any other, except that it has address input pins for the processor’s entire virtual address bus, which in most cases is 45 bits.  (As a rule, only the lower 48 bits of the 64-bit addressing space are actually used by today’s processors, and the lower 3 bits are reserved for selecting the byte within a word.)
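The pin-count arithmetic can be checked with a short Python sketch.  It assumes only the figures given above (48 used address bits, 3 byte-select bits); the function and constant names are hypothetical and are not part of any published specification for the chip.

```python
# Assumed figures, taken from the paragraph above.
USED_ADDRESS_BITS = 48   # bits of the 64-bit space actually used
BYTE_SELECT_BITS = 3     # low bits that pick a byte within an 8-byte word

# Address pins the chip would need: 48 - 3 = 45.
ADDRESS_PINS = USED_ADDRESS_BITS - BYTE_SELECT_BITS


def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Return (word_address, byte_in_word) for a virtual byte address."""
    vaddr &= (1 << USED_ADDRESS_BITS) - 1            # keep only the used bits
    word_address = vaddr >> BYTE_SELECT_BITS         # what the address pins carry
    byte_in_word = vaddr & ((1 << BYTE_SELECT_BITS) - 1)
    return word_address, byte_in_word


if __name__ == "__main__":
    print("address pins:", ADDRESS_PINS)
    print(split_virtual_address(0x1D))
```

The word address is the part that would travel over the 45 address pins; the byte selector stays inside the processor.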

The beauty of this design is that it bypasses the processor’s memory management unit (MMU) which is used to map the virtual addresses to a smaller physical address space.  A rough sketch illustrates this graphically:

Simple block diagram showing a CPU with address and data buses, and an MMU on the address bus.

This approach has been used for over a half century to fool the central processor (CPU) into thinking that the memory is as large as the system’s storage while remaining almost as fast as the memory.

How an MMU Works

Many readers may not understand the intricacies of an MMU: what it does, and why it is in the chip in the first place.  Allow me to explain.  A 64-bit processor has an extraordinarily large address space of 2^64 bytes, or about 1.8 x 10^19.  In words, that’s 16 exabytes.  That’s called the virtual memory space.  The physical memory that’s attached to the processor is nowhere near as large as this, and it’s significantly smaller than the system’s storage.  DIMMs commonly sell in densities of only 8-64GB.  That’s about 9 orders of magnitude smaller, or one billionth the size.
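For readers who like to check the arithmetic, here is a minimal Python sketch of the size comparison.  It assumes DIMM capacities are binary gigabytes (2^30 bytes each); the function name is hypothetical.

```python
import math

# Size of a full 64-bit virtual address space, in bytes.
VIRTUAL_BYTES = 2 ** 64


def orders_of_magnitude_smaller(dimm_gb: int) -> float:
    """How many powers of ten separate a DIMM from the virtual space."""
    dimm_bytes = dimm_gb * 2 ** 30     # assumption: 1 GB = 2**30 bytes
    return math.log10(VIRTUAL_BYTES / dimm_bytes)


if __name__ == "__main__":
    print(f"virtual space: {VIRTUAL_BYTES:,} bytes")
    for size in (8, 64):
        gap = orders_of_magnitude_smaller(size)
        print(f"{size} GB DIMM: about {gap:.1f} orders of magnitude smaller")
```

The gap works out to a little over 9 orders of magnitude for an 8GB DIMM and a little under 9 for a 64GB DIMM.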

To resolve this mismatch, the MMU, under command of the operating system, remaps the processor’s requests into an address space that matches the system’s physical memory.  This is called a virtual-to-physical mapping, and the actual address translation is performed by the MMU.  If the CPU’s desired address hasn’t yet been mapped, then the MMU signals that there’s a “Page Fault” and the processor drops what it’s doing to move things around, removing less-used pages from the physical memory to make room for a new page which it brings in from storage.  This takes an enormous amount of time, so operators add memory to their systems to reduce its occurrence.
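The translate-or-fault behavior described above can be modeled in a few lines of Python.  This is a toy sketch, not how any real MMU or operating system is implemented: the 4KB page size, the least-recently-used eviction policy, and all class and method names are assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped."""

    def __init__(self, vpn: int):
        super().__init__(f"page fault on virtual page {vpn}")
        self.vpn = vpn


class ToyMMU:
    def __init__(self, num_frames: int):
        self.page_table = OrderedDict()          # virtual page -> frame, LRU first
        self.free_frames = list(range(num_frames))
        self.faults = 0

    def translate(self, vaddr: int) -> int:
        """Virtual-to-physical mapping; faults if the page is not resident."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            raise PageFault(vpn)
        self.page_table.move_to_end(vpn)         # mark as most recently used
        return self.page_table[vpn] * PAGE_SIZE + offset

    def handle_fault(self, vpn: int) -> None:
        """Stand-in for the OS: free a frame if needed, then map the page."""
        self.faults += 1
        if self.free_frames:
            frame = self.free_frames.pop(0)
        else:
            # Evict the least-recently-used page and reuse its frame.
            _, frame = self.page_table.popitem(last=False)
        self.page_table[vpn] = frame

    def access(self, vaddr: int) -> int:
        try:
            return self.translate(vaddr)
        except PageFault as fault:
            self.handle_fault(fault.vpn)
            return self.translate(vaddr)


if __name__ == "__main__":
    mmu = ToyMMU(num_frames=2)
    for addr in (0x5004, 0x9000, 0x5008, 0x1000):
        print(hex(addr), "->", hex(mmu.access(addr)))
    print("page faults:", mmu.faults)
```

With only two frames, the fourth access forces an eviction, which is the slow path that adding physical memory is meant to avoid.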

Naturally, there’s also a delay incurred by putting all the addresses through the MMU.  It would be better if the processor’s virtual addresses could be directly tied to the physical memory.  This is the problem that the virtual memory chip solves.

Introducing the Virtual Memory Chip

The virtual memory chip removes the need for an MMU by presenting the memory as a circular array of individual pages.  The array internally indexes all addresses to render them relative to a base pointer that can access any memory address, as opposed to the absolute address sent by the processor.  This is conceptually illustrated in the diagram below.

Circle divided into memory pages labeled "Circular Memory Array." Two arrows move clockwise to -FFFF FFFF and counterclockwise to +FFFF FFFF, both at the 3 o'clock position. The 9 o'clock position shows 0000 0000.

An offset of 0000 0000 will position the memory’s internal pointer at its base, and offsets of varying amounts move that pointer either clockwise or counterclockwise around a circular memory array.  In the diagram this is represented by plus or minus FFFF FFFF, but the range is actually significantly larger.  (The diagram uses a subset for simplicity’s sake.)  This is all performed with zero added latency through a proprietary architecture.

The circular approach makes up for the fact that a DRAM chip cannot, with today’s technology, have a capacity as large as the processor’s virtual memory space.  While internal locations may end up being mapped to competing processes, this will occur extraordinarily rarely, and the issues it causes will invariably be blamed on software bugs.  “We don’t need to worry about the blame coming down on this chip,” remarks Mugwump with confidence.
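Since the actual architecture is proprietary, the following Python sketch is only a guess at how a circular, base-relative page array might behave.  It assumes simple modulo wraparound; the page count, page size, and function name are all hypothetical.

```python
NUM_PAGES = 8      # hypothetical number of physical pages in the ring
PAGE_SIZE = 4096   # hypothetical page size in bytes


def circular_page(base_page: int, offset: int) -> int:
    """Map a signed byte offset from the base pointer onto the ring.

    Assumption: positive offsets step one way around the ring, negative
    offsets the other, and the pointer wraps with modulo arithmetic.
    """
    page_steps = offset // PAGE_SIZE             # floor division handles negatives
    return (base_page + page_steps) % NUM_PAGES  # Python's % is never negative here


if __name__ == "__main__":
    print(circular_page(0, 0))                    # the base page itself
    print(circular_page(0, -PAGE_SIZE))           # one page the other way round
    print(circular_page(0, NUM_PAGES * PAGE_SIZE))  # a full lap
```

Under these assumptions, any two offsets that differ by a whole lap of the ring land on the same physical page, which is exactly the kind of collision between competing processes that the paragraph above waves away.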

Chip densities are specified in virtual space, with physical addressing space added to the specification as a footnote.  “Imagine a memory vendor being able to boast that their new chip has a 16 exabyte virtual address space,” says Mugwump.

It is relatively simple to add the relative addressing logic to the memory chip.  Moore’s Law shrinks the size and cost of logic just as it shrinks the size and cost of memory bits, so it’s reasonable these days to greatly increase the amount of logic on a commodity memory chip.  “In a way, it’s similar to the Chiplet approach, since small portions of the CPU’s silicon are being offloaded from the CPU chip itself,” says Mugwump.  “This frees up valuable real estate on the processor chip for more relevant functions like exponentials of matrices.”

Overcoming Business Issues

Now that the technical design has been solved, Mugwump is looking for ways to tackle the numerous business issues that could hinder this technology’s adoption.  One critical difficulty is that the leading CPU vendors will need to strip the MMUs out of their processors in order to benefit from the higher speed available through the use of virtual memory chips.  They are unlikely to do this if they perceive any risk to the technology’s adoption.  This is a matter that has yet to be addressed, but the inventor plans to recruit a mesmerist to sell these companies on the idea.

From a system standpoint, the new CXL standard will need to be reconfigured around virtual, rather than physical addressing.  Mugwump concedes that: “Cache coherence in the virtual space is a whole ‘nother different thing.” Advocates of the virtual memory chip are pushing for a new version of CXL to be developed, taking it from CXL 3.0 to a 4th generation, but to underscore the significant architectural changes to the spec they are pushing for it to be called CXL IV rather than CXL 4.0.  “There’s some poetry there, since the name includes all five of the less-significant Roman numerals: I, V, X, L, and C,” says Mugwump.  But detractors point out that the number CXLIV equals 144, a number that they call “Gross.”

Availability

Prototype virtual memory chips are already available, having been released on April first, and can be ordered direct from the manufacturer.  Documentation and EDA models have not yet been released.

Although there is currently no processor or software support for the design, the inventor assures us that academics will soon be developing specialized processor architectures to put this approach to the test.  Watch for studies to soon be released to leading technical journals.