A lone inventor has developed a data compression algorithm that defies the theoretical “Shannon Limit“. The press hasn’t covered this recent news, even though it has dramatic implications. This is probably because the technique is so very arcane. The inventor is none other than the great-great-great granddaughter of the inventor of the tabulated punch card, Herman Hollerith.
The algorithm reduces most of the data while converting the remaining information into as many ones as possible. This not only shrinks storage requirements and costs, but in the case of flash memory, it also has an important impact on total power. Flash is erased by setting all bits to ones, and bits are written by either leaving them alone (one) or by changing them (zero). The fewer zeros in the code, the less energy required to change the bits. Energy is also saved during an erase, since fewer bits need to be brought back to the erased state.
To explain the algorithm in its simplest terms, a byte of data is evaluated. If it has more zero bits than one bits the byte is inverted and an index bit is set to reflect this fact. Next, the four bits on either side of the byte are evaluated and if one has more zeros than ones it is inverted and another index bit is set. This process continues until all of the data bits become ones, resulting in seven index bits to represent the original 8-bit byte.
Since the process can only be performed on even numbers of bits, the next step is to index the indexes by approaching them vertically (across eight addresses) one bit at a time. The least significant index bit of addresses 0-7 are compressed, then the next most significant bit, etc.
The algorithm repeats the process diagonally across the array, then diagonally in the other direction. By this time each address has been reduced back to an even number of bits, so the original process can start again. After several iterations the entire data set has been compressed by several thousand times and the remaining data consists of mostly ones. With an infinite number of iterations the data could be reduced to a single bit.
Decompression involves a simple reversal of this process.
The Memory Guy was fortunate enough to be able to speak to the inventor, Ms. April Hollerith, on the phone.
My famous ancestor was highly focused on efficiency. One idea that he had was to minimize the number of holes punched in his cards, because this required human effort. Not only was it tiring to use the original hole-punching machines, but the more holes that had to be punched, the greater the number of human errors that resulted.
He tried boiling the original scheme down to a number of index tables and mnemonic devices, but found that the approach was far too complex for the workers who punched the cards and later abandoned the concept. From time to time over the years university researchers have explored the technique, but it wasn’t until recently that computing power became sufficiently inexpensive that his 1885 invention could be economically implemented. Here we are, 130 years later, and this inventor’s idea is poised to change the world of computing.
When I asked Ms. Hollerith what she intended to call the algorithm she replied that the Hollerith name was already too highly connected to the world of punched cards, and would be a poor choice. This led her to decide to base its title on her first name. Since the algorithm converts as much data as possible into ones she has given it the name “April 1”.