AI models are getting bigger, and with each bump in size, they are drawing pictures, answering questions, and generating text more like humans. Massive computations are needed to pull this off — both to train the deep neural networks, and to run the trained models on a laptop or phone in real time.
If you’ve ever waited impatiently for a song to load or a chatbot to answer a question, you’ve likely run into the von Neumann bottleneck: the lag created because today’s chips, like CPUs and GPUs, process data in one place and store it in another. The bigger the AI model, the more time and energy it takes to shuttle data back and forth between memory and processors. The computation is so intensive that most AI workloads are handled in data centers in the cloud.
To break the bottleneck, IBM and other technology companies are experimenting with chip architectures that perform mathematical operations directly in memory. Analog memory devices store and process data in one place, as continuous conductance values rather than as bits of zeros and ones. Arranged in a crossbar array, these devices are ideally suited to accelerate the massively parallel multiply-accumulate operations needed to train and run enormous AI models. We previously showed that analog memory could run these operations 280 times faster, and with 100 times less energy, than the top GPUs at the time.
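To make the crossbar idea concrete, here is a minimal numerical sketch, computed digitally for illustration only: weights sit in place as conductances, an input vector is applied as row voltages, and each column’s summed current is one dot product. All names and values here are illustrative assumptions, not from the paper.

```python
import numpy as np

def crossbar_matvec(G, v):
    """Idealized analog crossbar: weights are stored as conductances G
    (one device per crossing point) and inputs arrive as row voltages v.
    Ohm's law gives each device's current G[i, j] * v[i]; Kirchhoff's
    current law sums the currents flowing down each column, so the whole
    matrix-vector product emerges in a single physical step."""
    return G.T @ v  # column currents (here computed digitally)

# A 3x2 weight matrix stored as conductances, and a 3-element input.
G = np.array([[1.0, 0.5],
              [0.2, 0.8],
              [0.9, 0.1]])
v = np.array([0.3, 0.6, 0.2])

currents = crossbar_matvec(G, v)  # one dot product per column
```

In a real chip the sum happens in the analog domain, which is why every multiply-accumulate in a layer can proceed in parallel without moving the weights at all.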
Implemented at scale, analog chips have the potential to accelerate large-scale language and vision models in the cloud. They could also find their way onto your laptop and phone to run faster, more energy-efficient AI applications.
But until now, accuracy has been analog computing’s Achilles’ heel. The small errors introduced when converting the mathematical weights of a trained AI model into analog memory compound into larger mistakes by the time you click on that chatbot at home.
We think we now have a solution. In a new study in Nature Communications,¹ we show how to anticipate these errors when “programming” AI models into analog memory values. It’s a breakthrough that brings analog chips closer to being truly competitive with digital chips. Once the accuracy gap is closed, the speed and energy savings of our technology really start to shine.
The accuracy gap arises because the deep neural network that we want to implement in our analog hardware is impossible to replicate perfectly. There will always be errors from imperfections in the memory devices themselves. What makes this problem even more challenging is that some imperfections also change with time. That means that the AI model you programmed into analog memory can shift from one day to the next.
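As a rough illustration of why programmed weights shift, the snippet below models two error sources: Gaussian noise at write time, and the power-law conductance drift often reported for phase-change memory. The noise level and drift exponent are illustrative assumptions, not numbers from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def program(target_g, write_noise_std=0.02):
    """Write-time error: each device lands near, not exactly on, its
    target conductance (modeled here as additive Gaussian noise)."""
    return target_g + rng.normal(0.0, write_noise_std, size=target_g.shape)

def drifted(g0, t, t0=1.0, nu=0.05):
    """Time-dependent error: an empirical power-law model commonly used
    for phase-change memory, G(t) = G(t0) * (t / t0) ** (-nu)."""
    return g0 * (t / t0) ** (-nu)

targets = np.array([0.5, 0.2, 0.8, 0.1])  # intended conductances (a.u.)
written = program(targets)                # just after programming
day_later = drifted(written, t=86_400.0)  # one day on: values have decayed
```

The write error is fixed once the device is programmed, but the drift term keeps moving, which is why a model that tests well on day one can degrade by day two.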
We can compensate for some of these mistakes when training the AI model through hardware-aware algorithms. In this paper, we show that most of the remaining errors can be corrected post-training.
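Hardware-aware training is typically done by injecting device-like noise into the weights during training, so the network converges to a solution that tolerates those perturbations. Below is a toy sketch on a linear model; the noise model, learning rate, and data are illustrative assumptions, not the paper’s algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear model y = X @ w, trained with noise injected into the
# weights on every forward pass so the learned solution tolerates
# analog-device-like perturbations. All values are illustrative.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(256, 2))
y = X @ w_true

w = np.zeros(2)
lr, noise_std = 0.05, 0.1
for _ in range(500):
    w_noisy = w + rng.normal(0.0, noise_std, size=2)  # device-like noise
    pred = X @ w_noisy                                # noisy forward pass
    grad = X.T @ (pred - y) / len(y)                  # gradient step
    w -= lr * grad
```

Because the injected noise averages out over many steps, the weights still converge close to the true solution, but the loss surface the network settles into is one where small weight perturbations do little damage.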
Under our framework, an algorithm finds the optimal strategies for programming analog memory with the goal of improving the overall accuracy of our AI models.
The algorithm provides a target set of numeric values that anticipate errors when the memory devices are first programmed, and that creep in days later when analog memory values may start to drift. Additional mistakes surface when the model is executed in the real world, during inference. These errors can accumulate throughout large and complex AI models, as the model performs sequences of calculations layer by layer.
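One toy instance of such error anticipation: if the average drift behavior of a device is known, the programming target can be chosen so that the conductance lands on the intended weight at the time the model will actually be used. The published framework optimizes far more than this, jointly over device noise, drift, and model accuracy; the snippet below is only a sketch under an assumed power-law drift model.

```python
import numpy as np

def drifted(g, t, t0=1.0, nu=0.05):
    # Assumed power-law drift model: conductance decays over time.
    return g * (t / t0) ** (-nu)

def drift_aware_target(w, t_use, t0=1.0, nu=0.05):
    """Over-program each weight by the inverse of the expected decay,
    so that after drifting until t_use the conductance matches w."""
    return w * (t_use / t0) ** nu

w = np.array([0.5, 0.2, 0.9])      # intended weights (arbitrary units)
t_use = 1e6                        # seconds from programming to inference
g0 = drift_aware_target(w, t_use)  # deliberately over-programmed
at_use = drifted(g0, t_use)        # drifts back onto the intended weights
```

The same logic extends layer by layer: because errors compound through a deep network, anticipating them at programming time pays off most in the large models where the bottleneck hurts the most.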
Our error-minimizing technique potentially works with a variety of analog memory devices: flash, magnetic RAM (MRAM), and phase change memory (PCM). It also works on a range of AI models, including those that process text, spoken language, and images. We found that AI applications running on analog memory devices programmed with our technique performed as well as applications running on digital chips. We saw the accuracy gap nearly vanish, including weeks later. Our next step will be to test our weight-programming strategies in the newest iteration of our physical analog chip, which is currently under design.
In the meantime, of course, we live in a digital world. Analog chips are getting faster and more energy efficient, but so are digital chips. We’re chasing a moving target. This latest advance, however, brings us one step closer to ensuring that analog memory will play a role in meeting AI’s insatiable need for compute.
- Mackin, C., Rasch, M.J., Chen, A. et al. Optimised weight programming for analogue memory-based deep neural networks. Nature Communications 13, 3765 (2022). https://doi.org/10.1038/s41467-022-31405-1