AI models are getting bigger, and with each bump in size, they are drawing pictures, answering questions, and generating text more like humans. Massive computations are needed to pull this off — both to train the deep neural networks, and to run the trained models on a laptop or phone in real time.
If you’ve ever waited impatiently for a song to load or a chatbot to answer a question, you’ve likely run into the von Neumann bottleneck: the lag created because today’s chips, like CPUs and GPUs, process data in one place and store it in another. The bigger the AI model, the more time and energy it takes to shuttle data back and forth between memory and processors. The computation is so intensive that most AI workloads are handled in data centers in the cloud.
To break the bottleneck, IBM and other technology companies are experimenting with chip architectures that perform mathematical operations directly in memory. Analog memory devices store and process data in one place, as continuous conductance values rather than as bits of zeros and ones. Arranged in a crossbar array, these devices are ideally suited to accelerate the massively parallel multiply-accumulate operations needed to train and run enormous AI models. We previously showed that analog memory could run these operations 280 times faster, and with 100 times less energy, than the top GPUs at the time.
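To make the crossbar idea concrete, here is a minimal numerical sketch, computed digitally for illustration only: weights sit in place as conductances, an input vector is applied as row voltages, and each column’s summed current is one dot product. All names and values here are illustrative assumptions, not from the paper.

```python
import numpy as np

def crossbar_matvec(G, v):
    """Idealized analog crossbar: weights are stored as conductances G
    (one device per crossing point) and inputs arrive as row voltages v.
    Ohm's law gives each device's current G[i, j] * v[i]; Kirchhoff's
    current law sums the currents flowing down each column, so the whole
    matrix-vector product emerges in a single physical step."""
    return G.T @ v  # column currents (here computed digitally)

# A 3x2 weight matrix stored as conductances, and a 3-element input.
G = np.array([[1.0, 0.5],
              [0.2, 0.8],
              [0.9, 0.1]])
v = np.array([0.3, 0.6, 0.2])

currents = crossbar_matvec(G, v)  # one dot product per column
```

In a real chip the sum happens in the analog domain, which is why every multiply-accumulate in a layer can proceed in parallel without moving the weights at all.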
Implemented at scale, analog chips have the potential to accelerate large-scale language and vision models in the cloud. They could also find their way onto your laptop and phone to run faster, more energy-efficient AI applications.
But until now, accuracy has been analog computing’s Achilles’ heel. The small errors introduced when converting the mathematical weights of a trained AI model into analog memory compound into larger mistakes by the time you click on that chatbot at home.
We think we now have a solution. In a new study in Nature Communications,¹ we show how to anticipate these errors when “programming” AI models into analog memory values. It’s a breakthrough that brings analog chips closer to being truly competitive with digital chips. Once the accuracy gap is closed, the speed and energy savings of our technology really start to shine.
The accuracy gap arises because the deep neural network that we want to implement in our analog hardware is impossible to replicate perfectly. There will always be errors from imperfections in the memory devices themselves. What makes this problem even more challenging is that some imperfections also change with time. That means that the AI model you programmed into analog memory can shift from one day to the next.
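As a rough illustration of why programmed weights shift, the snippet below models two error sources: Gaussian noise at write time, and the power-law conductance drift often reported for phase-change memory. The noise level and drift exponent are illustrative assumptions, not numbers from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def program(target_g, write_noise_std=0.02):
    """Write-time error: each device lands near, not exactly on, its
    target conductance (modeled here as additive Gaussian noise)."""
    return target_g + rng.normal(0.0, write_noise_std, size=target_g.shape)

def drifted(g0, t, t0=1.0, nu=0.05):
    """Time-dependent error: an empirical power-law model commonly used
    for phase-change memory, G(t) = G(t0) * (t / t0) ** (-nu)."""
    return g0 * (t / t0) ** (-nu)

targets = np.array([0.5, 0.2, 0.8, 0.1])  # intended conductances (a.u.)
written = program(targets)                # just after programming
day_later = drifted(written, t=86_400.0)  # one day on: values have decayed
```

The write error is fixed once the device is programmed, but the drift term keeps moving, which is why a model that tests well on day one can degrade by day two.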
We can compensate for some of these mistakes when training the AI model through hardware-aware algorithms. In this paper, we show that most of the remaining errors can be corrected post-training.
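Hardware-aware training is typically done by injecting device-like noise into the weights during training, so the network converges to a solution that tolerates those perturbations. Below is a toy sketch on a linear model; the noise model, learning rate, and data are illustrative assumptions, not the paper’s algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear model y = X @ w, trained with noise injected into the
# weights on every forward pass so the learned solution tolerates
# analog-device-like perturbations. All values are illustrative.
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(256, 2))
y = X @ w_true

w = np.zeros(2)
lr, noise_std = 0.05, 0.1
for _ in range(500):
    w_noisy = w + rng.normal(0.0, noise_std, size=2)  # device-like noise
    pred = X @ w_noisy                                # noisy forward pass
    grad = X.T @ (pred - y) / len(y)                  # gradient step
    w -= lr * grad
```

Because the injected noise averages out over many steps, the weights still converge close to the true solution, but the loss surface the network settles into is one where small weight perturbations do little damage.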
Under our framework, an algorithm finds the optimal strategies for programming analog memory with the goal of improving the overall accuracy of our AI models.
The algorithm provides a target set of numeric values that anticipate errors when the memory devices are first programmed, and that creep in days later when analog memory values may start to drift. Additional mistakes surface when the model is executed in the real world, during inference. These errors can accumulate throughout large and complex AI models, as the model performs sequences of calculations layer by layer.
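One toy instance of such error anticipation: if the average drift behavior of a device is known, the programming target can be chosen so that the conductance lands on the intended weight at the time the model will actually be used. The published framework optimizes far more than this, jointly over device noise, drift, and model accuracy; the snippet below is only a sketch under an assumed power-law drift model.

```python
import numpy as np

def drifted(g, t, t0=1.0, nu=0.05):
    # Assumed power-law drift model: conductance decays over time.
    return g * (t / t0) ** (-nu)

def drift_aware_target(w, t_use, t0=1.0, nu=0.05):
    """Over-program each weight by the inverse of the expected decay,
    so that after drifting until t_use the conductance matches w."""
    return w * (t_use / t0) ** nu

w = np.array([0.5, 0.2, 0.9])      # intended weights (arbitrary units)
t_use = 1e6                        # seconds from programming to inference
g0 = drift_aware_target(w, t_use)  # deliberately over-programmed
at_use = drifted(g0, t_use)        # drifts back onto the intended weights
```

The same logic extends layer by layer: because errors compound through a deep network, anticipating them at programming time pays off most in the large models where the bottleneck hurts the most.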
Our error-minimizing technique potentially works with a variety of analog memory devices: flash, magnetic RAM (MRAM), and phase change memory (PCM). It also works on a range of AI models, including those that process text, spoken language, and images. We found that AI applications running on analog memory devices programmed with our technique performed as well as applications running on digital chips. We saw the accuracy gap nearly vanish, including weeks later. Our next step will be to test our weight-programming strategies in the newest iteration of our physical analog chip, which is currently under design.
In the meantime, of course, we live in a digital world. Analog chips are getting faster and more energy efficient, but so are digital chips. We’re chasing a moving target. This latest advance, however, brings us one step closer to ensuring that analog memory will play a role in meeting AI’s insatiable need for compute.
- Mackin, C., Rasch, M.J., Chen, A. et al. Optimised weight programming for analogue memory-based deep neural networks. Nature Communications 13, 3765 (2022). https://doi.org/10.1038/s41467-022-31405-1