Illusion of large on-chip memory by networked computing chips for neural network inference


Complex artificial intelligence (AI) workloads such as neural networks are a key driver for almost all computing systems, from the edge to the cloud. Have you ever wondered what a Dream AI Chip would look like? Neural nets demand enormous memory and computing resources. A Dream AI Chip would co-locate all of this memory on a single chip, together with the computing elements, so that data are quickly accessible at low energy. Such Dream Chips aren’t realizable today. Instead, computing systems use large off-chip memory and spend enormous time and energy shuttling data back and forth. This memory wall gets worse as neural nets continue to grow, especially as conventional transistor miniaturization becomes increasingly difficult.

Giant wafer-scale chips represent industry’s attempts to address this memory wall. At Stanford, our N3XT concept enables substantial benefits through new 3D NanoSystems for computation immersed in memory. Industrial N3XT implementations using monolithic 3D integration of carbon nanotube transistors and Resistive RAM (RRAM) are functional at the SkyWater Technology Foundry. Despite these innovations, Dream Chips continue to be a moving target for neural nets with insatiable memory demands.

How do we realize Dream Chips that eliminate the memory wall? To answer that, we posed the following question: Can we orchestrate a system of multiple computing chips, each with its local on-chip memory, to create an Illusion of a Dream Chip with near-Dream energy and execution time?

We show that, by ensuring enough local (on-chip) memory and fast chip wake-up and shutdown (e.g., through dense and non-volatile memory technologies, 3D NanoSystems, and special power-gating circuits), we can enable Illusion for neural net inference workloads. We engineer special partitioning, mapping, and scheduling algorithms that minimize inter-chip traffic and idle energy to achieve Illusion. We also derive design guidelines for Illusion.
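To make the partitioning and mapping idea concrete, here is a minimal sketch under simplifying assumptions: the network is a simple chain of layers, each layer's weights fit within a single chip's memory, and layers are assigned greedily in order. The names `Layer`, `partition_layers`, and `inter_chip_traffic` are hypothetical and illustrate the general idea only; this is not the algorithm from the paper, which also handles cases such as layers too large for one chip.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weight_bytes: int      # weights kept resident in this chip's local memory
    activation_bytes: int  # output activations handed to the next layer


def partition_layers(layers, chip_capacity):
    """Greedily pack consecutive layers onto chips so each chip's weights fit
    in its local memory (hypothetical sketch, not the paper's algorithm)."""
    chips, current, used = [], [], 0
    for layer in layers:
        if layer.weight_bytes > chip_capacity:
            # Splitting a single layer across chips is out of scope for this sketch.
            raise ValueError(f"{layer.name} does not fit on one chip")
        if used + layer.weight_bytes > chip_capacity:
            chips.append(current)  # current chip is full; start a new one
            current, used = [], 0
        current.append(layer)
        used += layer.weight_bytes
    if current:
        chips.append(current)
    return chips


def inter_chip_traffic(chips):
    """Weights never move: only the activations leaving the last layer
    on each chip cross a chip boundary."""
    return sum(chip[-1].activation_bytes for chip in chips[:-1])


layers = [
    Layer("conv1", 40, 8),
    Layer("conv2", 50, 4),
    Layer("fc1", 70, 2),
    Layer("fc2", 30, 1),
]
chips = partition_layers(layers, chip_capacity=100)
print([[layer.name for layer in chip] for chip in chips])  # [['conv1', 'conv2'], ['fc1', 'fc2']]
print(inter_chip_traffic(chips))  # 4
```

Cutting the chain where the outgoing activations are small is one way to reduce inter-chip traffic. A greedy cut, as shown here, does not guarantee that outcome.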

Our algorithms follow the mantra: move computation, not data. Computations are performed on the chips where their data reside, avoiding massive inter-chip traffic. Idle energy is eliminated by quickly turning individual chips on and off according to our Illusion schedule. Illusion is thus distinct from traditional multi-chip (parallel) computing, from 20th-century approaches (such as Illiac IV, the Connection Machine, and VLSI systolic processing arrays) to recent multi-chip accelerator systems.
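The scheduling argument can be illustrated with a toy idle-energy model. It assumes that chips execute one after another on a single inference input, that an idle chip left on draws a constant power `p_idle`, and that a power-gated chip pays one fixed `wake_energy` per activation with its wake-up latency hidden. All of these parameters and function names are hypothetical and not measured values from the Illusion hardware. The model counts only idle overhead, not compute energy.

```python
def sequential_schedule(busy_times):
    """Chips run one after another; return each chip's (start, end) active window."""
    schedule, t = [], 0.0
    for duration in busy_times:
        schedule.append((t, t + duration))
        t += duration
    return schedule


def idle_energy(busy_times, p_idle, wake_energy, power_gated):
    """Idle-energy overhead of a sequential schedule (hypothetical toy model)."""
    schedule = sequential_schedule(busy_times)
    makespan = schedule[-1][1]
    total = 0.0
    for start, end in schedule:
        if power_gated:
            # Chip is OFF outside [start, end]; with non-volatile on-chip memory
            # its weights survive shutdown, so only one wake-up cost is paid.
            total += wake_energy
        else:
            # Chip stays ON and idles whenever another chip is computing.
            total += p_idle * (makespan - (end - start))
    return total


busy = [2.0, 3.0, 5.0]  # ms of computation per chip (illustrative numbers)
print(sequential_schedule(busy))  # [(0.0, 2.0), (2.0, 5.0), (5.0, 10.0)]
print(idle_energy(busy, p_idle=1.0, wake_energy=0.1, power_gated=False))  # 20.0 (mW x ms = uJ)
print(round(idle_energy(busy, p_idle=1.0, wake_energy=0.1, power_gated=True), 3))  # 0.3
```

In this toy model, the always-on overhead grows with the number of chips and with the makespan. The power-gated overhead grows only with the number of wake-ups, which is why fast, low-energy wake-up and shutdown matter.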

Our eight-chip Illusion System hardware incurs only 3.5% additional energy and 2.5% additional execution time compared with a Dream Chip. Each computing chip contains silicon transistors and on-chip RRAM. Our simulations show that a 64-chip Illusion System is feasible.

By combining multiple chips in an Illusion system (left) we can create the illusion of a single, ideal Dream Chip (right).

Illusion has profound implications for future technologies. To achieve Dream-like performance, the world has primarily pursued the on-chip integration path: through miniaturization until now, and through dense 3D integration moving forward. However, the Dream Chip continues to be a moving target. The other viable path is to build multi-chip systems: the chiplet integration path. This path is limited by the inter-chip connections. The key to future progress is a new approach, orchestrated for Illusion, that combines these two paths. It opens up a large design space for technology innovations and creates a new scaling path for future systems that deliver near-Dream performance (energy and execution time) through Illusion.

For more information, please see our recent publication in Nature Electronics: Illusion of large on-chip memory by networked computing chips for neural network inference

Robert M. Radway

Ph.D. Candidate, Stanford University Electrical Engineering