In conventional machine vision, an image is acquired by a CMOS sensor in which each photosensitive pixel converts incoming light into a voltage. The voltages generated by illumination of the chip are then read out individually by multiplexing rows and columns, and converted into a digital signal. This digital information is then transmitted to a computer, where it is analyzed and further processed: a rather cumbersome process involving many time-consuming steps. Many different algorithms can process visual information, but artificial neural networks (ANNs) have been particularly successful in modern machine vision.
ANNs are inspired by biological brains, where information is processed by neurons that are interlinked by synapses. In an ANN, the artificial neurons and synapses are quite simple: a neuron sums all of its input signals and fires an output signal as soon as that sum reaches a certain threshold; a synapse connects two neurons and multiplies the output signal by some value, called a weight. The finer details of ANNs, how exactly they work and how they are trained to be useful, are not so relevant at this point. Here, it is only important to know that these biologically inspired structures are currently emulated on digital computers and that the weights of an ANN determine its behavior.
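The neuron and synapse described above can be written down in a few lines. The following is only a minimal sketch of that textbook picture: the function name `neuron_output`, the hard 0/1 output, and the example numbers are illustrative assumptions, not taken from our experiment.

```python
def neuron_output(inputs, weights, threshold):
    """Simple threshold neuron (illustrative sketch, not our sensor).

    Each synapse multiplies its input signal by a weight; the neuron
    sums the weighted signals and fires (returns 1) once the sum
    reaches the threshold, otherwise it stays silent (returns 0).
    """
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


# Hypothetical example values, chosen only for illustration.
inputs = [1.0, 0.0, 1.0]
weights = [0.5, -0.25, 0.75]

print(neuron_output(inputs, weights, threshold=1.0))  # weighted sum is 1.25, so the neuron fires
print(neuron_output(inputs, weights, threshold=2.0))  # same sum, higher threshold, no firing
```

Changing the weights changes which inputs make the neuron fire, which is the sense in which the weights determine the behavior of the network.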
Our idea was to incorporate the ANN into the image sensor itself and get rid of all the fancy, but rather slow, digital electronics. This idea, too, can be motivated by biology: the human eye has over 100 million photoreceptive cells but only around 1 million nerve fibers to the brain, which means that the information is already processed and condensed before it is sent to the brain. To process information like an ANN on a chip, we need a photodetector with a tunable photoresponsivity; in other words, we need to be able to control the amount of current generated by illumination of each pixel. This is where two-dimensional materials come into play. In WSe2 we can tune the carrier density locally by electrostatic gating and generate a photocurrent which depends linearly on the gate voltage. This technique was developed by our group in 2014 (doi.org/10.1038/nnano.2014.14). So, for this project, we only had to build an array of these photodiodes and connect them accordingly. We illuminated the sensor with noisy images of letters and successfully trained it both to classify and to encode the input.
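To illustrate why a tunable photoresponsivity is enough to act as an ANN weight, here is a rough sketch. It assumes a purely linear model in which each photodiode delivers a current proportional to both its gate voltage and the light falling on it, and it assumes that the currents of all photodiodes wired to the same output simply add up. The constant `k`, the function names, and the tiny 4-pixel "image" are hypothetical and not values from the experiment.

```python
def photocurrent(power, gate_voltage, k=1.0):
    """Assumed linear model: current = k * gate_voltage * optical power.

    k is a hypothetical proportionality constant; the gate voltage
    plays the role of the synaptic weight and may be negative.
    """
    return k * gate_voltage * power


def sensor_outputs(image, gate_voltages, k=1.0):
    """Sum the photocurrents feeding each output (assumed wiring).

    image:         list of optical powers, one per pixel
    gate_voltages: gate_voltages[m][n] is the gate voltage of the
                   photodiode at pixel n that feeds output m
    """
    return [
        sum(photocurrent(p, v, k) for p, v in zip(image, row))
        for row in gate_voltages
    ]


# Hypothetical 4-pixel image and two outputs (two "classes").
image = [1.0, 0.0, 1.0, 0.0]
gate_voltages = [
    [1.0, -1.0, 1.0, -1.0],   # output 0 responds to pixels 0 and 2
    [-1.0, 1.0, -1.0, 1.0],   # output 1 responds to pixels 1 and 3
]

outputs = sensor_outputs(image, gate_voltages)
predicted = max(range(len(outputs)), key=lambda m: outputs[m])
print(outputs, predicted)  # [2.0, -2.0] 0
```

In this picture, training the sensor amounts to adjusting the gate voltages, and the weighted sum is formed by the currents themselves rather than by a digital processor.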
Our vision sensor represents an ANN which acquires and processes an image at once. This means that its speed is limited only by the movement of the electrons between their generation in the photodiode and their arrival at the output, which occurs on the order of picoseconds. The speed we demonstrated is lower, limited by the speed of our external electronics. But even with our setup-limited bandwidth of 20 MHz, our system works orders of magnitude faster than any other machine vision system today. Currently, our vision sensor is rather small, and the challenge of scaling it up for more general, real-world applications remains. Once this is achieved, however, devices of this kind will outperform current image recognition systems by orders of magnitude and most likely enable completely new applications.
Original journal article: Mennel, L. et al. Ultrafast machine vision with 2D material neural network image sensors. Nature 579, 62–66 (2020). https://doi.org/10.1038/s41586-020-2038-x