Minimalist machine vision by opto-electronic neural networks

This work substantially reduces the size, power consumption, and computation required for machine vision by using jointly optimized opto-electronic neural networks.

In recent years, the immense processing power and parallelism of modern graphics processing units (GPUs) have driven rapid progress in deep learning based on convolutional neural networks (CNNs), leading to effective solutions for a wide range of artificial intelligence problems. However, the massive amounts of data involved in vision processing make it difficult to deploy CNNs on portable, power-efficient, computation-efficient hardware that processes data on site.

Several studies have turned to optical computing to overcome the limitations of electronic neural networks. Optical computing has appealing advantages: optical parallelism can greatly improve computing speed, and optical passivity can reduce energy cost and minimize latency. Optical neural networks (ONNs) offer a way to increase computing speed and overcome the bandwidth bottlenecks of electronic units. However, ONNs require a coherent laser as the light source for computation and can hardly be combined with a mature machine vision system in natural-light scenes. Opto-electronic hybrid neural networks, in which the front end is optical and the back end is electronic, have therefore been proposed. These hybrid systems, however, are lens-based, which makes them difficult to use in edge devices such as autonomous vehicles.

In a new paper published in Light: Science & Applications, a team of researchers led by Professor Hongwei Chen from the Beijing National Research Center for Information Science and Technology (BNRist), Department of Electronic Engineering, Tsinghua University, China, has developed a lensless opto-electronic neural network (LOEN) architecture for computer vision tasks. The architecture inserts a passive mask into the imaging light path to perform convolution operations in the optical field, which addresses the challenge of processing incoherent, broadband light signals in natural scenes. In addition, the optical link, image signal processing, and back-end network are combined and jointly optimized for specific tasks, reducing computation and energy consumption throughout the entire pipeline.

Figure 1. Schematic diagram of the optical mask replacing the convolutional layer of the network.

In contrast to the hardware architecture of conventional machine vision, this paper proposes replacing the lenses with an optical mask placed close to the image sensor. According to geometrical optics, light propagates in straight lines, so a scene can be regarded as a set of point light sources. Each point source projects a shifted copy of the mask pattern onto the image sensor, and the superposition of these shifted copies realizes a convolution of the scene with the mask. The authors verified that optical masks can replace the convolutional layers of neural networks for feature extraction in the optical domain.
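
The shift-and-superposition picture can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it idealizes the geometry (unit magnification, no image inversion, no diffraction or noise), and the function names `shift_and_superpose` and `full_convolution` are hypothetical. The sketch only checks that summing shifted, intensity-weighted copies of the mask gives the same result as a direct 2D convolution.

```python
def shift_and_superpose(scene, mask):
    """Sensor image formed by adding one shifted copy of the mask per scene point.

    Idealized assumption: a point source at (i, j) with intensity s casts the
    mask pattern onto the sensor shifted by (i, j) and scaled by s.
    """
    H, W = len(scene), len(scene[0])
    h, w = len(mask), len(mask[0])
    sensor = [[0] * (W + w - 1) for _ in range(H + h - 1)]
    for i in range(H):
        for j in range(W):
            s = scene[i][j]
            for u in range(h):
                for v in range(w):
                    sensor[i + u][j + v] += s * mask[u][v]
    return sensor


def full_convolution(scene, mask):
    """Direct 2D 'full' convolution: out[y][x] = sum over (u, v) of mask[u][v] * scene[y-u][x-v]."""
    H, W = len(scene), len(scene[0])
    h, w = len(mask), len(mask[0])
    out = [[0] * (W + w - 1) for _ in range(H + h - 1)]
    for y in range(H + h - 1):
        for x in range(W + w - 1):
            total = 0
            for u in range(h):
                for v in range(w):
                    i, j = y - u, x - v
                    if 0 <= i < H and 0 <= j < W:
                        total += mask[u][v] * scene[i][j]
            out[y][x] = total
    return out


# Toy scene intensities and a binary (open/blocked) mask, both illustrative.
scene = [[1, 2, 0],
         [0, 3, 1],
         [4, 0, 2]]
mask = [[1, 0],
        [1, 1]]

sensor = shift_and_superpose(scene, mask)
reference = full_convolution(scene, mask)
print(sensor == reference)
```

The mask entries are kept non-negative because a passive amplitude mask can only pass or block incoherent light; how signed kernel weights are handled is outside the scope of this sketch.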

Figure 2. LOEN prototype for single-kernel and multiple-kernel systems.

For object classification tasks such as handwritten digit recognition, the team built a lightweight network for real-time recognition to verify the performance of optical convolution in the architecture. With a single convolution kernel, the recognition accuracy reaches 93.47%. When multi-channel convolution is implemented by arranging multiple kernels in parallel on the mask, the classification accuracy improves to 97.21%. Compared with a traditional machine vision pipeline, the system reduces energy consumption by about 50%.
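
One way to picture the multi-kernel arrangement is the sketch below, again in plain Python and under the same idealized geometry. The layout is an assumption made for illustration, not the layout used in the paper: kernels are placed side by side with a spacing (`pitch`) chosen so that their responses do not overlap on the sensor, and each sensor region is read out as one feature channel. The helper names `sensor_image`, `tile_kernels`, and `split_channels` are hypothetical.

```python
def sensor_image(scene, mask):
    """Idealized sensor image: superposition of shifted, scaled copies of the mask."""
    H, W = len(scene), len(scene[0])
    h, w = len(mask), len(mask[0])
    out = [[0] * (W + w - 1) for _ in range(H + h - 1)]
    for i in range(H):
        for j in range(W):
            for u in range(h):
                for v in range(w):
                    out[i + u][j + v] += scene[i][j] * mask[u][v]
    return out


def tile_kernels(kernels, scene_width):
    """Place equally sized kernels side by side on one mask.

    The pitch equals the width of one full convolution output, so the
    response of each kernel lands in its own region of the sensor.
    """
    h, w = len(kernels[0]), len(kernels[0][0])
    pitch = scene_width + w - 1
    width = pitch * (len(kernels) - 1) + w
    mask = [[0] * width for _ in range(h)]
    for k, kernel in enumerate(kernels):
        for u in range(h):
            for v in range(w):
                mask[u][k * pitch + v] = kernel[u][v]
    return mask, pitch


def split_channels(sensor, pitch, n_kernels):
    """Cut the sensor image into one sub-image (feature channel) per kernel."""
    return [[row[k * pitch:(k + 1) * pitch] for row in sensor]
            for k in range(n_kernels)]


# Toy scene and two binary kernels, all illustrative.
scene = [[1, 2, 0],
         [0, 3, 1],
         [4, 0, 2]]
kernels = [
    [[1, 0],
     [0, 1]],
    [[1, 1],
     [0, 0]],
]

mask, pitch = tile_kernels(kernels, scene_width=len(scene[0]))
channels = split_channels(sensor_image(scene, mask), pitch, len(kernels))

# Each channel matches the single-kernel response for the corresponding kernel.
print(all(channels[k] == sensor_image(scene, kernels[k]) for k in range(len(kernels))))
```

In this sketch a single exposure yields all channels at once, which is the sense in which the kernels operate in parallel; the back-end network would then consume the channels as ordinary feature maps.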

Figure 3. Flow chart of joint optimization of LOEN.

Furthermore, when the dimensions of the optical mask are expanded, the image is convolved in the optical domain and the sensor captures an aliased image that is unrecognizable to the human eye, which naturally encrypts private information at no computational cost. The performance of optical encryption was verified on a face recognition task. Compared with a random MLS pattern, the mask jointly optimized by an end-to-end network improved recognition accuracy by more than 6%. While providing this privacy-preserving encryption, the system achieved essentially the same recognition accuracy as methods without encryption.

This work proposes an extremely simplified system for machine vision tasks. It not only realizes opto-electronic neural network computation in natural scenes but also opens up the entire opto-electronic link to joint optimization, so that the best results can be achieved for a specific vision task. Combined with nonlinear materials, an all-natural-light neural network may become achievable. The architecture has numerous potential applications in real-world scenarios such as autonomous driving, smart homes, and smart security.

Hongwei Chen

Professor, Tsinghua University