A high-performance “non-local” generic face reconstruction model based on the lightweight Speckle-Transformer (SpT) UNet is described in a recent publication in Opto-Electronic Advances.
Optical artificial intelligence for computational imaging (CI) is being designed and developed by combining the feature extraction and generalisation capabilities of existing advanced computer neural networks with the light-speed operation, low energy consumption, and parallel multi-dimensional signal processing of optical artificial intelligence algorithms.
Electronic convolutional neural networks (CNNs) have demonstrated image reconstruction in CI, with applications ranging from autonomous vehicle navigation in fog to non-invasive medical imaging through tissue. However, CNNs perform unreliably on spatially dense patterns, such as generic face images, because the convolution operator is constrained by its “local” kernel size. There is therefore a pressing need for a “non-local” kernel that can extract long-range dependencies in the feature maps. Transformers are parallelizable modules that rely solely on the attention mechanism.
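To make the "local" versus "non-local" distinction concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the SpT UNet implementation: the query, key, and value projections are taken to be the identity, there is a single head, and the toy `patches` data is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (an assumption).

    tokens: list of n feature vectors, each a list of d floats.
    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Every token is scored against every other token, whatever the distance.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # The output is the attention-weighted average of all token vectors.
        outputs.append([sum(w * value[c] for w, value in zip(row, tokens))
                        for c in range(d)])
    return outputs, weights

# Hypothetical toy sequence of five "patches"; the first and last share a feature.
patches = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(patches)
```

In this toy input the first patch attends to the last patch as strongly as to itself, although they are four positions apart. A convolution with a 3-wide kernel centred on the first patch would not see the last one at all, which is the limitation the "non-local" kernel is meant to remove.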
In addition, the transformer makes fewer assumptions about the structure of the problem than its convolutional and recurrent deep learning counterparts. Transformers have been used successfully in vision for image recognition, object detection, segmentation, image super-resolution, video understanding, image generation, text-image synthesis, and other tasks. To the best of current knowledge, however, no study has investigated how transformers perform in CI, for example in speckle reconstruction.
In this study, the Speckle-Transformer (SpT) UNet, a “non-local” model, is used for highly accurate, energy-efficient parallel processing of speckle reconstructions. The network has an advanced transformer encoder and decoder block architecture.
The authors propose and demonstrate three key mechanisms for better feature preservation/extraction: pre-batch normalization (pre-BN), position encoding in multi-head attention/multi-head cross-attention (MHA/MHCA), and self-built up/down sampling pipelines. For “scalable” data acquisition, four diffusers of different grits are considered within the 40 mm detection range. Compared with other state-of-the-art “non-local” networks used in vision computation, such as ViT and SWIN Transformer, the SpT UNet is a lightweight network with about an order of magnitude fewer parameters.
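As a rough illustration of the first mechanism, the sketch below assumes that "pre-BN" means applying batch normalization before a sublayer and then adding the result back to the unnormalised input, by analogy with pre-normalisation transformer blocks. The article does not spell out the exact arrangement used in the SpT UNet, so this is an assumption; the `halve` sublayer is a hypothetical stand-in for an attention or feed-forward block, and the batch norm has no learned scale or shift.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalise each feature to zero mean and unit variance across the batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[c] for row in batch) / n for c in range(dims)]
    variances = [sum((row[c] - means[c]) ** 2 for row in batch) / n
                 for c in range(dims)]
    return [[(row[c] - means[c]) / math.sqrt(variances[c] + eps)
             for c in range(dims)]
            for row in batch]

def pre_bn_residual(batch, sublayer):
    """Pre-normalisation residual block: x + sublayer(BN(x))."""
    normed = batch_norm(batch)
    transformed = sublayer(normed)
    # The skip connection carries the raw input, so its scale is preserved.
    return [[x + t for x, t in zip(row, out)]
            for row, out in zip(batch, transformed)]

def halve(batch):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [[0.5 * v for v in row] for row in batch]

# Two samples whose features differ in scale by a factor of ten.
batch = [[1.0, 10.0], [3.0, 30.0]]
result = pre_bn_residual(batch, halve)
```

Because normalisation happens before the sublayer, both features reach it on the same scale, while the skip path still passes the original values through unchanged.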
The authors use four quantitative metrics to assess network performance objectively: Pearson correlation coefficient (PCC), structural similarity measure (SSIM), Jaccard index, and peak signal-to-noise ratio (PSNR). With PCC and SSIM values above 0.989 and 0.950, respectively, the lightweight SpT UNet shows high efficiency and strong comparative performance. It can be further developed into an all-optical neural network for optical artificial intelligence, offering superior feature extraction, light-speed operation, and passive processing.
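For readers unfamiliar with these metrics, the following sketch computes three of them (PCC, PSNR, and the Jaccard index) using their standard textbook definitions on flattened images with pixel values in [0, 1]. The binarisation threshold of 0.5 for the Jaccard index and the toy pixel values are assumptions for illustration; the article does not state the authors' exact evaluation settings. SSIM is omitted because it requires windowed local statistics.

```python
import math

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length pixel lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def psnr(x, y, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)

def jaccard(x, y, threshold=0.5):
    """Jaccard index (intersection over union) of the thresholded images."""
    mask_x = [v > threshold for v in x]
    mask_y = [v > threshold for v in y]
    intersection = sum(p and q for p, q in zip(mask_x, mask_y))
    union = sum(p or q for p, q in zip(mask_x, mask_y))
    return intersection / union if union else 1.0

# Hypothetical four-pixel ground truth and reconstruction.
truth = [0.0, 0.2, 0.8, 1.0]
recon = [0.1, 0.2, 0.7, 1.0]
scores = {
    "PCC": pcc(truth, recon),
    "PSNR": psnr(truth, recon),
    "Jaccard": jaccard(truth, recon),
}
```

PCC and the Jaccard index are bounded above by 1, so values close to 1 indicate a near-perfect reconstruction, whereas PSNR is unbounded and reported in decibels.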