By Oren Debbi, CEO and Co-founder, Visionary.ai
Originally published in Synopsys Technical Bulletin
Anyone at CES in Las Vegas in January this year may have noticed just how many electronic edge devices are now adopting image sensors: drones, doorbells, laptops, toys, security cameras, webcams, vehicles, sports cams, police bodycams, medical devices and many more.
The smartphone has created a market where high-quality image sensors are manufactured at scale, leading to an expectation of image quality in every device that has a camera.
Consumers now expect a higher sensor resolution, which presents both a challenge and an opportunity for chip makers. The raw signal from a 10-megapixel sensor at 30 frames per second produces a vast amount of data that needs to be treated by an image signal processor (ISP) in real time, at the edge, before the image is suitable for viewing by the human eye.
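To give a rough sense of scale, the raw data rate can be estimated from resolution, frame rate, and bit depth. The sketch below assumes a 12-bit raw output, which is an illustrative value rather than a figure for any particular sensor, and the function name is hypothetical.

```python
def raw_data_rate_bits_per_s(megapixels, fps, bits_per_pixel):
    """Uncompressed sensor output in bits per second."""
    return megapixels * 1_000_000 * fps * bits_per_pixel

# 10-megapixel sensor at 30 frames per second, assuming 12 bits per pixel.
rate = raw_data_rate_bits_per_s(10, 30, 12)
print(f"{rate / 1e9:.1f} Gbit/s")  # prints "3.6 Gbit/s"
```

Under that assumption the ISP has to absorb several gigabits of raw data every second, before any processing has taken place.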
There is an emerging approach to image signal processing that uses AI and is capable of squeezing more performance from a sensor. This technology has a workload suitable for Synopsys NPUs, vector DSPs, and CPU processors.
Behind the lens, the heart of the image sensor is a semiconductor rectangle divided up into a few million pixels. Each pixel can be as small as 1 micron (1 × 10⁻⁶ m) across and has a tiny color filter; in the most common ‘Bayer’ filter arrays these filters are red, green, or blue.
As photons fall on the surface of the semiconductor, a proportion of them will have a quantum interaction with the silicon atoms. Electron-hole pairs are created and thus a small, but measurable, electrical charge is produced, which on average is proportional to the strength of the light falling on the pixel.
The ISP takes the red, green, blue raw data from the sensor and processes it to remove the mosaic effect, adjust the colors, remove distortion from the lens, and make many other corrections. The ISP also effectively compresses the data. The raw sensor data may have a bit-depth in the range 12 to 24 bit, whilst the output is commonly an 8-bit RGB signal.
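Two of these steps, removing the mosaic and reducing the bit depth, can be illustrated with a deliberately simplified sketch. It assumes an RGGB Bayer layout, 12-bit raw data, and a gamma of 2.2, all of which are illustrative choices. It collapses each 2x2 tile into one RGB pixel, which halves the resolution; it is not how a production ISP, or Visionary.ai's software ISP, performs demosaicing.

```python
import numpy as np

def demosaic_rggb_half(raw):
    """Collapse each 2x2 RGGB tile into one RGB pixel (half resolution)."""
    r = raw[0::2, 0::2].astype(np.float64)
    # Each tile has two green samples; average them.
    g = (raw[0::2, 1::2].astype(np.float64) + raw[1::2, 0::2]) / 2.0
    b = raw[1::2, 1::2].astype(np.float64)
    return np.stack([r, g, b], axis=-1)

def to_8bit(rgb, raw_bits=12, gamma=2.2):
    """Map linear raw values to gamma-encoded 8-bit output."""
    normalized = rgb / (2**raw_bits - 1)
    encoded = normalized ** (1.0 / gamma)
    return np.round(encoded * 255).astype(np.uint8)

# A fully saturated 4x4 raw frame becomes a 2x2 white RGB image.
raw = np.full((4, 4), 4095, dtype=np.uint16)
rgb8 = to_8bit(demosaic_rggb_half(raw))
print(rgb8.shape, rgb8.dtype)
```

The gamma curve is what lets an 8-bit output retain usable shadow detail from a much deeper raw signal, which is the sense in which the ISP compresses the data.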
The ISPs most commonly used today are IP blocks from a small number of vendors. Image signal processing is a highly parallel form of computing, with algorithms that are effectively burned into hardware, so flexibility after production is limited.
One particular issue in image sensors and ISPs is that of noise, and in many cases this is the limiting factor in system design.
The root cause of the noise is in the image sensor itself, and the problem is most severe at low light, when there are few photons to capture. For example, on a moonlit night, where the light level is less than 0.3 lux, the system is trying to accurately count the photons that fall on a 1 square micron area of silicon in the time of one video frame, which is less than 33 milliseconds. As a point of comparison, the most sensitive cells on the human retina, the rods, are approximately 2 microns in diameter and can detect a single photon.
The nature of quantum effects is that they are highly predictable when there are high volumes of interactions. However, when there are fewer photons, and therefore fewer interactions, they are much less consistent, thus producing noise. To this can be added the thermally generated noise of the silicon itself, whereby electron-hole pairs are randomly produced and can be mistaken for detected light. Further noise comes from the process of measuring and digitising the very low levels of electrical charge. Clearly there are a number of ways that noise can creep into the system.
Noise is an important problem to address, as humans strongly dislike noisy pictures, which impair the brain’s ability to comprehend an image. Equally, in machine vision systems, noise impedes performance and makes it more difficult for algorithms to reliably detect objects. So, for both human and machine vision, if noise is present, it will limit the ability of the device to operate in low light. In addition, it reduces the system’s ability to handle pictures with high dynamic range (extremes of light and dark in the same image).
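The photon-counting part of this behaviour is commonly modelled as a Poisson process, in which the signal-to-noise ratio grows as the square root of the mean photon count. The simulation below is a minimal sketch of that model only; the photon counts are illustrative, and thermal and read-out noise are ignored.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def measured_snr(mean_photons, n_samples=200_000):
    """Simulate photon arrivals at one pixel and return mean / std."""
    counts = rng.poisson(mean_photons, size=n_samples)
    return counts.mean() / counts.std()

# Compare the simulated SNR with the sqrt(N) prediction.
for mean_photons in (4, 100, 10_000):
    simulated = measured_snr(mean_photons)
    predicted = mean_photons ** 0.5
    print(f"{mean_photons:>6} photons: simulated {simulated:.1f}, predicted {predicted:.1f}")
```

With only a handful of photons per pixel per frame, the noise is a large fraction of the signal, which is why low light is the hardest case.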
Of course, there are ways of tackling noise in sensor design, mostly based on capturing more photons to increase the signal relative to the noise. The pixel size could be increased, for example, but this either requires a larger, more expensive sensor, or results in a lower resolution image. When the surface area of the silicon increases, the size of the lens also changes and so we end up with a device that is less robust and that is harder to package. Another approach is to increase the exposure time, but this obviously leads to a lower frame rate and increases the risk of motion blur.
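The size of these trade-offs can be estimated under the simplifying assumption that photon shot noise dominates, so that signal-to-noise ratio scales with the square root of the photons collected. The helper below is a hypothetical sketch of that relationship, not a model of any specific sensor.

```python
import math

def shot_limited_snr_gain_db(pitch_scale=1.0, exposure_scale=1.0):
    """SNR change in dB when pixel pitch and/or exposure time are scaled.

    Photons collected scale with pixel area (pitch squared) and with
    exposure time; shot-limited SNR scales with sqrt(photons), so the
    gain is 20*log10(sqrt(photon_gain)) = 10*log10(photon_gain).
    """
    photon_gain = pitch_scale**2 * exposure_scale
    return 10 * math.log10(photon_gain)

print(f"{shot_limited_snr_gain_db(pitch_scale=2.0):.1f} dB")     # prints "6.0 dB"
print(f"{shot_limited_snr_gain_db(exposure_scale=2.0):.1f} dB")  # prints "3.0 dB"
```

On this model, doubling the pixel pitch quadruples the silicon area per pixel for roughly 6 dB, and doubling the exposure halves the achievable frame rate for roughly 3 dB.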
If the options for increasing the signal all have downsides, then the removal of noise, or “denoising”, is an alternative.
There are several different signal processing approaches employed by the ISPs currently on the market, but their performance has limitations. For example, some current denoisers smooth the image, so the sharpness of features in the image is lost.
Israeli start-up Visionary.ai has developed AI that is capable of detecting, and thus removing, more image noise than conventional algorithms. While many computer vision researchers are developing better methods to detect and recognise objects in an image, Visionary.ai’s founders realised that there was an opportunity to improve image quality itself by improving the ISP. A better ISP can deliver higher quality image data, and thereby improve performance of other AI tasks like recognition and segmentation.
Resolving the problem of “Garbage-In-Garbage-Out” has been shown to deliver higher precision, improving results for machine vision applications such as license plate recognition or face detection. As for ‘human vision’ applications like smartphone or laptop video quality, Visionary.ai’s real-time denoiser works to produce clearer, brighter imaging, with more accurate coloring.
Unlike other denoisers, the AI denoising method developed by Visionary.ai removes noise in real time, and it has proven to be capable of achieving signal-to-noise ratio enhancements of 19dB. To remove the maximum amount of noise, however, the AI needs access to the raw signal from the image sensor, before it has been modified and compressed by the ISP. Visionary.ai has tackled this challenge by creating a software ISP to entirely replace the traditional hardware ISP (Figure 1).
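To put a decibel figure into linear terms, it helps to convert it to a ratio. The article does not state which decibel convention applies, so the sketch below shows both: the amplitude convention (20 log10), which is the usual one for image signal-to-noise ratio, and the power convention (10 log10). The function names are illustrative.

```python
def db_to_amplitude_ratio(db):
    """Linear ratio under the 20*log10 (amplitude) convention."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Linear ratio under the 10*log10 (power) convention."""
    return 10 ** (db / 10)

print(f"amplitude ratio: {db_to_amplitude_ratio(19):.1f}x")  # prints "amplitude ratio: 8.9x"
print(f"power ratio: {db_to_power_ratio(19):.1f}x")          # prints "power ratio: 79.4x"
```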
Even though the ISP and denoiser are software, there is clearly an implication for hardware design to provide the right kinds of compute resources.
Firstly, the denoiser function requires a neural network. The performance requirement varies, as the workload scales with the video frame rate and the image resolution. Early development of the denoiser used an Nvidia Jetson, with plenty of compute headroom to allow for unconstrained experimentation and exploration. It was always the intention, however, to develop a solution with a silicon area requirement and power budget that would be technically and commercially appropriate for a much wider range of applications. When people think of AI, and even AI at the edge, they often recall the tens, hundreds or even thousands of TOPS of performance that are deployed for many inference applications. That is certainly not representative of the demands of a denoising application, even at extremely high resolution and frame rates.
The main functions of the ISP are well suited to a digital signal processor, and they have been proven to operate extremely effectively on the Synopsys ARC® EV72 processor. The ARC EV7x family comprises heterogeneous embedded vision processors that include a scalable vector DSP core and scalable neural network engines. Visionary.ai’s denoiser algorithm will also run on the next-generation combination of Synopsys ARC VPX vector DSPs and ARC NPX neural processing units.
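The scaling with frame rate and resolution can be expressed as a relative pixel throughput. The sketch below assumes a fixed amount of computation per pixel, which is a simplification, and the two video modes are illustrative examples rather than figures from Visionary.ai.

```python
def pixels_per_second(width, height, fps):
    """Pixel throughput the denoiser must sustain for a given video mode."""
    return width * height * fps

# Assumed example modes: 1080p at 30 fps versus 4K UHD at 60 fps.
baseline = pixels_per_second(1920, 1080, 30)
demanding = pixels_per_second(3840, 2160, 60)
print(f"relative workload: {demanding / baseline:.0f}x")  # prints "relative workload: 8x"
```

A compute budget sized for the lower mode would therefore need roughly eight times the throughput for the higher one, which is why scalable DSP and neural network resources matter.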
In addition to the ISP algorithms and the denoiser, an application processor is required to run the control code. The workload is not demanding and a single-core 32-bit processor is sufficient, such as the Synopsys ARC HS series (Figure 2).
Apart from the poor noise performance, one of the downsides of conventional ISPs is that they are relatively inflexible. There is a tuning process which matches the ISP to the sensor and this can take weeks or even months. This tuning task represents a significant cost burden and adds a schedule constraint to image systems engineering projects.
A software defined ISP can be tuned much more quickly, through its noise and AI capabilities, and furthermore it can be updated through its lifetime for performance enhancement. Should there be a supply chain issue, for example with a model of image sensor, then it becomes more feasible to re-engineer the system with a new component.
As tuning becomes quicker and cheaper to perform, application-specific tuning becomes viable. For example, if a particular farming application required fine details of shades of green, or a medical use case demanded better differentiation between shades of red, then optimizations could be made.
Visionary.ai's software ISP with denoiser is released for production and already designed into consumer electronics and security camera designs. Other target markets include the automotive, drone and medical industries.
The semiconductor market is still catching up with AI in supplying production-level silicon at these mid-performance points and there is certainly an opportunity right now for entrepreneurial companies to introduce new and competitive silicon to serve the market.