A Software-Programmable Vision System-on-Chip for High-Speed Image Acquisition and Processing

Authors: Jens Döge, Christoph Hoppe, Peter Reichel, Nico Peter, Ludger Irsig, Christian Skubich, Patrick Russell

**Description of the innovation:**
The IAP VSoC2M is a novel member of the Fraunhofer Vision System-on-Chip (VSoC) family for high-speed image acquisition and processing applications. It combines a multitude of innovative approaches such as

- analog convolution of image data during the readout process,
- fast column-parallel image analysis and feature extraction,
- column-specific storage of intermediate processing results in an analog cache with 32 entries each,
- column-parallel software-defined A/D converters with 1 to 10 bit resolution,
- an asynchronous readout path for compression of sparse data, and
- an ASIP (application-specific instruction-set processor) concept for software-defined control of all processes on the VSoC.

The combination of all these features opens up a variety of new possibilities in embedded image processing that previously could not be implemented. Possible applications range from content-based, automatic multi-region-of-interest (ROI) image acquisition through optical measuring methods such as sheet of light (SoL) or optical coherence tomography (OCT) to process control based on extracted image characteristics.
Based on the VSoC, an OEM sensor module for integration into customer cameras and a software-defined smart camera were developed for use in specific image processing tasks.

**General Information**

In certain applications, e.g. SoL or OCT, only a few or small ROIs within the captured images are relevant. To minimize the effort in the subsequent data processing chain, it is usually advantageous to identify these ROIs as early as possible, i.e. prior to digitization, and with minimal effort. This significantly reduces the effort for the analog-to-digital conversion and all subsequent steps, and the speed of the entire process increases dramatically.
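
The saving from locating ROIs before digitization can be estimated with a simple cost model. The sketch below is a minimal illustration, assuming a counter-based ramp converter whose conversion time grows as 2^bits clock cycles, a hypothetical 2048-row matrix, and a hypothetical 16-sample ROI per column that is buffered (for example in the analog memory) for a second, precise conversion. None of these figures come from the VSoC specification.

```python
"""Rough cost model for ROI-first digitization (illustrative assumptions only)."""


def ramp_adc_cycles(bits: int) -> int:
    # Assumption: a counter-based ramp converter needs 2**bits clock cycles per conversion.
    return 2 ** bits


def full_frame_cycles(rows: int, bits: int) -> int:
    # Column-parallel conversion: all columns convert one row at the same time,
    # so the cost is rows * cycles-per-conversion, independent of the column count.
    return rows * ramp_adc_cycles(bits)


def roi_first_cycles(rows: int, coarse_bits: int, roi_rows: int, fine_bits: int) -> int:
    # Pass 1: coarse conversion of every row to locate the ROI.
    # Pass 2: fine conversion of only the buffered ROI samples.
    return rows * ramp_adc_cycles(coarse_bits) + roi_rows * ramp_adc_cycles(fine_bits)


if __name__ == "__main__":
    ROWS = 2048      # hypothetical row count
    ROI_ROWS = 16    # hypothetical ROI height per column
    full = full_frame_cycles(ROWS, bits=10)
    roi = roi_first_cycles(ROWS, coarse_bits=1, roi_rows=ROI_ROWS, fine_bits=10)
    print(full, roi, round(full / roi, 1))  # prints: 2097152 20480 102.4
```

Under these assumptions the coarse pass is almost free, so the total cost is dominated by the few samples that actually need full resolution.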

The internal and external digital interfaces represent a bottleneck, especially for image sensors with high frame rates, which can only be mitigated by early data reduction. The principle is to reduce the amount of data transferred, and its resolution, to the absolute minimum required by the specific application. For control tasks based on optical features, this may mean that a few scalar values are sufficient to fulfill the task. The limiting factor is then the latency from image acquisition to the derivation of these reference variables. Consistently shifting the complete image processing chain from the PC or camera to the image sensor or VSoC, with latencies far below 100 µs and full frame rates of over 10 kHz, opens up a multitude of new application possibilities that can be implemented for end users with the IAP VSoC2M and subsequent Vision Systems-on-Chip. Another important benefit of reducing or even eliminating external image processing hardware is a significant reduction in the form factor and the bill of materials of the overall system.
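
A back-of-the-envelope calculation shows why early reduction is unavoidable. The sketch assumes a round 2 million pixels, the 10 kHz full frame rate mentioned above, a hypothetical 10-bit digitization depth, and the 6 Gbit/s throughput quoted for the compacting readout pipeline; the four-scalar output is likewise a hypothetical example.

```python
"""Back-of-the-envelope data rates (illustrative numbers, not measured values)."""


def raw_rate_bits_per_s(pixels: int, frame_rate_hz: float, bits_per_pixel: int) -> float:
    # Uncompressed data volume produced per second by the pixel matrix.
    return pixels * frame_rate_hz * bits_per_pixel


PIXELS = 2_000_000   # round figure for a 2-megapixel matrix
FRAME_RATE = 10_000  # 10 kHz full frame rate
BITS = 10            # assumed digitization depth
LINK = 6e9           # 6 Gbit/s readout throughput

raw = raw_rate_bits_per_s(PIXELS, FRAME_RATE, BITS)
print(f"raw: {raw / 1e9:.0f} Gbit/s, reduction needed: {raw / LINK:.1f}x")
# prints: raw: 200 Gbit/s, reduction needed: 33.3x

# If the task only needs, say, four 16-bit scalars per frame:
scalars = 4 * 16 * FRAME_RATE
print(f"scalar output: {scalars / 1e3:.0f} kbit/s")  # prints: scalar output: 640 kbit/s
```

Even the fast on-chip readout path would need roughly a 33-fold reduction to carry full frames at this rate, whereas a feature-only output is several orders of magnitude below its capacity.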

**Architecture**

The IAP VSoC2M consists of a 2-megapixel matrix with global shutter pixels and charge-based signal output, flexible line control, an Application-Specific Instruction-set Processor (ASIP), a column-parallel processing unit with a compacting readout pipeline, and a high-speed LVDS interface. The unique charge-based readout principle enables column-parallel preprocessing such as weighted binning or 1D convolution in the analog domain. Compared to the previous generation, almost all components have been revised to achieve greater flexibility and higher speed.
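
The operation performed by the charge-based readout can be modeled digitally. The following sketch is a behavioral model, not the circuit: it assumes that each output sample is the weighted sum of several adjacent pixel charges in one column, with negative weights realized on the differential path. The kernel is applied without flipping (a correlation), which is identical to convolution for symmetric kernels such as the [-1, 2, -1] example. The function name `column_convolve` is hypothetical.

```python
from typing import List, Sequence


def column_convolve(column: Sequence[float], kernel: Sequence[float], stride: int = 1) -> List[float]:
    """Weighted sum of adjacent row charges along one column (valid region only).

    stride=1 gives a 1D convolution; stride=len(kernel) gives weighted binning,
    where each pixel contributes to exactly one output sample.
    """
    k = len(kernel)
    return [
        sum(w * column[r + j] for j, w in enumerate(kernel))
        for r in range(0, len(column) - k + 1, stride)
    ]


if __name__ == "__main__":
    col = [0, 1, 4, 1, 0, 0]
    print(column_convolve(col, [-1, 2, -1]))                       # [-2, 6, -2, -1]
    print(column_convolve([1, 2, 3, 4, 5, 6], [1, 1], stride=2))   # [3, 7, 11]
```

On the chip, the same operation would run in all columns simultaneously, and only the already filtered values need to be digitized.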

**Line Control**

The pixel matrix is controlled by the ASIP through a highly efficient line control. It consists of a line control cache, followed by multiple shift register chains and drivers for the control signals of each pixel row. In a first step, the shift register chains are preloaded from the control cache or with globally provided data; they can then be shifted line by line in both directions to provide the pixel cells with the required control signals. In addition, the range-addressing feature of the line control cache allows new data to be written very efficiently. All shift register chains can be loaded individually and enabled for output. This allows, for example, switching between different convolution kernels used to control the pixel matrix with very low latency.
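
The preload-and-shift principle can be sketched as a behavioral model. The class and method names below are hypothetical, each chain is reduced to one bit per row, and combining the enabled chains with a logical OR is a simplifying assumption; real row drivers carry several control signals per row.

```python
from typing import Dict, List, Set


class ShiftChain:
    """One shift register chain holding one control bit per pixel row."""

    def __init__(self, rows: int) -> None:
        self.bits: List[int] = [0] * rows

    def preload(self, pattern: List[int], start_row: int = 0) -> None:
        # Clear the chain, then write the pattern (e.g. a kernel's row footprint).
        self.bits = [0] * len(self.bits)
        for i, bit in enumerate(pattern):
            self.bits[start_row + i] = bit

    def shift(self, direction: int) -> None:
        # +1 moves the pattern one row down, -1 one row up; zeros are shifted in.
        if direction == 1:
            self.bits = [0] + self.bits[:-1]
        elif direction == -1:
            self.bits = self.bits[1:] + [0]
        else:
            raise ValueError("direction must be +1 or -1")


class LineControl:
    """Several preloaded chains; only the enabled ones drive the row signals."""

    def __init__(self, rows: int, chain_names: List[str]) -> None:
        self.rows = rows
        self.chains: Dict[str, ShiftChain] = {name: ShiftChain(rows) for name in chain_names}
        self.enabled: Set[str] = set()

    def enable_only(self, name: str) -> None:
        # Switching kernels just selects another already-loaded chain: no reload needed.
        self.enabled = {name}

    def row_signals(self) -> List[int]:
        # Assumption: the outputs of all enabled chains are ORed per row.
        return [
            int(any(self.chains[n].bits[r] for n in self.enabled))
            for r in range(self.rows)
        ]


if __name__ == "__main__":
    lc = LineControl(8, ["k3", "k5"])
    lc.chains["k3"].preload([1, 1, 1])
    lc.chains["k5"].preload([1, 1, 1, 1, 1])
    lc.enable_only("k3")
    lc.chains["k3"].shift(+1)
    print(lc.row_signals())  # [0, 1, 1, 1, 0, 0, 0, 0]
    lc.enable_only("k5")
    print(lc.row_signals())  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Because both kernels are held in their own chains, the switch is a single selection step rather than a rewrite of the row pattern, which is the source of the low switching latency described above.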

**Column-Parallel Processor Element (PE)**

The column-parallel processing unit consists of 1024 processor elements, each with an analog and a digital processing section. On the analog side, a differential readout path with adjustable gain and 32 analog memory elements are available. Five configuration registers configure the analog components and address the analog memory. Each PE has a dedicated analog-to-digital-to-analog converter, which enables both A/D conversion with resolutions between 1 bit and 16 bit (10 bit ENOB) and D/A conversion of arbitrary charge quantities. Digital post-processing in the IAP VSoC2M is limited to a 16-bit counter, simple activity control, and the connection to the asynchronous compacting readout pipeline. By comparison, the previous IAP VSoC1M features a full-size PE with an 8-bit ALU, multiple registers, and flags.
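
Software-defined resolution can be illustrated with a counter-based conversion model. The document does not state which conversion principle the PE converter uses; the single-slope (ramp) scheme below, with resolution capped at 16 bits to match the PE counter, is an assumption chosen for illustration, and `ramp_convert` is a hypothetical name.

```python
def ramp_convert(v_in: float, v_ref: float, bits: int) -> int:
    """Single-slope A/D conversion model with software-selectable resolution.

    A ramp rises from 0 to v_ref in 2**bits equal steps; the counter advances
    while the next ramp step does not exceed the input. Results are clamped
    to the full-scale code 2**bits - 1.
    """
    if not 1 <= bits <= 16:
        raise ValueError("resolution must be between 1 and 16 bits")
    steps = 2 ** bits
    lsb = v_ref / steps
    count = 0
    while count < steps - 1 and (count + 1) * lsb <= v_in:
        count += 1
    return count


if __name__ == "__main__":
    for bits in (1, 4, 10):
        print(bits, ramp_convert(0.3, 1.0, bits))  # 1 0, then 4 4, then 10 307
```

In this model a lower resolution needs exponentially fewer counter steps, which is why choosing the resolution per task in software pays off directly in conversion time.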

**Analog Memory**

A major innovation compared to the previous VSoC is the integration of analog memory that can be addressed directly from the processor elements. Each column contains 32 memory cells that can store individual or analog pre-filtered pixel values. For example, during a first, fast processing pass, this memory can be written from the sensor matrix at a rate of up to 10 MHz, providing the data for a second, precise processing pass in combination with the ROIs determined in the first. By concentrating exclusively on the processing of analog ROI data, very fast implementations can be realized, e.g. for SoL.

In the SoL example, the rough position of the laser line is first determined in each column, and the exact or pre-filtered pixel data are copied into the analog memory. The buffered analog values can then be converted to digital values with a resolution suited to the desired subpixel accuracy and either evaluated on-chip or output via the digital interface.
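
The two-pass SoL scheme can be sketched per column as follows. This is a behavioral sketch under stated assumptions: the coarse pass is modeled as a thresholded argmax, the window half-width of 2 rows is a hypothetical parameter, intensities are assumed non-negative, and the center-of-gravity estimator is one common subpixel method chosen for illustration; the document does not specify which estimator is used. Only the limit of 32 buffered samples per column is taken from the text.

```python
from typing import List, Optional, Sequence, Tuple

CACHE_ENTRIES = 32  # analog memory cells per column, as stated in the text


def coarse_peak(column: Sequence[float], threshold: float) -> Optional[int]:
    # Pass 1 (fast): row of the brightest pixel, if it reaches the threshold.
    best = max(range(len(column)), key=lambda r: column[r])
    return best if column[best] >= threshold else None


def buffer_window(column: Sequence[float], center: int, half: int) -> Tuple[int, List[float]]:
    # Copy the rows around the rough position into the (modeled) analog memory,
    # never exceeding the number of available cells.
    start = max(0, center - half)
    stop = min(len(column), center + half + 1, start + CACHE_ENTRIES)
    return start, list(column[start:stop])


def center_of_gravity(start: int, window: Sequence[float]) -> float:
    # Pass 2 (precise): intensity-weighted mean row index of the buffered samples.
    total = sum(window)
    return start + sum(i * v for i, v in enumerate(window)) / total


def measure_column(column: Sequence[float], threshold: float = 0.5, half: int = 2) -> Optional[float]:
    peak = coarse_peak(column, threshold)
    if peak is None:
        return None  # no laser line found in this column
    start, window = buffer_window(column, peak, half)
    return center_of_gravity(start, window)


if __name__ == "__main__":
    print(measure_column([0, 0, 0, 0.5, 1.0, 1.0, 0.5, 0, 0]))  # 4.5
    print(measure_column([0.1] * 9))                            # None
```

Only the few buffered samples per column enter the precise pass, so the expensive high-resolution conversion is spent where the laser line actually is.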

**Compacting Readout Pipeline**

The parallel processing unit contains a high-performance digital readout path based on asynchronous circuit technology. It enables both fast readout of continuous data, e.g. complete image information, and low-latency readout of sparse data such as coordinates of points of interest. The readout path offers a data throughput of up to 6 Gbit/s and a maximum latency of about 500 ns, with an adjustable data word width of 8 to 32 bits per column. Data transfer can take place in parallel with the actual data processing and therefore does not limit the processing speed. For the first time, the new readout path makes it possible to use the VSoC for event-based or compressed sensing methods as well as for conventional methods with continuous data streams.
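
The effect of compaction on sparse data can be shown with a small model. It ignores the asynchronous handshaking entirely and only shows the data reduction: inactive columns produce no output word. The 32-bit word layout (column index in the upper 16 bits, value in the lower 16 bits) and the function names are hypothetical choices within the 8 to 32 bit word widths mentioned above.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def compact(columns: Sequence[Optional[int]]) -> Iterator[Tuple[int, int]]:
    # Yield (column index, value) only for active columns; None marks "no data".
    for idx, value in enumerate(columns):
        if value is not None:
            yield idx, value


def pack_word(col: int, value: int) -> int:
    # Hypothetical 32-bit layout: column index in bits 31..16, value in bits 15..0.
    if not (0 <= col < 1 << 16 and 0 <= value < 1 << 16):
        raise ValueError("column or value out of range")
    return (col << 16) | value


def read_out(columns: Sequence[Optional[int]]) -> List[int]:
    return [pack_word(c, v) for c, v in compact(columns)]


if __name__ == "__main__":
    line: List[Optional[int]] = [None] * 1024  # one entry per column-parallel PE
    line[3], line[700] = 100, 42               # e.g. a feature found in only two columns
    words = read_out(line)
    print(len(words), [hex(w) for w in words])  # 2 ['0x30064', '0x2bc002a']
```

The output length scales with the number of active columns rather than with the 1024 columns, which is what makes the path suitable for event-based and other sparse outputs.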

**ASIP**

The VSoC is managed by an integrated, programmable control unit based on an Application-Specific Instruction-set Processor (ASIP), first introduced by Reichel et al. in 2015. In contrast to the multi-ASIP implementation used there, only a single control flow, i.e. a single ASIP, is used here because of the lower complexity of this VSoC. The general goal is to abstract the behavior of the individual functional units of the VSoC into a form suitable for universal programming. The task of the ASIP is not the conventional processing of digitized image content but the control of VSoC-specific functional units in the context of specific image acquisition and processing operations. Once programmed and parameterized, the control unit enables largely autonomous operation of the VSoC without additional external control.

An ASIP consists of a processor core that is extended by dedicated modules for individual functional units, which are embedded directly into its instruction set. This enables tight integration of the VSoC functionality into the respective application-dependent control flow. Because of its well-structured and clearly defined interface, a separate, largely freely configurable stack-based processor core was selected as the basis. All parameters and results of the basic instructions and instruction set extensions always refer to the central operand stack. A separate scratch pad memory is available to simplify programming and to handle more complex data; different addressing modes allow its flexible use with low overhead. In addition to instructions for arithmetic operations, control flow, and stack manipulation, the ASIP provides communication with remote VSoC components via a network on chip (NoC), presented by Russell et al. in 2017, and the connection of external sensors and actuators via general-purpose input/output (GPIO). An exact time base is also provided.
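
The stack-centric execution model with instruction set extensions can be illustrated by a toy interpreter. Everything here is hypothetical: the opcode names, the scratch pad size, and the `lc_shift` extension (standing in for a line control instruction) are invented for illustration and do not reproduce the actual ASIP instruction set. The point is that extension instructions take their operands from, and return results to, the same operand stack as the basic instructions.

```python
from typing import Callable, Dict, List


class StackCore:
    """Toy stack-based core with a scratch pad and pluggable extension opcodes."""

    def __init__(self, scratch_size: int = 16) -> None:
        self.stack: List[int] = []
        self.scratch: List[int] = [0] * scratch_size
        # Instruction set extensions: opcode -> handler that works on the stack.
        self.extensions: Dict[str, Callable[["StackCore"], None]] = {}

    def run(self, program: List[tuple]) -> List[int]:
        pc = 0
        while pc < len(program):
            op, *args = program[pc]
            pc += 1
            if op == "push":
                self.stack.append(args[0])
            elif op == "add":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif op == "store":  # pop top of stack into scratch pad address args[0]
                self.scratch[args[0]] = self.stack.pop()
            elif op == "load":   # push scratch pad value at address args[0]
                self.stack.append(self.scratch[args[0]])
            elif op == "jnz":    # pop; jump to absolute address args[0] if non-zero
                if self.stack.pop() != 0:
                    pc = args[0]
            elif op in self.extensions:
                self.extensions[op](self)
            else:
                raise ValueError(f"unknown opcode {op!r}")
        return self.stack


if __name__ == "__main__":
    shifts: List[int] = []
    core = StackCore()
    # Hypothetical extension: pop a row count and "shift the line control" by it.
    core.extensions["lc_shift"] = lambda c: shifts.append(c.stack.pop())
    program = [
        ("push", 3), ("store", 0),                          # 0-1: counter = 3
        ("push", 1), ("lc_shift",),                         # 2-3: shift by one row
        ("load", 0), ("push", -1), ("add",), ("store", 0),  # 4-7: counter -= 1
        ("load", 0), ("jnz", 2),                            # 8-9: loop while counter != 0
    ]
    print(core.run(program), shifts)  # [] [1, 1, 1]
```

Registering extensions as stack handlers keeps the core itself unchanged while new functional units are added, which mirrors the separation between a configurable base core and its instruction set extensions.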

**Software**

Programming can be done directly in assembler or at a higher level in the Python programming language. A special procedure presented by Reichel et al. in 2016 transforms individual program sections into VSoC-specific code. Finally, a library with basic functions for image acquisition and processing enables the VSoC to be used in specific applications. The combination of VSoC-based preprocessing and conventional digital image processing methods in a common program is called a "Vision Task".

---

**Technical details and advantages of the innovation:**

Two different approaches are currently pursued to solve image processing tasks. On the one hand, conventional industrial image sensors are combined with FPGA- or PC-based hardware to analyze image data, extract features, and use them for tasks such as measuring objects. The various digital interfaces are usually the limiting bottleneck for achieving high continuous frame rates and reliably low latencies. If an operating system is required, as is the case with PCs, there are usually considerable delays and additional uncertainty in the image processing latency.

On the other hand, since the mid-2000s, special image sensors with integrated column-parallel signal processing have been used for certain measurement applications, mostly sheet of light (e.g. Sick/IVP Ranger). Despite some disadvantages, such as rolling shutter readout, they have dominated many fields of application for more than 10 years due to their high speed. CNN-based vision sensors (e.g. Anafocus Eye-RIS), with their relatively low image resolution, have not yet achieved comparably wide adoption.

In 2015, Fraunhofer IIS/EAS introduced the concept of column-parallel charge-based signal processing with a novel Vision System-on-Chip. This made it possible to perform 1D convolution operations before A/D conversion and to digitize the results with a suitable resolution. Thanks to this concept and the flexible, software-programmable control by Application-Specific Instruction-set Processors, not only can a large number of new applications be implemented, but conventional measuring methods such as sheet of light and optical coherence tomography can also be accelerated.

The new VSoC2M offers further considerable gains in speed (at least a factor of 5, depending on the application) and in flexibility, with new methods for row- and column-wise compression of pixel data and of intermediate and final image processing results, and with an even more powerful ASIP.

**Relevance and application possibilities of the described innovation for the machine vision industry:**

Based on the Vision Systems-on-Chip of Fraunhofer IIS/EAS, and especially the IAP VSoC2M presented here, various applications with high demands on frame rate and latency can be realized very efficiently. Procedures that require considerable compression on the way from the original image data to features or even scalar control parameters are particularly well suited. It does not matter whether the redundancy occurs within a frame (many small ROIs) or for individual pixels or groups of pixels over a longer period of time.

The software for the VSoCs is implemented in Python with embedded assembler, and existing building blocks for basic functions (skeletons) can be reused. VSoC-based image processing can be integrated into customer-specific applications in two ways: either via the available IAP software-defined smart camera systems or via the OEM VSoC modules, which can be adapted to customer needs by means of software.

This opens up a variety of new possibilities in embedded image processing that previously could not be implemented. They range from freely programmable multi-region-of-interest (ROI) image acquisition through optical measuring methods such as sheet of light (SoL) or optical coherence tomography (OCT) to low-latency process control based on extracted image characteristics.

Images:
38186_iap_vsoc2m_die.jpg
38186_iap_camera_system_cr.jpg
38186_iap_vsoc2m_oem_modul_cr.jpg