Pre & Post Processing#

“Pre-processing” and “post-processing” refer to the steps that occur before and after the core neural network inference in a model. These steps can include data normalization, resizing, or applying non-maximum suppression (NMS) to filter out overlapping detections, among other operations. Sometimes these operations are done in the application code (such as with OpenCV), while other times they may be embedded within the network model itself.

Additionally, some models have layers that are not supported on the MXA hardware, such as list sorting and other data manipulation operations that are CPU-like in nature.

Here we will discuss both cropped pre/post models and more general pre/post processing steps common in applications.

Cropped Pre/Post Models#

Many neural network models have pre- or post-processing layers saved within their graphs, which typically aren’t supported directly on the MXA hardware due to their CPU-like nature. For example, some YOLO models have an NMS (Non-Maximum Suppression) layer in their ONNX graph that needs to run on the CPU.

This is where Model Cropping comes in. With either --autocrop or manual cropping, the NeuralCompiler can split the input model into multiple parts: the core neural network, which runs on the MXA, and the pre- and/or post-processing layers, which run on the CPU.

See also

See the Cropping Tutorial for more details.
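
For reference, cropping can be requested at compile time. Below is a minimal sketch using the Python NeuralCompiler API; the autocrop keyword is assumed here to mirror the CLI’s --autocrop flag, so check the Cropping Tutorial for the exact options in your SDK version.

from memryx import NeuralCompiler

# Compile with automatic cropping: unsupported pre/post layers are
# split off into companion ONNX files next to the generated DFP
# (autocrop kwarg assumed to match the --autocrop CLI flag)
nc = NeuralCompiler(models="my_model.onnx", autocrop=True, verbose=1)
dfp = nc.run()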

In your application, simply use the Runtime APIs, which handle the pre- and post-processing steps transparently.

How To Connect Pre/Post#

Connecting pre/post models is simple:

Python:

# Import the MemryX Python runtime
from memryx import AsyncAccl

# Create an accl object
accl = AsyncAccl("my_model.dfp")

# Connect cropped pre & post models (ONNX in this case)
accl.set_preprocessing_model("my_model_pre.onnx")
accl.set_postprocessing_model("my_model_post.onnx")

# (now continue with connect_input/output, etc.)

C++:

// Create an accl object (include the MxAccl header from the SDK)
MxAccl accl("my_model.dfp");

// Connect cropped pre & post models (ONNX in this case)
accl.connect_pre_model("my_model_pre.onnx");
accl.connect_post_model("my_model_post.onnx");

// (now continue with connect_stream, etc.)

And that’s it! The pre and/or post layers will automatically execute in ONNX Runtime, TensorFlow, or TFLite (depending on the model type).

Auto Pre/Post FMap Shapes#

When using pre/post models, their shapes must line up with the original model: the pre-processing model’s input must match the original model’s input shape, and the post-processing model’s output must match the original model’s output shape.

For example, if your original model has an input shape of (3, 640, 640) and an output shape of (1000), then the pre-processing model must also have an input shape of (3, 640, 640) and the post-processing model must have an output shape of (1000).

In other words, use_model_shape is always set to True for inputs when using a pre-processing model, and for outputs when using a post-processing model (and both if using both). See below.
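
As an optional sanity check, you can inspect the cropped models’ shapes with plain onnxruntime (nothing MemryX-specific):

import onnxruntime as ort

pre = ort.InferenceSession("my_model_pre.onnx", providers=["CPUExecutionProvider"])
post = ort.InferenceSession("my_model_post.onnx", providers=["CPUExecutionProvider"])

# Pre-model input should match the original model's input shape,
# post-model output should match the original model's output shape
print(pre.get_inputs()[0].shape)    # e.g. [3, 640, 640]
print(post.get_outputs()[0].shape)  # e.g. [1000]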

General Pre/Post#

Pre- and post-processing can more generally include operations like resizing and NMS. These operations depend on the type of application and neural network model you’re using; if you’re already using a CPU/GPU solution for them, the same code can typically be reused with the MemryX runtime.
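
For instance, a typical OpenCV resize + normalize routine carries over unchanged. The sketch below wires it into the AsyncAccl callback pattern shown earlier; the camera source and 640x640 size are placeholder assumptions.

import cv2
import numpy as np
from memryx import AsyncAccl

cap = cv2.VideoCapture(0)  # placeholder video source

def data_source():
    # The same resize/normalize you'd run in a CPU/GPU pipeline
    ok, frame = cap.read()
    if not ok:
        return None  # returning None ends the input stream
    frame = cv2.resize(frame, (640, 640))
    return frame.astype(np.float32) / 255.0

def handle_output(*outputs):
    # Post-process here (e.g. NMS), reusing your existing code
    print([o.shape for o in outputs])

accl = AsyncAccl("my_model.dfp")
accl.connect_input(data_source)
accl.connect_output(handle_output)
accl.wait()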

Feature Map Shapes#

The MXA hardware always uses channel-last format (e.g. (640, 640, 3)). TensorFlow and TFLite models also use this format; however, many models made in PyTorch and ONNX use channel-first format (e.g. (3, 640, 640)).
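
Converting between the two layouts is a single transpose, e.g. with NumPy:

import numpy as np

chw = np.zeros((3, 640, 640), dtype=np.float32)  # channel-first (PyTorch/ONNX)
hwc = np.transpose(chw, (1, 2, 0))               # channel-last (MXA/TF/TFLite)
assert hwc.shape == (640, 640, 3)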

The use_model_shape argument in both the Python and C++ runtimes lets you specify whether you will provide feature maps in the original model’s shape (True, the default) or in the MXA’s always-channel-last shape (False).

This argument controls the channel order used when connecting inputs and outputs.

Important

Usually, you will want to use the default behavior of use_model_shape=True.

Using the MXA shape is an option only for advanced users trying to squeeze out every last bit of performance. For example, feeding an OpenCV channel-last frame to a DFP compiled from a channel-first ONNX model avoids the needless round trip: converting the frame to channel-first to match the model’s shape, only for the runtime to convert it back to channel-last for the MXA.
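
A minimal sketch of that advanced path, assuming use_model_shape is passed when constructing the accelerator (check the Runtime API reference for its exact placement in your SDK version):

from memryx import AsyncAccl

# Assumed placement: use_model_shape=False tells the runtime to accept
# the MXA's channel-last (640, 640, 3) fmaps directly, instead of the
# ONNX model's channel-first (3, 640, 640)
accl = AsyncAccl("my_model.dfp", use_model_shape=False)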