Neural Compiler Extensions#

Introduction#

The Neural Compiler Extensions (NCE) module extends the functionality of the Neural Compiler using supplied .nce files. Models that were previously unsupported, either because of complex subgraphs (such as those found in transformers) or because of unsupported operators, can now be mapped by using the right extension. This allows MemryX to add model support without releasing a new SDK: MemryX can supply .nce files to customers as needed to support new or custom operators and to introduce optimized processing steps between SDK releases. In the future, customers will be able to write their own extensions.

The --extensions flag allows users to provide extensions to the NeuralCompiler CLI. For the API, the equivalent is the extensions argument, which accepts a list of either str or Path objects. This list should contain paths to .nce files used to extend the Neural Compiler’s functionality. All official extensions distributed by MemryX are cryptographically signed.
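
As a minimal sketch of how the list for the extensions argument might be assembled, the helper below collects every .nce file in a directory as Path objects. The helper name `collect_extensions` and the directory it scans are hypothetical and not part of the MemryX SDK; only the fact that the argument accepts a list of str or Path entries comes from the description above.

```python
from pathlib import Path


def collect_extensions(directory):
    """Return a sorted list of Path objects for the .nce files in `directory`.

    Hypothetical helper for illustration; not part of the MemryX SDK.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".nce"
    )
```

The returned list could then be passed directly as the extensions argument when constructing NeuralCompiler, for example `extensions=collect_extensions("my_extensions")`, where `my_extensions` is an assumed folder holding the supplied .nce files.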

Extensions are currently available for the Keras, ONNX, and TFLite frameworks. Support for TensorFlow is planned for the next release.

Note that some built-in extensions are available as soon as SDK 2.0 is installed. These are listed in the Available Extensions section below.

Download the Model#

This tutorial uses the Yolo11n classification model from Ultralytics to demonstrate the NeuralCompiler extensions argument. To download the Yolo11n-cls model, first install the ultralytics package:

pip install ultralytics==8.3.161

Download the torch model and export it to .onnx format using the following commands.

python

from ultralytics import YOLO

# Load a model
model = YOLO("yolo11n-cls.pt")  # load an official model

# Export the model
model.export(format="onnx")

CLI

yolo export model=yolo11n-cls.pt format=onnx  # export official model

The exported yolo11n-cls.onnx file will be saved in the current working directory.

Compile#

The yolo11n-cls.onnx model can be compiled using the extensions argument as shown below. Yolov10 and Yolo11 share the same pattern for the attention layer, so the Yolov10 extension handles both Yolov10 and Yolo11 models. The Yolov10 extension ships with the SDK 2.0 memryx pip package, so no additional .nce file is required.

API

from memryx import NeuralCompiler

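# 'Yolov10' is a built-in extension, so it is referenced by name rather than by a .nce path
nc = NeuralCompiler(models='yolo11n-cls.onnx', verbose=1, dfp_fname='yolo11n-cls', extensions=['Yolov10'])
dfp = nc.run()  # compile the model and produce the DFP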

CLI

mx_nc -v -m yolo11n-cls.onnx --extensions Yolov10

This generates a DFP file, yolo11n-cls.dfp, in the current working directory.

Runtime#

After your model is compiled, it is ready to run on the MemryX hardware accelerator. The FPS and latency of the model over 1000 frames can be measured using mx_bench:

mx_bench -d yolo11n-cls.dfp -f 1000

The output is as follows:

Ran 1000 frames
    Model: 0
    Average FPS: 1942.19
    Average System Latency: 2.06 ms
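
When benchmark results need to be logged or compared across runs, the report can be turned into numbers with a few regular expressions. The sketch below parses the exact text shown above; the function name `parse_bench_output` is hypothetical, and the layout is assumed to match this single-model sample, so other mx_bench versions or multi-model runs may need different patterns.

```python
import re

# Sample report copied from the mx_bench run above.
SAMPLE_OUTPUT = """Ran 1000 frames
    Model: 0
    Average FPS: 1942.19
    Average System Latency: 2.06 ms
"""


def parse_bench_output(text):
    """Extract frame count, FPS, and latency (ms) from mx_bench-style text.

    Hypothetical helper; assumes the single-model layout shown in the sample.
    """
    frames = re.search(r"Ran\s+(\d+)\s+frames", text)
    fps = re.search(r"Average FPS:\s*([\d.]+)", text)
    latency = re.search(r"Average System Latency:\s*([\d.]+)\s*ms", text)
    if not (frames and fps and latency):
        raise ValueError("text does not match the expected report layout")
    return {
        "frames": int(frames.group(1)),
        "fps": float(fps.group(1)),
        "latency_ms": float(latency.group(1)),
    }


result = parse_bench_output(SAMPLE_OUTPUT)
print(result)
```

Raising an error on an unexpected layout, rather than returning partial values, keeps a silent format change from producing misleading numbers.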

Available Extensions#

The table below lists the built-in extensions available for .onnx models. Since these extensions are included with the SDK 2.0 package, no additional .nce files are required to use them.

Replace <MODEL> with the path to your model and <EXTENSION> with the matching extension name from the table to compile the respective model.

mx_nc -v -m <MODEL> --extensions <EXTENSION>

Model	Extension
Yolo10, Yolo11	`Yolov10`
ConvNext	`ConvNext`
VitSmall	`VitSmall`
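
For scripts that compile several models, the table can be expressed as a lookup that builds the corresponding mx_nc command. This is only a sketch: the dictionary keys mirror the model names in the table, the function name `mx_nc_command` is hypothetical, and the flags are limited to those already shown in this tutorial.

```python
# Built-in extensions from the table above, keyed by model family.
BUILTIN_EXTENSIONS = {
    "Yolo10": "Yolov10",
    "Yolo11": "Yolov10",
    "ConvNext": "ConvNext",
    "VitSmall": "VitSmall",
}


def mx_nc_command(model_path, model_family):
    """Build the mx_nc argument list for a model family listed in the table.

    Hypothetical helper; it only assembles the command and does not run it.
    """
    try:
        extension = BUILTIN_EXTENSIONS[model_family]
    except KeyError:
        raise ValueError(
            f"no built-in extension listed for {model_family!r}"
        ) from None
    return ["mx_nc", "-v", "-m", model_path, "--extensions", extension]


print(" ".join(mx_nc_command("yolo11n-cls.onnx", "Yolo11")))
```

The argument list form can be handed to `subprocess.run` as-is, which avoids shell quoting issues with model paths that contain spaces.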

Summary#

This tutorial outlined how to compile a model with a neural compiler extension.