Neural Compiler Extensions
Introduction
The Neural Compiler Extensions (NCE) module extends the functionality of the Neural Compiler using supplied .nce files.
In other words, models that were previously unsupported because of complex subgraphs (such as transformer blocks) or unsupported operators can now be mapped by using the right extension.
This allows MemryX to support new models without releasing a new SDK: MemryX can supply .nce files to customers as needed to support new or custom operators and to introduce optimized processing steps between SDK releases.
In the future, customers will be able to write their own extensions!
The --extensions flag allows users to provide extensions to the Neural Compiler CLI (mx_nc).
For the Python API, the equivalent is the extensions argument of NeuralCompiler, which accepts a list of str or Path objects.
Each entry is either a path to a .nce file used to extend the Neural Compiler's functionality or, for the builtin extensions described below, the extension name (for example, 'Yolov10').
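Because the extensions argument accepts a mix of str and Path entries, it can help to sanity-check the list before starting a long compilation. The sketch below is a hypothetical helper (collect_extensions is not part of the memryx package). It assumes that entries ending in .nce are extension files that must exist on disk and that any other entry is the name of a builtin extension such as 'Yolov10'.

```python
from pathlib import Path
from typing import Iterable, List, Union


# Hypothetical helper, NOT part of the memryx package.
def collect_extensions(entries: Iterable[Union[str, Path]]) -> List[Union[str, Path]]:
    """Validate entries destined for NeuralCompiler(extensions=...).

    Assumption: entries ending in ".nce" are extension files that must exist;
    anything else is treated as the name of a builtin extension (e.g. "Yolov10").
    """
    checked: List[Union[str, Path]] = []
    for entry in entries:
        path = Path(entry)
        if path.suffix == ".nce":
            # Fail early if a supplied extension file is missing.
            if not path.is_file():
                raise FileNotFoundError(f"extension file not found: {path}")
            checked.append(path)
        else:
            # Keep builtin extension names as plain strings.
            checked.append(str(entry))
    return checked
```

The returned list can be passed unchanged, for example NeuralCompiler(models='yolo11n-cls.onnx', extensions=collect_extensions(['Yolov10'])). Checking for missing files up front gives a clearer error than discovering the problem partway through compilation.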
All official extensions distributed by MemryX are cryptographically signed.
Extensions are currently available for Keras, ONNX, and TensorFlow Lite (TFLite) models. Support for TensorFlow models is planned for the next release.
Note that some builtin extensions are included with SDK 2.0 and can be used right after installation. These are listed in the Available Extensions section below.
Download the Model
In this tutorial, we will use the YOLO11n classification model from Ultralytics to demonstrate the NeuralCompiler extensions argument.
To download the YOLO11n-cls model, install the ultralytics package:
pip install ultralytics==8.3.161
Download the PyTorch model and export it to ONNX format using the Ultralytics Python API:
from ultralytics import YOLO
# Load a model
model = YOLO("yolo11n-cls.pt") # load an official model
# Export the model
model.export(format="onnx")
Alternatively, the same export can be run from the command line:
yolo export model=yolo11n-cls.pt format=onnx # export official model
The exported yolo11n-cls.onnx file will be saved in the current working directory.
Compile
The yolo11n-cls.onnx model can be compiled using the extensions argument as shown below.
Note that YOLOv10 and YOLO11 share the same attention-layer pattern, so the Yolov10 extension handles both YOLOv10 and YOLO11 models.
The Yolov10 extension is built into the SDK 2.0 memryx pip package, so no additional .nce file is required.
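from memryx import NeuralCompiler
# Compile the ONNX model with the builtin Yolov10 extension, referenced by name
nc = NeuralCompiler(models='yolo11n-cls.onnx', verbose=1, dfp_fname='yolo11n-cls', extensions=['Yolov10'])
dfp = nc.run()  # produces yolo11n-cls.dfp in the current working directory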
Or, equivalently, using the mx_nc CLI:
mx_nc -v -m yolo11n-cls.onnx --extensions Yolov10
This will generate the DFP file yolo11n-cls.dfp in the current working directory.
Runtime
After your model is compiled, it’s ready to be accelerated using the MemryX hardware accelerator.
The FPS and latency of the model over 1000 frames can be measured using mx_bench:
mx_bench -d yolo11n-cls.dfp -f 1000
The output is as follows:
Ran 1000 frames
Model: 0
Average FPS: 1942.19
Average System Latency: 2.06 ms
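The two numbers in this report measure different things: FPS is throughput, while latency is the time a single frame spends in the system. As a rough sketch, the snippet below parses the report and relates the two quantities with Little's law (frames in flight ≈ throughput × latency). The parser parse_mx_bench is hypothetical and assumes the exact text format shown above, which may differ between SDK versions. For this run it gives about 0.515 ms between completed frames but roughly 4 frames in flight on average, which suggests that frames overlap in a pipeline rather than being processed one at a time; that reading is an inference from the numbers, not something mx_bench reports.

```python
import re

# Sample report copied from the mx_bench run above.
SAMPLE_REPORT = """Ran 1000 frames
Model: 0
Average FPS: 1942.19
Average System Latency: 2.06 ms
"""


def _field(pattern: str, text: str) -> str:
    """Return the first capture group of pattern, or raise if it is absent."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found in mx_bench output: {pattern}")
    return match.group(1)


# Hypothetical parser; assumes the line format of the sample report above.
def parse_mx_bench(text: str) -> dict:
    return {
        "frames": int(_field(r"Ran (\d+) frames", text)),
        "fps": float(_field(r"Average FPS:\s*([\d.]+)", text)),
        "latency_ms": float(_field(r"Average System Latency:\s*([\d.]+)\s*ms", text)),
    }


stats = parse_mx_bench(SAMPLE_REPORT)
# Average time between completed frames, in milliseconds.
frame_interval_ms = 1000.0 / stats["fps"]
# Little's law: average number of frames in the system = throughput x latency.
frames_in_flight = stats["fps"] * stats["latency_ms"] / 1000.0
print(f"{frame_interval_ms:.3f} ms between frames, ~{frames_in_flight:.1f} frames in flight")
```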
Available Extensions
The table below lists the builtin extensions available for ONNX models. Since these extensions are included with the SDK 2.0 package, no additional .nce files are required for them.
Replace <MODEL> with the path to the model and <EXTENSION> with the corresponding extension name in the following command to compile the respective model.
mx_nc -v -m <MODEL> --extensions <EXTENSION>
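When several models need to be compiled, filling in the model path and extension name can be scripted. The sketch below builds the mx_nc argument list from Python and runs it with subprocess. It uses only the flags shown in this tutorial (-v, -m, --extensions), passes a single extension per call, and the helper names mx_nc_command and compile_with_extension are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Union


# Hypothetical helper: builds `mx_nc [-v] -m <MODEL> --extensions <EXTENSION>`.
def mx_nc_command(model: Union[str, Path], extension: str, verbose: bool = True) -> List[str]:
    cmd = ["mx_nc"]
    if verbose:
        cmd.append("-v")
    cmd += ["-m", str(model), "--extensions", extension]
    return cmd


# Hypothetical wrapper: runs mx_nc and raises CalledProcessError on failure.
def compile_with_extension(model: Union[str, Path], extension: str) -> None:
    cmd = mx_nc_command(model, extension)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, compile_with_extension("yolo11n-cls.onnx", "Yolov10") runs the same command as the Compile step. Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.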
Summary
This tutorial outlined how to compile a model with a Neural Compiler extension.