Closed Loop Compilation#
Note
This tutorial assumes that an M.2 MX3 is connected to the same machine where you are running the tutorial.
Introduction#
By default, our compiler balances fast compilation and maximum performance. However, for users who seek the best possible performance, we provide the --effort hard option, which allows the compiler to spend additional time searching for the best-performing solution.
This tutorial explains the concept of “closed-loop effort=hard,” an advanced feature that further refines model performance through real FPS measurements, iterating over different configurations to find the optimal one.
Effort Hard Mode#
The --effort hard option enables the compiler to generate multiple candidate mapping points and select the most efficient one based on a cost function. This approach differs from the default compilation mode, which relies on heuristics for selecting a single mapping point. While --effort hard increases compilation time, it often results in better performance.
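The selection step described above can be pictured with a minimal sketch. The compiler's actual cost function and mapping-point representation are not documented here, so the `MappingPoint` fields, the `cost` formula, and the weights below are purely illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingPoint:
    """Hypothetical candidate mapping; the fields are illustrative only."""
    name: str
    estimated_cycles: float      # assumed compute-cost estimate
    estimated_transfers: float   # assumed data-movement estimate


def cost(point: MappingPoint, transfer_weight: float = 2.0) -> float:
    """Toy cost function: lower is better. Not the compiler's real model."""
    return point.estimated_cycles + transfer_weight * point.estimated_transfers


def select_best(candidates: list[MappingPoint]) -> MappingPoint:
    """Return the candidate with the lowest estimated cost."""
    if not candidates:
        raise ValueError("no candidate mapping points to choose from")
    return min(candidates, key=cost)


candidates = [
    MappingPoint("A", estimated_cycles=1000.0, estimated_transfers=50.0),
    MappingPoint("B", estimated_cycles=900.0, estimated_transfers=120.0),
    MappingPoint("C", estimated_cycles=950.0, estimated_transfers=60.0),
]
best = select_best(candidates)
print(best.name, cost(best))
```

The point of the sketch is that the choice is only as good as the estimate: a cost function ranks candidates without running them, which is the gap that measuring on hardware is meant to close.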
Introducing: Closed-Loop Compilation#
Closed-loop compilation builds on --effort hard by incorporating real FPS benchmarks measured directly on the chip. Instead of relying on the cost function alone, it iterates through potential mapping points and selects the configuration with the highest measured FPS. This improves performance by 3% on average, and some models may see larger gains.
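To make the comparison concrete, the following sketch picks the candidate with the highest measured FPS and reports the relative gain over a baseline choice. The FPS figures and candidate names are invented for illustration and are not measurements from any real model.

```python
def pick_by_measured_fps(measured: dict[str, float]) -> str:
    """Return the name of the candidate with the highest measured FPS."""
    if not measured:
        raise ValueError("no measurements available")
    return max(measured, key=measured.get)


def relative_gain_percent(baseline_fps: float, new_fps: float) -> float:
    """Percentage improvement of new_fps over baseline_fps."""
    if baseline_fps <= 0:
        raise ValueError("baseline FPS must be positive")
    return (new_fps - baseline_fps) * 100.0 / baseline_fps


# Illustrative numbers only: suppose the cost function had chosen "point_a".
measured = {"point_a": 200.0, "point_b": 206.0, "point_c": 198.0}
baseline = "point_a"

best = pick_by_measured_fps(measured)
gain = relative_gain_percent(measured[baseline], measured[best])
print(f"{best}: {gain:.1f}% faster than {baseline}")
```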
This feature is not yet fully integrated into the compiler but can be executed manually using a provided script.
Step 1: Prepare Your Model#
Before running the closed-loop script, ensure you have a model ready. For demonstration purposes, we will use a YOLOv5-small-voc model, which can be obtained from a public source using the following command:
wget https://mmdeploy-oss.openmmlab.com/model/mmyolo/yolov5-660fed.onnx
Step 2: Run the Closed-Loop Script#
The closed-loop script automates the compilation and benchmarking process. It compiles the model at different splits and selects the best-performing executable. Run the script with the model file as an argument:
python3 closed_loop.py --model yolov5-660fed.onnx
The script performs the following steps:
Iterates through multiple split configurations.
Compiles the model at each split.
Benchmarks the compiled dfp file by measuring FPS.
Identifies and selects the split with the highest FPS.
Copies the best-performing dfp file to best_split.dfp.
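The steps listed above can be sketched as a short loop. This is not the contents of closed_loop.py: `compile_split` and `measure_fps` are hypothetical placeholders for whatever compiler and benchmark invocations the real script uses, passed in as callables so that no SDK command or flag is assumed here.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def closed_loop_search(
    model: Path,
    splits: Iterable[int],
    compile_split: Callable[[Path, int], Path],  # hypothetical: returns a dfp path
    measure_fps: Callable[[Path], float],        # hypothetical: benchmarks a dfp
    output: Path = Path("best_split.dfp"),
) -> Tuple[int, Dict[int, float]]:
    """Compile at each split, benchmark it, and keep the fastest dfp."""
    fps_by_split: Dict[int, float] = {}
    dfp_by_split: Dict[int, Path] = {}

    for split in splits:
        dfp = compile_split(model, split)       # compile the model at this split
        dfp_by_split[split] = dfp
        fps_by_split[split] = measure_fps(dfp)  # measure FPS for the compiled dfp

    if not fps_by_split:
        raise ValueError("no splits were tested")

    # Select the split with the highest measured FPS and copy its dfp.
    best_split = max(fps_by_split, key=fps_by_split.get)
    shutil.copyfile(dfp_by_split[best_split], output)
    return best_split, fps_by_split
```

Passing the compile and benchmark steps in as functions keeps the search logic testable without hardware. On a real system they would wrap the SDK's compiler and benchmarking tools.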
After execution, the script will output FPS results for each tested split. The split with the highest FPS is automatically selected and saved as best_split.dfp.
Note
The closed-loop process may take a significant amount of time to complete, as it exhaustively searches for the best-performing split. Future SDK releases are expected to shorten this search with multi-process compilation and tighter closed-loop integration.
Third-Party Licenses#
This tutorial utilizes a third-party model. Below are the license details for this dependency:
Model: YOLOv5-small-voc from OpenMMLab
License: GPL
Summary#
Closed-loop compilation provides a powerful way to optimize neural network execution on the M.2 MX3. By benchmarking and selecting the best split configuration, users can achieve higher FPS with minimal manual effort. Future enhancements, including multi-process compilation and improved SDK integration, will further accelerate the process.
The full script is available for download: