File Formats#
Dataflow Program#
A Dataflow Program (DFP) is an encrypted binary file produced by the Neural Compiler and is used to configure one or many MXAs. It is analogous to an FPGA bitstream: it statically configures the MXA(s) so they can perform NN inference on an input datastream.
One (or many) Neural Network models are compiled to an MXA target (generation / number of chips) using the Neural Compiler. This will produce a single DFP which can be uploaded to the MXA(s). After configuration, the MXA(s) will run inference by simply streaming data to the accelerator and reading the inference outputs.
DFP take-aways:
- A binary file produced by the Neural Compiler.
- Used to program an MXA.
- Can contain one or many Neural Network models.
- Targets a specific MXA configuration (generation and number of MXA chips).
- Encrypted; model architecture and weights are obfuscated using advanced encryption libraries.
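As a sketch of the overall flow, compiling a single model into a DFP could look like the command below (the flags and the resnet50.h5 file name are reused from the cropping examples later on this page; substitute your own model):
mx_nc -v -c 4 -m resnet50.h5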
Model Cropping#
The MemryX MXA excels at accelerating NN operators such as convolution2d and dense, at the expense of supporting general tensor graphs and conditional operations, which are sometimes present in the pre/post-processing of a Neural Network model.
To help increase compatibility with MemryX chips, the Neural Compiler includes built-in functionality to crop ‘unsupported’ operators from an NN graph.
mx_nc provides arguments to specify the ‘input/output’ nodes of the NN that should be accelerated. The nodes outside this range will be ‘cropped’ from the model and should be run by the host processor.
mx_nc -h
...
Graph Processing:
--model_in_out JSON file that contains the names of the input and output layers.
...
The model_in_out file is a JSON specifying the model inputs and outputs. This file format supports single input/output, multi input/output, and multi-model (with multi-input/output). Some examples are listed below:
Single Model Cropping#
For a single model, specify input/inputs and output/outputs as a string/list of strings.
{ "input": "conv1", "output": "add_1" }
{
    "inputs": ["res2a_branch2a", "res2a_branch1"],
    "outputs": ["activation4", "res2b_branch2a"]
}
With one of the above json files saved as boundary.json, mx_nc can be run as follows:
mx_nc -v -c 4 -m resnet50.h5 --model_in_out boundary.json
Multi-Model Cropping#
For multi-model compilation, specify the model number as a dictionary key with input/inputs and output/outputs. The models are numbered by the order in which they are specified to the Neural Compiler. If model cropping is not needed for a given model, that model may be omitted.
{
"0": {"input": "conv1", "outputs": ["res2a_branch2c", "res2a_branch1"]},
"1": {"input": "conv_dw_1", "output":"conv_pw_1"}
}
{
"1": {"input": "conv_dw_1", "output":"conv_pw_1"}
}
With one of the above json files saved as boundary.json, we can run mx_nc as follows:
mx_nc -v -c 4 -m resnet50.h5 mobilenet.h5 --model_in_out boundary.json
Weight Precision#
Weights in all layers are quantized to 8 bits by default on the MXA(s). The user has the option of setting a desired precision with the help of the Neural Compiler, which provides the --weight_bits_table argument to specify the path to a JSON file.
The weight_bits_table file is a JSON specifying the model layer names and the corresponding desired precision for each layer. This file format supports single-model and multi-model compilation. Some examples are listed below:
Single Model Weight Precision#
The wbits.json is a JSON file as depicted in the example below.
.json file format:

Desired Precision | JSON Value
4-bit             | 4
8-bit             | 8 (default)
16-bit            | 16
Example 1: Different levels of precision for each layer.
{
"important_conv": 16,
"dense_layer": 4
}
Say you have 4 layers in your model: conv2d_0, conv2d_1, important_conv, and dense_layer.
Passing --weight_bits_table wbits.json to the Neural Compiler will now quantize your layer weights as follows:
Layer          | Weight Precision
conv2d_0       | 8-bit
conv2d_1       | 8-bit
important_conv | 16-bit
dense_layer    | 4-bit
Example 2: 16-bit precision for all layers in the network. On average, this would take 2x the number of chips compared to the default 8-bit precision.
{
"__DEFAULT__" : 16
}
Say you have 3 layers in your model: conv2d_0, depthwise_0, and dense_layer.
Passing --weight_bits_table wbits_2.json to the Neural Compiler will now quantize your layer weights like this:
Layer       | Weight Precision
conv2d_0    | 16-bit
depthwise_0 | 16-bit
dense_layer | 16-bit
Multi-Model Weight Precision#
For multi-model compilation, specify the model number as a dictionary key with layer names and the corresponding precision. The models are numbered by the order in which they are specified to the Neural Compiler. If a specific precision is not needed for a model, then that model may be omitted.
Example 1: In this example, two models are passed as input to the Neural Compiler. Model 0 will get 16-bit precision in all layers except conv2d_1 (assigned 8-bit precision) and predictions (assigned 4-bit precision). Model 1 will get 4-bit precision in all layers except conv1 (assigned 16-bit precision).
{
"0": {"__DEFAULT__": 16, "conv2d_1": 8, "predictions": 4},
"1": {"__DEFAULT__": 4, "conv1": 16}
}
Example 2: Two models. Model 0 will get 8-bit precision in all layers (the default, so it is omitted), and Model 1 will get 16-bit precision in all layers.
{
"1": {"__DEFAULT__": 16}
}
With one of the above json files saved as wbits.json, we can run mx_nc as follows:
mx_nc -v -c 4 -m resnet50.h5 mobilenet.h5 -bt wbits.json
High-Precision Output Channels#
The Neural Compiler has two flags/arguments to enable HPOC: -hpoc and -hpoc_file.
The -hpoc flag should be used in simple single-model and single-output scenarios to specify the indices of the output channels that need increased precision (a string of space-separated ints).
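As an illustrative sketch (model.h5 and the channel indices 0, 1, and 5 are placeholders, and the channel list is quoted so the shell passes it as a single string):
mx_nc -v -m model.h5 -hpoc "0 1 5"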
The -hpoc_file flag should be used if more flexibility is needed, such as in multi-model and/or multi-output scenarios. This expects a JSON config file with the following format:
{
"model_index": {
"output_name": [channel_idx, ...],
...
},
...
}
- The model_index (string) allows for multi-model support and must match the order in which the models are specified to the Neural Compiler.
- The output_name (string) is the name of the output layer in the model, which can be found using a visualization tool like Netron. You may specify multiple key-value pairs here for multi-output scenarios.
- The channel_idx values (list of ints) are the indices of the output channels that need increased precision. These indices should be determined from application-specific knowledge.
Example 1: Multi-Model / Multi-Output Scenario
Consider a scenario where there are two models specified to the Neural Compiler:
- The first model specified (index 0) has three outputs (x0, x1, x2). We will use high precision for channels 0 and 1 of x0, and channels 5, 15, and 21 of x2.
- The second model specified (index 1) has a single output (conv2d_0). We will use high precision for channels 0 and 6.
Save the following as hpoc.json
{
"0": {"x0": [0,1], "x2": [5,15,21]},
"1": {"conv2d_0": [0,6]}
}
Then compile the two models with the following command:
mx_nc -v -m three_output_model.h5 single_output_model.h5 -hpoc_file hpoc.json
Extrapolate this example to your specific use case.
Example 2: YOLOv7