File Formats#

Dataflow Program#

A Dataflow Program (DFP) is an encrypted binary file produced by the Neural Compiler and is used to configure one or many MXAs. It is analogous to an FPGA bitstream: it statically configures the MXA(s) so they can perform NN inference on an input datastream.

One or more Neural Network models are compiled for an MXA target (generation / number of chips) using the Neural Compiler. This produces a single DFP, which can be uploaded to the MXA(s). After configuration, the MXA(s) run inference by simply streaming data to the accelerator and reading the inference outputs.

DFP take-aways:
  • A binary file produced by the Neural Compiler.

  • Used to program an MXA.

  • Can contain one or many Neural Network models.

  • Targets a specific MXA configuration (generation and number of MXA chips).

  • Encrypted; model architecture and weights are obfuscated using advanced encryption libraries.
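As a minimal sketch (reusing the flags from the examples later on this page; the output DFP file name, e.g. resnet50.dfp, is an assumption and depends on the options used), a single model can be compiled into a DFP like so:

mx_nc -v -c 4 -m resnet50.h5

The resulting DFP can then be loaded onto the MXA(s), after which inference runs by streaming input data and reading the outputs.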

Model Cropping#

The MemryX MXA excels at accelerating NN operators such as convolution2d and dense, at the expense of supporting general tensor graphs and conditional operations, which are sometimes present in the pre/post-processing of a Neural Network model.

To help increase compatibility with MemryX chips, the Neural Compiler offers built-in functionality to crop ‘unsupported’ operators from an NN graph.

mx_nc provides arguments to specify the ‘input/output’ nodes of the NN that should be accelerated. The nodes outside this range will be ‘cropped’ from the model and should be run by the host processor.

mx_nc -h
  ...
  Graph Processing:
    --model_in_out      JSON file that contains the names of the input and output layers.
  ...

The model_in_out file is a JSON file specifying the model inputs and outputs. This file format supports single input/output, multiple inputs/outputs, and multiple models (each with multiple inputs/outputs). Some examples are listed below:

Single Model Cropping#

For a single model, specify input/inputs and output/outputs as a string/list of strings.

{ "input": "conv1", "output": "add_1" }
{
    "inputs": ["res2a_branch2a", "res2a_branch1"]
    "outputs": ["activation4", "res2b_branch2a"],
}

With one of the above json files saved as boundary.json, mx_nc can be run as follows:

mx_nc -v -c 4 -m resnet50.h5 --model_in_out boundary.json

Multi-Model Cropping#

For multi-model compilation, use the model number as a dictionary key whose value specifies the input/inputs and output/outputs for that model. The models are numbered by the order in which they are specified to the Neural Compiler. If model cropping is not needed for a given model, that model may be omitted.

{
        "0": {"input": "conv1", "outputs": ["res2a_branch2c", "res2a_branch1"]},
        "1": {"input": "conv_dw_1", "output":"conv_pw_1"}
}
{
        "1": {"input": "conv_dw_1", "output":"conv_pw_1"}
}

With one of the above json files saved as boundary.json, we can run mx_nc as follows:

mx_nc -v -c 4 -m resnet50.h5 mobilenet.h5 --model_in_out boundary.json

Weight Precision#

By default, the weights of all layers running on the MXA(s) are quantized to 8 bits. The user can set a desired precision per layer with the help of the Neural Compiler, which provides the --weight_bits_table argument to specify the path to a JSON file.

The weight_bits_table file is a JSON file specifying model layer names and the corresponding desired precision for each layer. This file format supports both single-model and multi-model compilation. Some examples are listed below:

Single Model Weight Precision#

The wbits.json is a JSON file as depicted in the example below.

.json file format:

  Desired Precision    JSON Value
  4-bit                4
  8-bit                8 (default)
  16-bit               16

Example 1: Different levels of precision for each layer.

{
    "important_conv": 16,
    "dense_layer": 4
}

Say you have 4 layers in your model:

conv2d_0
conv2d_1
important_conv
dense_layer

Passing --weight_bits_table wbits.json to the Neural Compiler will now quantize your layer weights as follows:

  Layer             Weight Precision
  conv2d_0          8-bit
  conv2d_1          8-bit
  important_conv    16-bit
  dense_layer       4-bit

Example 2: 16-bit precision for all layers in the network. On average, this takes about 2x as many chips as the default 8-bit precision.

{
    "__DEFAULT__" : 16
}

Say you have 3 layers in your model:

conv2d_0
depthwise_0
dense_layer

Passing --weight_bits_table wbits_2.json to the Neural Compiler will now quantize your layer weights like this:

  Layer          Weight Precision
  conv2d_0       16-bit
  depthwise_0    16-bit
  dense_layer    16-bit

Multi-Model Weight Precision#

For multi-model compilation, use the model number as a dictionary key whose value maps layer names to the corresponding precision. The models are numbered by the order in which they are specified to the Neural Compiler. If a specific precision is not needed for a model, that model may be omitted.

Example 1: Two models are passed as input to the Neural Compiler. Model 0 gets 16-bit precision in all layers except conv2d_1 (assigned 8-bit precision) and predictions (assigned 4-bit precision). Model 1 gets 4-bit precision in all layers except conv1 (assigned 16-bit precision).

{
                "0": {"__DEFAULT__": 16, "conv2d_1": 8, "predictions": 4},
                "1": {"__DEFAULT__": 4, "conv1": 16}
}

Example 2: Two models. Model 0 is omitted, so all of its layers keep the default 8-bit precision; Model 1 gets 16-bit precision in all layers.

{
                "1": {"__DEFAULT__": 16}
}

With one of the above json files saved as wbits.json, we can run mx_nc as follows:

mx_nc -v -c 4 -m resnet50.h5 mobilenet.h5 -bt wbits.json

High-Precision Output Channels#

The Neural Compiler has two flags/arguments to enable high-precision output channels (HPOC): -hpoc and -hpoc_file.

The -hpoc flag should be used in simple single-model, single-output scenarios to specify the indices of the output channels that need increased precision (a string of space-separated ints).
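
For example, a minimal sketch (the channel indices below are placeholders chosen for illustration, and quoting may vary by shell):

mx_nc -v -m single_output_model.h5 -hpoc "0 1 5"

This requests increased precision for output channels 0, 1, and 5 of the model's single output.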

The -hpoc_file flag should be used if more flexibility is needed such as in multi-model and/or multi-output scenarios. This expects a JSON config file with the following format:

{
        "model_index": {
                "output_name": [channel_idx, ...],
                ...
        },
        ...
}
  • The model_index allows for multi-model support and must match the order in which the models are specified to the Neural Compiler. (string)

  • The output_name is the name of the output layer in the model which can be found using a visualization tool like Netron. You may specify multiple key-value pairs here for multi-output scenarios. (string)

  • The channel_idx values are the indices of the output channels that need increased precision. These indices should be determined from application-specific knowledge. (list of ints)

Example 1: Multi-Model / Multi-Output Scenario

Consider a scenario where there are two models specified to the Neural Compiler:

  • The first model specified (index 0) has three outputs (x0, x1, x2). We will use high precision for channels 0 and 1 of x0 and channels 5, 15, and 21 of x2.

  • The second model specified (index 1) has a single output (conv2d_0). We will use high precision for channels 0 and 6.

Save the following as hpoc.json

{
"0": {"x0": [0,1], "x2": [5,15,21]},
"1": {"conv2d_0": [0,6]}
}

Then compile the two models with the following command:

mx_nc -v -m three_output_model.h5 single_output_model.h5 -hpoc_file hpoc.json

Extrapolate this example to your specific use case.

Example 2: YOLOv7