Neural Compiler#

class NeuralCompiler(models=None, num_chips: int = 4, input_shapes=[], effort: str = 'normal', num_processes: int | str = 1, inputs=None, outputs=None, model_in_out=None, autocrop: bool = False, target_fps: float = inf, dfp_fname: str | Path | None = None, no_sim_dfp: bool = False, verbose: int = 0, show_optimization: bool = False, hpoc: list[int] = [], hpoc_file=None, wbtable=None, exp_auto_dp: bool = False, extensions: list[str | Path] = [], allow_unsigned_extensions: bool = False, *args, **kwargs)#

The MemryX Neural Compiler.

The Neural Compiler (NC) compiles Neural Network model(s) into a Dataflow Program (DFP) which can be used to program and perform inference using MemryX Accelerators (MXA).

Parameters:
models : model or list of models

Neural Network model(s) to compile. Can be a path to a model, a loaded NN model, or a list of models. The supported framework model formats are summarized in the table below:

Framework         str or pathlib.Path    object
onnx              .onnx                  onnx.ModelProto
tflite            .tflite                bytes (loaded flatbuf)
keras / tf_keras  .keras or .h5          keras.Model()
tensorflow        .pb                    N/A

# Keras Examples
from memryx import NeuralCompiler
from keras.applications import MobileNet
model = MobileNet()

# Directly compile a keras.Model object
nc = NeuralCompiler(models=model)

# Compile from a model serialized to disk
model.save('model.h5')
nc = NeuralCompiler(models='model.h5')

model.save('model.keras')
nc = NeuralCompiler(models='model.keras')

Multiple models can be co-mapped by providing a list of models, enabling multi-model compilation. This is useful for applications that run several models together, such as object detection followed by classification. Frameworks can be mixed, allowing for flexibility in model selection.

# Multi-model Compilation
nc = NeuralCompiler(models=['model.keras', 'model.onnx'])
num_chips : int

Number of MXAs. Provide 0 to automatically calculate the minimum required MXAs. Defaults to 4.

effort : string

Set the compiler's optimization effort (lazy | normal | hard).

Mode              Description
lazy              Compiles very quickly, but with potentially lower inference performance (FPS).
normal (default)  Strikes a good balance between compile time and inference performance.
hard              Achieves the best inference performance, but greatly increases compile time.

num_processes : int or str

Number of processes to use for effort='hard' mapping. Can be a number or 'max' to use all available threads. Defaults to 1.
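For example, a minimal sketch (the model path is illustrative):

# Use hard effort and all available threads for mapping
nc = NeuralCompiler(models='model.onnx', effort='hard', num_processes='max')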

target_fps : float or int

Sets the target FPS for the cores optimizer. Defaults to float('inf') to target maximum performance.
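For example, the cores optimizer can target a specific frame rate while the compiler picks the minimum chip count (the model path and FPS value are illustrative):

# Target 30 FPS and let the compiler choose the minimum number of chips
nc = NeuralCompiler(models='model.onnx', num_chips=0, target_fps=30)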

autocrop : bool

Automatically crop the pre/post-processing layers from the input model(s).

This will (potentially) split the model into three sections if needed:

  • Preprocessing - typically layers like resize, normalization, etc.

  • Model Core - the computationally expensive layers (convolutions, matmuls, etc.)

  • Postprocessing - data processing, like NMS.

The pre/post-processing models are to be run on the host using the connect_preprocessing functions in the SyncAccl / AsyncAccl APIs. The core model will be compiled and run on the accelerator. If the model is cropped, the pre / core / post parts of the model will be saved as <model_name>_pre.onnx, <model_name>_core.onnx, and <model_name>_post.onnx respectively.
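A minimal usage sketch (the model path is illustrative):

# Crop pre/post-processing and compile only the core model;
# the cropped pre/core/post ONNX files are saved to disk as described above
nc = NeuralCompiler(models='model.onnx', autocrop=True)
dfp = nc.run()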

inputs : string

String specifying the names of the input layers of the model(s). For multi-input models, delimit inputs with the , symbol. For multi-model compilation, separate each model's inputs with the | symbol. This argument overrides model_in_out.

outputs : string

String specifying the names of the output layers of the model(s). For multi-output models, delimit outputs with the , symbol. For multi-model compilation, separate each model's outputs with the | symbol. This argument overrides model_in_out.
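For example (model paths and layer names are illustrative):

# Two models: the first has two inputs/outputs, the second has one of each
nc = NeuralCompiler(models=['det.onnx', 'cls.onnx'],
                    inputs='image,anchors|image',
                    outputs='boxes,scores|probs')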

model_in_out : string (path)

JSON file that contains the names of the input and output layers. Use this to extract a subgraph from the full graph, for example to remove pre/post-processing.
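As a rough illustration only — the file lists the input and output layer names to crop to; the exact JSON keys below are assumptions, not the confirmed schema:

# model_in_out.json (illustrative structure only):
# {
#     "inputs":  ["conv1_input"],
#     "outputs": ["conv5_block3_out"]
# }
nc = NeuralCompiler(models='model.onnx', model_in_out='model_in_out.json')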

dfp_fname : string or Path

File path at which to save the DFP. The DFP will not be saved to disk if unspecified.

no_sim_dfp : bool

Skip including Simulator info in the .dfp file. Useful for making smaller files for hardware-only deployments.
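For example (file names are illustrative):

# Save a smaller, hardware-only DFP to disk
nc = NeuralCompiler(models='model.onnx', dfp_fname='model.dfp', no_sim_dfp=True)
dfp = nc.run()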

verbose : int

Controls how verbose the NeuralCompiler is.

show_optimization : bool

Animates mapper optimization steps.

input_shapes : dict or list of lists

A dictionary mapping each model index (int) to a list of input shapes, or a single list for shorthand when using one model with one input.

This parameter is required only when input shapes cannot be inferred directly from the model. Use it to explicitly specify the expected input shape(s) for each model. The model index is determined by the order in which models are passed in the models argument.

# One model with one input:
input_shapes = {0: [1, 300, 300, 3]}

# One model with two inputs:
input_shapes = {0: [[1, 300, 300, 3], 
                    [1, 400, 400, 3]]}

# One model with three inputs, only first and third are provided with shapes:
input_shapes = {0: [[1, 300, 300, 3], 
                    [], 
                    [1, 400, 400, 3]]}

# Two models with two and one inputs respectively:
input_shapes = {0: [[1, 300, 300, 3], 
                    [1, 400, 400, 3]], 
                1: [1, 200, 200, 3]}

# Three models, only first and third need shapes:
input_shapes = {0: [[1, 300, 300, 3]], 
                2: [1, 224, 224, 3]}

# Shorthand for one model with two inputs:
input_shapes = [[1, 300, 300, 3], [1, 400, 400, 3]]
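Passing the shapes to the compiler (the model path is illustrative):

nc = NeuralCompiler(models='model.onnx', input_shapes={0: [1, 300, 300, 3]})
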
hpoc : list of ints

Optional list of final-layer output channels to receive increased precision. If hpoc is specified, the model must have a single output, or every output must share the same number of output channels. If more flexibility is required (multi-model and/or multi-output), use the hpoc_file argument.
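For example (the model path and channel indices are illustrative):

# Increase precision for output channels 0, 1, and 2 of the final layer
nc = NeuralCompiler(models='model.onnx', hpoc=[0, 1, 2])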

hpoc_file : path to json or dict

Optional JSON file used to increase precision for the specified output channels. For more details, see formats. A dictionary containing the layer names and channel indices in the format described there may also be passed directly to the API.
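A rough sketch, assuming the layer-name-to-channel-indices mapping described above (the layer name and indices are illustrative; see formats for the exact schema):

# Map each output layer name to the channel indices that need higher precision
nc = NeuralCompiler(models='model.onnx', hpoc_file={'output_conv': [0, 5, 10]})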

wbtable : path to json or dict

Optional JSON file with per-layer weight quantization information. Valid precision values are 4, 8, and 16 bits. You can set the global precision (i.e., for ALL layers) with the __DEFAULT__ JSON tag. For more details, see formats. A dictionary containing the layer names along with the precision values in the format described there may also be passed directly to the API.
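A minimal sketch, assuming the layer-name-to-precision mapping described above (the layer name and exact value format are assumptions; see formats for the confirmed schema):

# 8-bit weights for all layers via __DEFAULT__, except conv_1 at 16-bit
nc = NeuralCompiler(models='model.onnx', wbtable={'__DEFAULT__': 8, 'conv_1': 16})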

exp_auto_dp : bool

Experimental double precision: 16-bit weights are applied to auto-selected layers while other layers use 8-bit weights. This argument is overridden if wbtable is specified.

extensions : list of str or Path

List of paths to .nce files which extend the Neural Compiler functionality at runtime. All official extensions distributed by MemryX are signed with a key.

allow_unsigned_extensions : bool

Allows running unsigned extensions. An extension may execute arbitrary code; only run extensions from sources you trust! Defaults to False.
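For example (the extension path is illustrative):

# Load a locally built, unsigned extension
nc = NeuralCompiler(models='model.onnx',
                    extensions=['my_extension.nce'],
                    allow_unsigned_extensions=True)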

run()#

Run the Neural Compiler.

Runs the Neural Compiler with the current configuration, converting the model(s) into a single Dataflow Program (DFP) which can be used to program MXAs or be simulated using the Simulator.

Note

The models arg must be configured before calling run().

Returns:
dfp : Dfp object

See DFP for more details.

Raises:
CompilerError, OperatorError, ResourceError

Examples

from tensorflow import keras
mobilenet = keras.applications.MobileNet()
resnet50 = keras.applications.ResNet50()

from memryx import NeuralCompiler

# Compile MobileNet to 1 chip
nc = NeuralCompiler(models=mobilenet, num_chips=1)
dfp = nc.run()

# Compile ResNet50 to 4 chips
dfp = NeuralCompiler(models=resnet50, num_chips=4).run()

# Compile MobileNet+ResNet50 to 4 chips
dfp = NeuralCompiler(models=[mobilenet,resnet50], num_chips=4).run()

# Compile MobileNet but crop with the inputs/outputs arguments
inputs = mobilenet.layers[3].name
outputs = mobilenet.layers[5].name
dfp = NeuralCompiler(models=mobilenet, inputs=inputs, outputs=outputs).run()
set_config(**kwargs)#

Configure the Neural Compiler.

Configure the Neural Compiler with the keyword arguments listed in the __init__ function.

Parameters:
**kwargs

Keyword args used to configure the Neural Compiler.

Examples

nc = NeuralCompiler()
nc.set_config(num_chips=4, chip_gen="mx3")
nc.set_config(models=mobilenet)
reset_config()#

Reset config.

Reset the configuration to the default values the Neural Compiler was originally configured with.

Examples

nc = NeuralCompiler()
nc.set_config(effort='hard')
print(nc.get_config()['effort'])

# Config reset
nc.reset_config()
print(nc.get_config()['effort'])

outputs:

>> 'hard'
>> 'normal'
get_config()#

Return the current config.

Get a dictionary of current Neural Compiler configuration.

Returns:
config : dict

Dictionary of Neural Compiler configurations.

Examples

nc = NeuralCompiler()
print(nc.get_config())

outputs:

{'models': [None], 'num_chips': 4, 'input_shapes': [], 'effort':
'normal', 'inputs': None, 'outputs': None, 'model_in_out': None,
'autocrop': False, 'target_fps': float('inf'), 'dfp_fname': None,
'verbose': 0, 'show_optimization': False, 'hpoc': None,
'hpoc_file': None, 'wbtable': None, 'exp_auto_dp': False}