Neural Compiler#
- class NeuralCompiler(models=None, num_chips: int = 4, input_shapes=[], effort: str = 'normal', num_processes: int | str = 1, inputs=None, outputs=None, model_in_out=None, autocrop: bool = False, target_fps: float = inf, dfp_fname: str | Path | None = None, no_sim_dfp: bool = False, verbose: int = 0, show_optimization: bool = False, hpoc: list[int] = [], hpoc_file=None, wbtable=None, exp_auto_dp: bool = False, extensions: list[str | Path] = [], allow_unsigned_extensions: bool = False, *args, **kwargs)#
The MemryX Neural Compiler.
The Neural Compiler (NC) compiles Neural Network model(s) into a Dataflow Program (DFP) which can be used to program and perform inference using MemryX Accelerators (MXA).
- Parameters:
- models : model or list of models
Neural Network model(s) to compile. Can be a path to a model, a loaded NN model, or a list of models. Below is the framework model format summary table:
| Framework | File path (str or pathlib.Path) | Loaded object |
|---|---|---|
| onnx | .onnx | onnx.ModelProto |
| tflite | .tflite | bytes (loaded flatbuf) |
| keras / tf_keras | .keras or .h5 | keras.Model() |
| tensorflow | .pb | N/A |
```python
# Keras Examples
from keras.applications import MobileNet

model = MobileNet()

# Directly compile a keras.Model object
nc = NeuralCompiler(models=model)

# Compile from a model serialized to disk
model.save('model.keras')
nc = NeuralCompiler(models='model.keras')

model.save('model.h5')
nc = NeuralCompiler(models='model.h5')
```
Multiple models can be co-mapped by providing a list of models, enabling multi-model compilation. This is useful for applications that run several models together, such as object detection followed by classification. Frameworks can be mixed, allowing flexibility in model selection.
```python
# Multi-model Compilation
nc = NeuralCompiler(models=['model.keras', 'model.onnx'])
```
- num_chips : int
Number of MXAs. Provide 0 to automatically calculate the minimum required MXAs. Defaults to 4.
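For example, a minimal sketch letting the compiler size the deployment (the model path is a placeholder):

```python
# num_chips=0 asks the compiler for the minimum number of MXAs
nc = NeuralCompiler(models='model.onnx', num_chips=0)
```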
- effort : string
Set the compiler’s optimization effort (lazy | normal | hard).
| Mode | Description |
|---|---|
| Lazy | Compiles very quickly, but with potentially lower inference performance (FPS). |
| Normal (default) | Strikes a good balance between compile time and inference performance. |
| Hard | Gets the best inference performance, but greatly increases compile time. |
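For instance, a minimal sketch of the trade-off (the model path is a placeholder):

```python
# Quick iteration: compile fast, accept lower FPS
nc = NeuralCompiler(models='model.onnx', effort='lazy')

# Deployment: spend extra compile time to maximize FPS
nc = NeuralCompiler(models='model.onnx', effort='hard')
```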
- num_processes : int or str
Number of processes to use for --effort hard mapping. Can be a number or 'max' to select all available threads. Defaults to 1.
- target_fps : float or int
Sets the target FPS for the cores optimizer. Defaults to float('inf') to target maximum performance.
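A hedged sketch combining the two knobs (the model path and FPS target are illustrative):

```python
# Use every available thread for the 'hard'-effort mapping, and let the
# cores optimizer stop at 30 FPS instead of chasing maximum performance
nc = NeuralCompiler(models='model.onnx', effort='hard',
                    num_processes='max', target_fps=30)
```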
- autocrop : bool
Automatically crop the pre/post-processing layers from the input model(s).
This will (potentially) split the model into three sections if needed:
Preprocessing - typically layers like resize, normalization, etc.
Model Core - the computationally expensive layers (convolutions, matmuls, etc.)
Postprocessing - data processing, like NMS.
The pre/post-processing models are to be run on the host using the connect_preprocessing functions in the SyncAccl / AsyncAccl APIs. The core model will be compiled and run on the accelerator. If the model is cropped, the pre / core / post parts of the model will be saved as <model_name>_pre.onnx, <model_name>_core.onnx, and <model_name>_post.onnx, respectively.
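A sketch of a typical autocrop flow (the model name is a placeholder):

```python
# Crop host-side pre/post-processing automatically; if cropping occurs,
# yolov7_pre.onnx, yolov7_core.onnx, and yolov7_post.onnx are saved
nc = NeuralCompiler(models='yolov7.onnx', autocrop=True)
dfp = nc.run()
```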
- inputs : string
String specifying the names of the input layers of the model(s). For multiple inputs, delimit input names with the ',' symbol. For multiple models, separate each model's inputs with the '|' symbol. This argument overrides model_in_out.
- outputs : string
String specifying the names of the output layers of the model(s). For multiple outputs, delimit output names with the ',' symbol. For multiple models, separate each model's outputs with the '|' symbol. This argument overrides model_in_out.
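A sketch of the delimiter syntax for two models (all model and layer names are placeholders):

```python
# ',' separates layers within one model; '|' separates models.
# Model 0 has two inputs and one output; model 1 has one of each.
nc = NeuralCompiler(models=['a.onnx', 'b.onnx'],
                    inputs='a_in0,a_in1|b_in0',
                    outputs='a_out0|b_out0')
```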
- model_in_out : string (path)
JSON file that contains the names of the input and output layers. Use this to extract a subgraph from the full graph, for example to remove pre/post-processing.
- dfp_fname : string or Path
File path at which to save the DFP. The DFP will not be saved if unspecified.
- no_sim_dfp : bool
Skip including Simulator info in the .dfp file. Useful for making smaller files for hardware-only deployments.
- verbose : int
Controls how verbose the NeuralCompiler is.
- show_optimization : bool
Animates mapper optimization steps.
- input_shapes : dict or list of lists
A dictionary mapping each model index (int) to a list of input shapes, or a single list for shorthand when using one model with one input.
This parameter is required only when input shapes cannot be inferred directly from the model. Use it to explicitly specify the expected input shape(s) for each model. The model index is determined by the order in which models are passed in the models argument.
```python
# One model with one input:
input_shapes = {0: [1, 300, 300, 3]}

# One model with two inputs:
input_shapes = {0: [[1, 300, 300, 3], [1, 400, 400, 3]]}

# One model with three inputs; only the first and third are given shapes:
input_shapes = {0: [[1, 300, 300, 3], [], [1, 400, 400, 3]]}

# Two models with two and one inputs respectively:
input_shapes = {0: [[1, 300, 300, 3], [1, 400, 400, 3]], 1: [1, 200, 200, 3]}

# Three models; only the first and third need shapes:
input_shapes = {0: [[1, 300, 300, 3]], 2: [1, 224, 224, 3]}

# Shorthand for one model with two inputs:
input_shapes = [[1, 300, 300, 3], [1, 400, 400, 3]]
```
- hpoc : list of ints
Optional list of final-layer output channels whose precision should be increased. If hpoc is specified, the model must have a single output, or every output must share the same number of OUTPUT CHANNELS. If more flexibility is required (multi-model and/or multi-output), use the hpoc_file argument.
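For instance, a minimal sketch (the model path and channel indices are illustrative):

```python
# Increase precision on output channels 0-3 of the final layer
nc = NeuralCompiler(models='model.onnx', hpoc=[0, 1, 2, 3])
```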
- hpoc_file : path to JSON or dict
Optional JSON file used to increase precision for the specified output channels. For more details see formats. A dictionary containing the layer names and channel indices in the same format may also be passed directly to the API.
- wbtable : path to JSON or dict
Optional JSON file with per-layer weight quantization information. Valid precision values are 4, 8, and 16 bits. You can set a global precision (i.e., for ALL layers) with the __DEFAULT__ JSON tag. For more details see formats. A dictionary containing the layer names and precision values in the same format may also be passed directly to the API.
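A hedged sketch of passing a dictionary directly; the flat layer-name-to-precision layout is inferred from the description above, and the layer names are placeholders:

```python
# Assumed layout: layer name -> weight precision in bits
wbtable = {
    '__DEFAULT__': 8,  # global precision for all unlisted layers
    'conv2d_5': 16,    # raise precision for a sensitive layer
    'conv2d_9': 4,     # lower precision for a tolerant layer
}
nc = NeuralCompiler(models='model.onnx', wbtable=wbtable)
```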
- exp_auto_dp : bool
Experimental double precision, where 16-bit weights are applied to auto-selected layers while the other layers use 8-bit weights. This argument is overridden if wbtable is specified.
- extensions : list of str or Path
List of paths to .nce files which can be used to extend the Neural Compiler's functionality at runtime. All official extensions distributed by MemryX are signed with a key.
- allow_unsigned_extensions : bool
Allows running unsigned extensions. An extension may execute arbitrary code; only run extensions from sources you trust! Defaults to False.
- run()#
Run the Neural Compiler.
Runs the Neural Compiler with the current configuration. It will convert the model(s) into a single dataflow program which can be used to program MXAs or simulated using the Simulator.
Note
The models argument must be configured before calling run().
- Returns:
- dfp : Dfp object
See DFP for more details.
- Raises:
- CompilerError, OperatorError, ResourceError
Examples
```python
from tensorflow import keras
mobilenet = keras.applications.MobileNet()
resnet50 = keras.applications.ResNet50()

from memryx import NeuralCompiler

# Compile MobileNet to 1 chip
nc = NeuralCompiler(models=mobilenet, num_chips=1)
dfp = nc.run()

# Compile ResNet50 to 4 chips
dfp = NeuralCompiler(models=resnet50, num_chips=4).run()

# Compile MobileNet+ResNet50 to 4 chips
dfp = NeuralCompiler(models=[mobilenet, resnet50], num_chips=4).run()

# Compile MobileNet but crop with the inputs/outputs arguments
inputs = mobilenet.layers[3].name
outputs = mobilenet.layers[5].name
dfp = NeuralCompiler(models=mobilenet, inputs=inputs, outputs=outputs).run()
```
- set_config(**kwargs)#
Configure the Neural Compiler.
Configure the Neural Compiler with the keyword arguments listed in the __init__ function.
- Parameters:
- **kwargs
Keyword args used to configure the Neural Compiler.
Examples
```python
nc = NeuralCompiler()
nc.set_config(num_chips=4, chip_gen="mx3")
nc.set_config(models=mobilenet)
```
- reset_config()#
Reset config.
Reset the configuration to the values the Neural Compiler was originally constructed with.
Examples
```python
nc = NeuralCompiler()
nc.set_config(effort='hard')
print(nc.get_config()['effort'])

# Config reset
nc.reset_config()
print(nc.get_config()['effort'])
```
outputs:
```
>> 'hard'
>> 'normal'
```
- get_config()#
Return the current config.
Get a dictionary of current Neural Compiler configuration.
- Returns:
- Config : dict
Dictionary of Neural Compiler configurations.
Examples
```python
nc = NeuralCompiler()
print(nc.get_config())
```
outputs:
```
{'models': [None], 'num_chips': 4, 'input_shapes': [], 'effort': 'normal',
 'inputs': None, 'outputs': None, 'model_in_out': None, 'autocrop': False,
 'target_fps': float('inf'), 'dfp_fname': None, 'verbose': 0,
 'show_optimization': False, 'hpoc': None, 'hpoc_file': None,
 'wbtable': None, 'exp_auto_dp': False}
```