Accelerator API (Python)#

The Python Accelerator runtime API is split into three classes.

  • AsyncAccl is the primary Python API intended for realtime applications.

  • SyncAccl is for simple testing or batch processing of offline data.

  • MultiStreamAsyncAccl is an extension of AsyncAccl to make multi-camera scenarios easier to develop for.

class AsyncAccl(dfp, group_id=0, **kwargs)#

This class provides an asynchronous API to run models on the MemryX hardware accelerator. The user provides callback functions to feed data and receive outputs from the accelerator, which are then called whenever a model is ready to accept/output data. This pipelines execution of the models and allows the accelerator to run at full speed.

Parameters:
dfp: bytes or string

Path to dfp or a dfp object (bytearray). The dfp is generated by the NeuralCompiler.

group_id: int

The index of the chip group to select.

mxserver_addr: string

Address to use for mx_server connection. Only needed for Docker.

mxserver_port: int

Port to use for mx_server connection. Only useful for Docker.

Examples

import tensorflow as tf
import numpy as np
from memryx import NeuralCompiler, AsyncAccl

# define the callback that will return model input data
def data_source():
    for i in range(10):
        img = np.random.rand(224,224,3).astype(np.float32)
        data = tf.keras.applications.mobilenet.preprocess_input(img)
        yield data

# define the callback that will process the outputs of the model
def output_processor(*outputs):
    logits = np.squeeze(outputs[0], 0)
    preds = tf.keras.applications.mobilenet.decode_predictions(logits)

# Compile a MobileNet model for testing.
# Typically, compilation needs to be done only once.
model = tf.keras.applications.MobileNet()
nc = NeuralCompiler(models=model)
dfp = nc.run()

# Accelerate using the MemryX hardware
accl = AsyncAccl(dfp)
accl.connect_input(data_source) # starts asynchronous execution of input generating callback
accl.connect_output(output_processor) # starts asynchronous execution of output processing callback
accl.wait() # wait for the accelerator to finish execution
connect_input(callback, model_idx=0)#

Sets a callback function to execute when this model is ready to start processing an input frame.

Parameters:
callback: callable

This callable is invoked asynchronously whenever the accelerator is ready to start processing an input frame through the model specified by model_idx. callback is responsible for generating and returning the next input frame(s) for this model. It must not take any arguments. It may return either a single np.ndarray if the model has a single input, or a sequence of np.ndarrays for multi-input models. The data types of the np.ndarrays must match those expected by the model.

Any exception raised when calling callback is taken to signal the end of the input stream for this model. Returning None from the callback is also taken to signal the end of input stream for this model.

If a pre-processing model was set by calling the set_preprocessing_model method, the outputs of callback will first be run through the pre-processing model and the resulting outputs will be fed to the accelerator.

model_idx: int

Index of the model whose input should be connected to callback

Raises:
RuntimeError: If the signature of callback contains any parameters
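
The callback contract described above (no parameters, one frame per call, None to end the stream) can be sketched with a small closure. This is a minimal illustration rather than part of the memryx package: make_input_callback is a hypothetical helper, and the string frames are stand-ins for the np.ndarray objects a real model would need.

```python
from typing import Any, Callable, Iterable, Optional


def make_input_callback(frames: Iterable[Any]) -> Callable[[], Optional[Any]]:
    """Wrap an iterable of frames in a zero-argument callback.

    Each call returns the next frame. Once the iterable is exhausted the
    callback returns None, which signals the end of the input stream.
    """
    iterator = iter(frames)

    def next_frame() -> Optional[Any]:
        # next() with a default returns None instead of raising StopIteration.
        return next(iterator, None)

    return next_frame


# Stand-in frames; in real use these would be np.ndarray objects whose
# dtype and shape match the model input.
callback = make_input_callback(["frame0", "frame1"])

# Hypothetical wiring, assuming `accl` is an AsyncAccl instance:
# accl.connect_input(callback)
```

Returning None explicitly keeps end-of-stream separate from genuine failures, which would otherwise also be read as end of input because any exception raised by the callback ends the stream.
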
connect_output(callback, model_idx=0)#

Sets a callback function to execute when the outputs of this model are ready.

Parameters:
callback: callable

This callable is invoked asynchronously whenever the accelerator finishes processing an input frame for the model specified by model_idx. The outputs of the model are passed to this callable according to the port order assigned to the model, which is returned by the outport_assignment method. The signature of callback must consist only of parameters that correspond to the model output feature maps; no other parameters may be present.

If a post-processing model was set by calling the set_postprocessing_model method, the outputs of the model will first be run through the post-processing model and the resulting outputs will be used instead to call callback.

model_idx: int

Index of the model whose output should be connected to callback
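
One way to satisfy the output callback signature is to accept the feature maps as packed positional arguments and store them for later use. The sketch below is illustrative only: make_output_callback and the collected list are hypothetical names, and the string values stand in for np.ndarray feature maps.

```python
from typing import Any, Callable, List, Tuple


def make_output_callback(results: List[Tuple[Any, ...]]) -> Callable[..., None]:
    """Return a callback that appends each set of model outputs to `results`."""

    def on_outputs(*outputs: Any) -> None:
        # One positional argument per output feature map, in port order.
        results.append(outputs)

    return on_outputs


collected: List[Tuple[Any, ...]] = []
on_outputs = make_output_callback(collected)

# Hypothetical wiring, assuming `accl` is an AsyncAccl instance:
# accl.connect_output(on_outputs)
```

Because the callback is invoked asynchronously, it is safest to read collected only after wait() has returned.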

stop()#

Send a signal to stop each of the models running on the accelerator. This call blocks until each of the models stops and cleans up its resources.

wait()#

Make the main thread wait for the accelerator to finish executing all models.

Raises:
RuntimeError: If any model’s inputs/outputs are left unconnected
set_preprocessing_model(model_or_path, model_idx=0)#

Supply a model, or the path to a model file, that should be run to pre-process the input feature map. This is an optional feature that can be used to automatically run the pre-processing model output by the NeuralCompiler.

Note

This function currently does not support PyTorch models

Parameters:
model_or_path: obj or str

Can be either an already loaded model such as a tf.keras.Model object for Keras, or a str path to a model file.

model_idx: int

Index of the model on the accelerator whose input feature map should be pre-processed by the supplied model

set_postprocessing_model(model_or_path, model_idx=0)#

Supply a model, or the path to a model file, that should be run to post-process the output feature maps. This is an optional feature that can be used to automatically run the post-processing model output by the NeuralCompiler.

Note

This function currently does not support PyTorch models

Parameters:
model_or_path: obj or str

Can be either an already loaded model such as a tf.keras.Model object for Keras, or a string path to a model file.

model_idx: int

Index of the model on the accelerator whose output should be post-processed by the supplied model

inport_assignment(model_idx=0)#

Returns a dictionary which maps input port ids to model input layer names for the model specified by model_idx

Parameters:
model_idx: int

Index of the model whose input port assignment is returned

outport_assignment(model_idx=0)#

Returns a dictionary which maps output port ids to model output layer names for the model specified by model_idx

Parameters:
model_idx: int

Index of the model whose output port assignment is returned
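
The port assignment dictionaries can be used to attach layer names to positionally ordered outputs. The sketch below assumes that port ids are integers and that outputs arrive in ascending port id order; neither is confirmed by this reference. label_outputs and the example mapping are hypothetical.

```python
from typing import Any, Dict, Sequence


def label_outputs(port_to_layer: Dict[int, str], outputs: Sequence[Any]) -> Dict[str, Any]:
    """Pair each positional output with the layer name assigned to its port."""
    ordered_ports = sorted(port_to_layer)
    if len(ordered_ports) != len(outputs):
        raise ValueError("expected exactly one output per port")
    return {port_to_layer[port]: out for port, out in zip(ordered_ports, outputs)}


# Made-up mapping for illustration; a real one would come from
# accl.outport_assignment(model_idx).
example_map = {0: "boxes", 1: "scores"}
labeled = label_outputs(example_map, ["box-data", "score-data"])
```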

class SyncAccl(dfp, group_id=0, **kwargs)#

This class provides a synchronous API for the MemryX hardware accelerator, which performs input and output sequentially per model. The accelerator is abstracted as a collection of models. Select the desired model by passing its index (model_idx) to the member functions.

Parameters:
dfp: bytes or string

Path to dfp or a dfp object (bytearray). The dfp is generated by the NeuralCompiler.

group_id: int

The index of the chip group to select.

mxserver_addr: string

Address to use for mx_server connection. Only needed for Docker.

mxserver_port: int

Port to use for mx_server connection. Only useful for Docker.

Examples

import tensorflow as tf
import numpy as np
from memryx import NeuralCompiler, SyncAccl

# Compile a MobileNet model for testing.
# Typically, compilation needs to be done only once.
model = tf.keras.applications.MobileNet()
nc = NeuralCompiler(models=model)
dfp = nc.run()

# Prepare the input data
img = np.random.rand(224,224,3).astype(np.float32)
data = tf.keras.applications.mobilenet.preprocess_input(img)

# Accelerate using the MemryX hardware
accl = SyncAccl(dfp)
outputs = accl.run(data) # Run sequential acceleration on the input data.

Warning

The MemryX accelerator is a streaming processor that the user can supply with pipelined input data. Using the synchronous API to perform sequential execution of multiple input frames may result in a significant performance penalty. The user is advised to use the send/receive functions on separate threads or to use the asynchronous API.
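
The advice to use send and receive on separate threads could look like the sketch below. It relies on two assumptions that this reference does not state outright: every send produces exactly one set of outputs, and outputs come back in input order. stream_frames is a hypothetical helper, and LoopbackAccl is a test stand-in that only mimics the send/receive signatures so the sketch runs without hardware.

```python
import queue
import threading
from typing import Any, Iterable, List


def stream_frames(accl: Any, frames: Iterable[Any], model_idx: int = 0) -> List[Any]:
    """Send frames from a worker thread while receiving on the calling thread."""
    frames = list(frames)

    def sender() -> None:
        for frame in frames:
            accl.send(frame, model_idx=model_idx)

    worker = threading.Thread(target=sender, daemon=True)
    worker.start()
    # One receive per frame sent (assumed one output set per input).
    results = [accl.receive(model_idx=model_idx) for _ in frames]
    worker.join()
    return results


class LoopbackAccl:
    """Test stand-in (not part of memryx): echoes inputs through a small buffer."""

    def __init__(self, depth: int = 2) -> None:
        self._buffer = queue.Queue(maxsize=depth)

    def send(self, data: Any, model_idx: int = 0, timeout: Any = None) -> None:
        self._buffer.put(data)  # blocks while the buffer is full

    def receive(self, model_idx: int = 0, timeout: Any = None) -> Any:
        return self._buffer.get()  # blocks while the buffer is empty


echoed = stream_frames(LoopbackAccl(), ["f0", "f1", "f2", "f3", "f4"])
```

With a real SyncAccl in place of LoopbackAccl, the sender can keep the input buffers filled while earlier frames are still being processed, which is what avoids the sequential penalty.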

send(data, model_idx=0, timeout=None)#

For the model specified by model_idx, this function transfers input data to the accelerator. It copies the data to the buffer(s) of the model’s input port(s) and returns. If there is no space in the buffer(s), the call blocks for a period decided by timeout.

Parameters:
data: np.ndarray or sequence of np.ndarray

Typically the pre-processed input data array(s) of the model

model_idx: int

Index of the model to which the data should be sent

timeout: int

The number of milliseconds to block if there is no space in the port buffer(s). If set to None (default), blocks until there is space, otherwise blocks for at most timeout milliseconds and raises an error if there is still no space.

Raises:
TimeoutError:

When no space is available in the port buffer(s) after blocking for timeout (> 0) milliseconds

receive(model_idx=0, timeout=None)#

For the model specified by model_idx, this function collects the output data from the accelerator. It retrieves data from the selected model’s output buffer(s). If data is unavailable at any of the model output ports, the call blocks for a period decided by timeout.

Parameters:
model_idx: int

Index of the model from which the data should be read

timeout: int

The number of milliseconds to block if no data is available at the port buffer(s). If set to None (default), blocks until data is available, otherwise blocks for at most timeout milliseconds and raises an error if data is still unavailable.

Raises:
TimeoutError:

When no data is available at the port buffer(s) after blocking for timeout (> 0) milliseconds
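
A caller that polls for results can turn the timeout error into a None return value. The sketch assumes the TimeoutError listed above is Python's built-in exception of that name. receive_or_none is a hypothetical helper, and NeverReady is a test stand-in rather than a memryx class.

```python
from typing import Any, Optional


def receive_or_none(accl: Any, timeout_ms: int, model_idx: int = 0) -> Optional[Any]:
    """Return the model outputs, or None if nothing arrives within timeout_ms."""
    try:
        return accl.receive(model_idx=model_idx, timeout=timeout_ms)
    except TimeoutError:
        return None


class NeverReady:
    """Test stand-in (not part of memryx) whose receive always times out."""

    def receive(self, model_idx: int = 0, timeout: Any = None) -> Any:
        raise TimeoutError("no data available")


missing = receive_or_none(NeverReady(), timeout_ms=10)
```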

run(inputs, model_idx=0)#

This function combines send and receive in one sequential call. It sends the data to the specified model and waits until all the outputs are available to retrieve.

Parameters:
inputs: np.ndarray or a list of np.ndarray or a doubly nested list of np.ndarray

Typically the pre-processed input data array(s) of the model. Each np.ndarray’s shape must match the shape expected by the model. Because of internal pipelining, batching multiple sets of inputs together in one call performs better than running one set of inputs at a time. Stacking multiple sets of inputs into a single np.ndarray is not supported.

model_idx: int

Index of the model
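
Since stacked inputs are not supported, data that already sits in one stacked array has to be split into per-frame arrays before it is passed to run. The sketch below assumes a single-input model for which a flat list is read as a batch of frames, and a per-frame shape of (224, 224, 3) as in the example above. unstack_batch is a hypothetical helper.

```python
from typing import List

import numpy as np


def unstack_batch(batch: np.ndarray) -> List[np.ndarray]:
    """Split an (N, ...) array into a list of N per-frame arrays."""
    # Iterating over an ndarray walks its first axis; ascontiguousarray
    # copies only when a slice is not already C-contiguous.
    return [np.ascontiguousarray(frame) for frame in batch]


stacked = np.zeros((4, 224, 224, 3), dtype=np.float32)
frames = unstack_batch(stacked)

# Hypothetical call, assuming `accl` is a SyncAccl for a single-input model:
# outputs = accl.run(frames)
```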

inport_assignment(model_idx=0)#

Returns a dictionary which maps input port ids to model input layer names for the model specified by model_idx

Parameters:
model_idx: int

Index of the model whose input port assignment is returned

outport_assignment(model_idx=0)#

Returns a dictionary which maps output port ids to model output layer names for the model specified by model_idx

Parameters:
model_idx: int

Index of the model whose output port assignment is returned

class MultiStreamAsyncAccl(dfp, group_id=0, stream_workers=None, **kwargs)#

This class provides a multi-stream version of the AsyncAccl API. This allows multiple input+output callbacks to be associated with a single model.

Parameters:
dfp: bytes or string

Path to dfp or a dfp object (bytearray). The dfp is generated by the NeuralCompiler.

group_id: int

The index of the chip group to select.

mxserver_addr: string

Address to use for mx_server connection. Only needed for Docker.

mxserver_port: int

Port to use for mx_server connection. Only useful for Docker.

Examples

import numpy as np
import tensorflow as tf
from memryx import NeuralCompiler, MultiStreamAsyncAccl

class Application:
    def __init__(self):
        # two streams of 10 random frames each, with a read position per stream
        self.streams = []
        self.streams_idx = []
        self.outputs = []
        for _ in range(2):
            self.streams.append([np.random.rand(224,224,3).astype(np.float32) for _ in range(10)])
            self.streams_idx.append(0)
            self.outputs.append([])

    # define the callback that will return model input data
    def data_source(self, stream_idx):
        # generate inputs based on stream_idx
        if self.streams_idx[stream_idx] == len(self.streams[stream_idx]):
            return None
        self.streams_idx[stream_idx]+=1
        return self.streams[stream_idx][self.streams_idx[stream_idx]-1]

    # define the callback that will process the outputs of the model
    def output_processor(self, stream_idx, *outputs):
        logits = np.squeeze(outputs[0], 0)
        preds = tf.keras.applications.mobilenet.decode_predictions(logits)
        # route outputs based on stream_idx
        self.outputs[stream_idx].append(preds)

# Compile a MobileNet model for testing.
# Typically, compilation needs to be done only once.
model = tf.keras.applications.MobileNet()
nc = NeuralCompiler(models=model,verbose=1)
dfp = nc.run()

# Accelerate using the MemryX hardware
app = Application()
accl = MultiStreamAsyncAccl(dfp)
accl.connect_streams(app.data_source, app.output_processor, 2) # starts asynchronous execution of input output callback pair associated with 2 streams
accl.wait() # wait for the accelerator to finish execution
connect_streams(input_callback, output_callback, stream_count, model_idx=0)#

Registers and starts execution of a pair of input and output callback functions that process stream_count data sources.

Parameters:
input_callback: callable

A function/bound method that returns the input data to be consumed by the model identified by model_idx. It must have exactly one parameter, stream_idx, an index from 0 to stream_count - 1 for the model at model_idx. Application code uses it to select the data source from which the data is returned.

output_callback: callable

A function/bound method that is called with the output feature maps generated by the model. It must have at least 2 parameters: stream_idx and either a packed *fmaps or fmap0, fmap1, … depending on the number of output feature maps generated by the model

stream_count: int

The number of input feature map sources/streams to the model.

model_idx: int

The target model for this pair of input and output callback functions. Each model will have its own stream_idx in the range of 0 to stream_count - 1
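
The callback pair described above can also be built from closures instead of a class. This is a minimal sketch with hypothetical names (make_stream_callbacks, sources, results). The string values stand in for np.ndarray frames and feature maps. As in the class-based example, returning None marks the end of a stream.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def make_stream_callbacks(
    sources: List[Iterable[Any]],
) -> Tuple[Callable[[int], Any], Callable[..., None], Dict[int, List[Tuple[Any, ...]]]]:
    """Build an input/output callback pair that routes data by stream_idx."""
    iterators = [iter(source) for source in sources]
    results: Dict[int, List[Tuple[Any, ...]]] = {idx: [] for idx in range(len(iterators))}

    def input_callback(stream_idx: int) -> Any:
        # Next frame of the selected stream, or None once it is exhausted.
        return next(iterators[stream_idx], None)

    def output_callback(stream_idx: int, *fmaps: Any) -> None:
        # Store the feature maps under the stream that produced them.
        results[stream_idx].append(fmaps)

    return input_callback, output_callback, results


input_cb, output_cb, results = make_stream_callbacks([["a0", "a1"], ["b0"]])

# Hypothetical wiring, assuming `accl` is a MultiStreamAsyncAccl instance:
# accl.connect_streams(input_cb, output_cb, 2)
```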

stop()#

Sends a signal to stop each of the models running on the accelerator. This call blocks until each of the models stops and cleans up its resources.

wait()#

Blocks the application thread until the accelerator finishes executing all models.

Raises:
RuntimeError: If any model’s inputs/outputs are left unconnected