Accelerator API (Python)
The Python Accelerator runtime API is split into three classes.
AsyncAccl is the primary Python API intended for realtime applications.
SyncAccl is for simple testing or batch processing of offline data.
MultiStreamAsyncAccl is an extension of AsyncAccl to make multi-camera scenarios easier to develop for.
- class AsyncAccl(dfp, group_id=0, **kwargs)
This class provides an asynchronous API to run models on the MemryX hardware accelerator. The user provides callback functions to feed data and receive outputs from the accelerator, which are then called whenever a model is ready to accept/output data. This pipelines execution of the models and allows the accelerator to run at full speed.
- Parameters:
- dfp: bytes or string
Path to dfp or a dfp object (bytearray). The dfp is generated by the NeuralCompiler.
- group_id: int
The index of the chip group to select.
- mxserver_addr: string
Address to use for mx_server connection. Only needed for Docker.
- mxserver_port: int
Port to use for mx_server connection. Only useful for Docker.
Examples
    import tensorflow as tf
    import numpy as np
    from memryx import NeuralCompiler, AsyncAccl

    # define the callback that will return model input data
    def data_source():
        for i in range(10):
            img = np.random.rand(224,224,3).astype(np.float32)
            data = tf.keras.applications.mobilenet.preprocess_input(img)
            yield data

    # define the callback that will process the outputs of the model
    def output_processor(*outputs):
        logits = np.squeeze(outputs[0], 0)
        preds = tf.keras.applications.mobilenet.decode_predictions(logits)

    # Compile a MobileNet model for testing.
    # Typically, compilation needs to be done only once.
    model = tf.keras.applications.MobileNet()
    nc = NeuralCompiler(models=model)
    dfp = nc.run()

    # Accelerate using the MemryX hardware
    accl = AsyncAccl(dfp)
    accl.connect_input(data_source) # starts asynchronous execution of input generating callback
    accl.connect_output(output_processor) # starts asynchronous execution of output processing callback
    accl.wait() # wait for the accelerator to finish execution
- connect_input(callback, model_idx=0)
Sets a callback function to execute when this model is ready to start processing an input frame.
- Parameters:
- callback: callable
This callable is invoked asynchronously whenever the accelerator is ready to start processing an input frame through the model specified by model_idx. callback is responsible for generating and returning the next input frame(s) for this model. callback must not take any arguments. It may return either a single np.ndarray if the model has a single input, or a sequence of np.ndarrays for multi-input models. The data types of the np.ndarrays must match those expected by the model.
Any exception raised when calling callback is taken to signal the end of the input stream for this model. Returning None from the callback is also taken to signal the end of input stream for this model.
If a pre-processing model was set by calling the set_preprocessing_model method, the outputs of callback will first be run through the pre-processing model and the resulting outputs will be fed to the accelerator.
- model_idx: int
Index of the model whose input should be connected to callback
- Raises:
- RuntimeError: If the signature of callback contains any parameters
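The callback contract described above (no parameters, one frame per call, None at the end of the stream) can be sketched as a small closure. This is an illustrative sketch rather than part of the MemryX API: make_input_callback and takes_no_parameters are hypothetical helper names, and plain strings stand in for the np.ndarray frames a real application would return.

```python
import inspect


def make_input_callback(frames):
    """Wrap an iterable of frames in a zero-argument callable (hypothetical helper)."""
    iterator = iter(frames)

    def next_frame():
        # Return the next frame, or None once the iterable is exhausted.
        # Per the text above, None signals the end of the input stream.
        return next(iterator, None)

    return next_frame


def takes_no_parameters(callback):
    """Pre-flight check mirroring the documented rule that callback takes no arguments."""
    return len(inspect.signature(callback).parameters) == 0


# Demo with stand-in frames; in practice these would be np.ndarray objects.
source = make_input_callback(["frame0", "frame1"])
assert takes_no_parameters(source)
# With real hardware this would be registered as: accl.connect_input(source)
```

Checking the signature before registering the callback surfaces a mistake locally instead of as a RuntimeError from connect_input.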
- connect_output(callback, model_idx=0)
Sets a callback function to execute when the outputs of this model are ready.
- Parameters:
- callback: callable
This callable is invoked asynchronously whenever the accelerator finishes processing an input frame for the model specified by model_idx. The outputs of the model are passed to this callable according to the port order assigned to the model, which is returned by the outport_assignment method. The signature of callback must consist only of parameters that correspond to the model output feature maps; no other parameters may be present.
If a post-processing model was set by calling the set_postprocessing_model method, the outputs of the model will first be run through the post-processing model and the resulting outputs will be used instead to call callback.
- model_idx: int
Index of the model whose output should be connected to callback
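Because the output callback is invoked asynchronously, one way to hand results back to the main thread is through a thread-safe queue. The sketch below assumes the callback may run on a thread other than the main one; OutputCollector and on_outputs are hypothetical names, and strings stand in for the output feature maps.

```python
import queue


class OutputCollector:
    """Collects model outputs handed to an asynchronously invoked callback."""

    def __init__(self):
        # queue.Queue is thread-safe, so the callback may run on another thread.
        self.results = queue.Queue()

    def on_outputs(self, *outputs):
        # One positional argument per output feature map, in port order.
        self.results.put(outputs)


collector = OutputCollector()
# With real hardware this would be registered as:
#   accl.connect_output(collector.on_outputs)
# Here a single invocation is simulated for a model with two outputs.
collector.on_outputs("fmap0", "fmap1")
first = collector.results.get(timeout=1)
```

Using `*outputs` keeps the signature limited to the output feature maps, as the description above requires, regardless of how many outputs the model has.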
- stop()
Send a signal to stop each of the models running on the accelerator. This call blocks until each of the models stops and cleans up its resources.
- wait()
Make the main thread wait for the accelerator to finish executing all models.
- Raises:
- RuntimeError: If any model’s inputs or outputs are left unconnected
- set_preprocessing_model(model_or_path, model_idx=0)
Supply a model, or the path to a model file, that should be run to pre-process the input feature map. This is an optional feature that can be used to automatically run the pre-processing model output by the NeuralCompiler.
Note
This function currently does not support PyTorch models.
- Parameters:
- model_or_path: obj or str
Can be either an already loaded model such as a tf.keras.Model object for Keras, or a str path to a model file.
- model_idx: int
Index of the model on the accelerator whose input feature map should be pre-processed by the supplied model
- set_postprocessing_model(model_or_path, model_idx=0)
Supply a model, or the path to a model file, that should be run to post-process the output feature maps. This is an optional feature that can be used to automatically run the post-processing model output by the NeuralCompiler.
Note
This function currently does not support PyTorch models.
- Parameters:
- model_or_path: obj or str
Can be either an already loaded model such as a tf.keras.Model object for Keras, or a string path to a model file.
- model_idx: int
Index of the model on the accelerator whose output should be post-processed by the supplied model
- inport_assignment(model_idx=0)
Returns a dictionary which maps input port ids to model input layer names for the model specified by model_idx
- Parameters:
- model_idx: int
Index of the model whose input port assignment is returned
- outport_assignment(model_idx=0)
Returns a dictionary which maps output port ids to model output layer names for the model specified by model_idx
- Parameters:
- model_idx: int
Index of the model whose output port assignment is returned
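The port assignment dictionaries can be used to label the positional outputs an output callback receives. The following is a minimal sketch under two assumptions that the text above does not confirm: that port ids sort into the same order in which outputs are passed to the callback, and that the dictionary looks like the hypothetical `assignment` shown here. name_outputs is an illustrative helper, not a MemryX function.

```python
def name_outputs(port_assignment, outputs):
    """Pair positional outputs with layer names (hypothetical helper)."""
    # Assumption: sorting the port ids gives the positional order of the outputs.
    ordered_names = [port_assignment[port] for port in sorted(port_assignment)]
    if len(ordered_names) != len(outputs):
        raise ValueError(
            "expected %d outputs, got %d" % (len(ordered_names), len(outputs))
        )
    return dict(zip(ordered_names, outputs))


# Hypothetical mapping of the kind outport_assignment() is described as returning.
assignment = {1: "scores", 0: "boxes"}
named = name_outputs(assignment, ("boxes_fmap", "scores_fmap"))
```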
- class SyncAccl(dfp, group_id=0, **kwargs)
This class provides a synchronous API for the MemryX hardware accelerator, which performs input and output sequentially per model. The accelerator is abstracted as a collection of models. Select the desired model by passing its index to the member function.
- Parameters:
- dfp: bytes or string
Path to dfp or a dfp object (bytearray). The dfp is generated by the NeuralCompiler.
- group_id: int
The index of the chip group to select.
- mxserver_addr: string
Address to use for mx_server connection. Only needed for Docker.
- mxserver_port: int
Port to use for mx_server connection. Only useful for Docker.
Examples
    import tensorflow as tf
    import numpy as np
    from memryx import NeuralCompiler, SyncAccl

    # Compile a MobileNet model for testing.
    # Typically, compilation needs to be done only once.
    model = tf.keras.applications.MobileNet()
    nc = NeuralCompiler(models=model)
    dfp = nc.run()

    # Prepare the input data
    img = np.random.rand(224,224,3).astype(np.float32)
    data = tf.keras.applications.mobilenet.preprocess_input(img)

    # Accelerate using the MemryX hardware
    accl = SyncAccl(dfp)
    outputs = accl.run(data) # Run sequential acceleration on the input data.
Warning
The MemryX accelerator is a streaming processor that the user can supply with pipelined input data. Using the synchronous API to perform sequential execution of multiple input frames may result in a significant performance penalty. The user is advised to use the send/receive functions on separate threads or to use the asynchronous API.
- send(data, model_idx=0, timeout=None)
For the model specified by model_idx, this function transfers input data to the accelerator. It copies the data to the buffer(s) of the model’s input port(s) and returns. If there is no space in the buffer(s), the call blocks for a period decided by timeout.
- Parameters:
- data: np.ndarray or sequence of np.ndarray
Typically the pre-processed input data array(s) of the model
- model_idx: int
Index of the model to which the data should be sent
- timeout: int
The number of milliseconds to block if there is no space in the port buffer(s). If set to None (default), blocks until there is space; otherwise blocks for at most timeout milliseconds and raises an error if there is still no space.
- Raises:
- TimeoutError:
When no space is available in the port buffer(s) after blocking for timeout (> 0) milliseconds
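When send is called with a finite timeout, the caller has to decide what to do on TimeoutError. A minimal retry sketch follows. send_with_retry is a hypothetical helper and flaky_send is a stand-in for accl.send so the example runs without hardware; only the (data, timeout=...) call shape and the TimeoutError come from the description above.

```python
def send_with_retry(send, data, timeout_ms=100, attempts=3):
    """Call send(data, timeout=timeout_ms), retrying on TimeoutError.

    Returns the number of attempts used; re-raises after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            send(data, timeout=timeout_ms)
            return attempt
        except TimeoutError:
            if attempt == attempts:
                raise


# Stand-in for accl.send that times out twice before accepting the data.
calls = []


def flaky_send(data, timeout=None):
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError("no space in the port buffer")


attempts_used = send_with_retry(flaky_send, "frame", timeout_ms=50)
# With real hardware: send_with_retry(accl.send, data, timeout_ms=50)
```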
- receive(model_idx=0, timeout=None)
For the model specified by model_idx, this function collects the output data from the accelerator. It retrieves data from the selected model’s output buffer(s). If data is unavailable at any of the model output ports, the call blocks for up to timeout milliseconds.
- Parameters:
- model_idx: int
Index of the model from which the data should be read
- timeout: int
The number of milliseconds to block if no data is available at the port buffer(s). If set to None (default), blocks until data is available; otherwise blocks for at most timeout milliseconds and raises an error if data is still unavailable.
- Raises:
- TimeoutError:
When no data is available at the port buffer(s) after blocking for timeout (> 0) milliseconds
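The advice to use send and receive on separate threads can be sketched as a producer/consumer pair. FakeAccl below is a stand-in written only so the sketch runs without hardware; it is not the MemryX class, and its "model" simply doubles each input. The sketch also assumes that outputs come back in the same order in which inputs were sent.

```python
import queue
import threading


class FakeAccl:
    """Stand-in with the send/receive shape described above; NOT the MemryX class."""

    def __init__(self):
        # Small bounded buffer so that send() can block, like a full port buffer.
        self._buffer = queue.Queue(maxsize=2)

    def send(self, data, model_idx=0, timeout=None):
        self._buffer.put(data * 2)  # pretend the "model" doubles its input

    def receive(self, model_idx=0, timeout=None):
        return self._buffer.get()


def run_pipelined(accl, frames):
    """Send on a worker thread while the calling thread receives."""
    frames = list(frames)

    def send_all():
        for frame in frames:
            accl.send(frame)

    sender = threading.Thread(target=send_all)
    sender.start()
    # Collect exactly one result per frame that was sent.
    results = [accl.receive() for _ in frames]
    sender.join()
    return results


results = run_pipelined(FakeAccl(), [1, 2, 3, 4, 5])
```

Keeping the sender on its own thread lets new inputs enter the buffer while earlier outputs are still being read, which is the pipelining the warning refers to.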
- run(inputs, model_idx=0)
This function combines send and receive in one sequential call. It sends the data to the specified model and waits until all the outputs are available to retrieve.
- Parameters:
- model_idx: int
Index of the model
- inputs: np.ndarray or a list of np.ndarray or a doubly nested list of np.ndarray
Typically the pre-processed input data array(s) of the model. Each np.ndarray’s shape must match the shape expected by the model. Multiple sets of inputs can be batched together for greater performance compared to running a single set of inputs at a time due to internal pipelining. Stacking together multiple sets of inputs into a single np.ndarray is not supported.
- inport_assignment(model_idx=0)
Returns a dictionary which maps input port ids to model input layer names for the model specified by model_idx
- Parameters:
- model_idx: int
Index of the model whose input port assignment is returned
- outport_assignment(model_idx=0)
Returns a dictionary which maps output port ids to model output layer names for the model specified by model_idx
- Parameters:
- model_idx: int
Index of the model whose output port assignment is returned
- class MultiStreamAsyncAccl(dfp, group_id=0, stream_workers=None, **kwargs)
This class provides a multi-stream version of the AsyncAccl API. This allows multiple input+output callbacks to be associated with a single model.
- Parameters:
- dfp: bytes or string
Path to dfp or a dfp object (bytearray). The dfp is generated by the NeuralCompiler.
- group_id: int
The index of the chip group to select.
- mxserver_addr: string
Address to use for mx_server connection. Only needed for Docker.
- mxserver_port: int
Port to use for mx_server connection. Only useful for Docker.
Examples
    import numpy as np
    import tensorflow as tf
    from memryx import NeuralCompiler, MultiStreamAsyncAccl

    class Application:
        def __init__(self):
            self.streams = []
            self.streams_idx = []
            self.outputs = []
            for i in range(2):
                self.streams.append([np.random.rand(224,224,3).astype(np.float32) for _ in range(10)])
                self.streams_idx.append(0)
                self.outputs.append([])

        # define the callback that will return model input data
        def data_source(self, stream_idx):
            # generate inputs based on stream_idx
            if self.streams_idx[stream_idx] == len(self.streams[stream_idx]):
                return None
            self.streams_idx[stream_idx]+=1
            return self.streams[stream_idx][self.streams_idx[stream_idx]-1]

        # define the callback that will process the outputs of the model
        def output_processor(self, stream_idx, *outputs):
            logits = np.squeeze(outputs[0], 0)
            preds = tf.keras.applications.mobilenet.decode_predictions(logits)
            # route outputs based on stream_idx
            self.outputs[stream_idx].append(preds)

    # Compile a MobileNet model for testing.
    # Typically, compilation needs to be done only once.
    model = tf.keras.applications.MobileNet()
    nc = NeuralCompiler(models=model,verbose=1)
    dfp = nc.run()

    # Accelerate using the MemryX hardware
    app = Application()
    accl = MultiStreamAsyncAccl(dfp)
    accl.connect_streams(app.data_source, app.output_processor, 2) # starts asynchronous execution of input output callback pair associated with 2 streams
    accl.wait() # wait for the accelerator to finish execution
- connect_streams(input_callback, output_callback, stream_count, model_idx=0)
Registers and starts execution of a pair of input and output callback functions that together process stream_count data sources.
- Parameters:
- input_callback: callable
A function/bound method that returns the input data to be consumed by the model identified by model_idx. It must have exactly one parameter, stream_idx, an index from 0 to stream_count - 1 that the application code uses to select the data source from which the data is returned.
- output_callback: callable
A function/bound method that is called with the output feature maps generated by the model. It must have at least 2 parameters: stream_idx and either a packed *fmaps or fmap0, fmap1, … depending on the number of output feature maps generated by the model
- stream_count: int
The number of input feature map sources/streams to the model.
- model_idx: int
The target model for this pair of input and output callback functions. Each model will have its own stream_idx in the range of 0 to stream_count - 1
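When each stream already has its own zero-argument source (one per camera, for example), the single pair of callbacks that connect_streams expects can be built by dispatching on stream_idx. This is a sketch under stated assumptions: make_stream_callbacks is a hypothetical helper, strings stand in for frames and feature maps, and returning None at the end of a stream follows the example earlier in this section.

```python
from functools import partial


def make_stream_callbacks(sources):
    """Build (input_callback, output_callback, results) from per-stream sources."""
    results = [[] for _ in sources]

    def input_callback(stream_idx):
        # Pull the next frame from the source that belongs to this stream.
        return sources[stream_idx]()

    def output_callback(stream_idx, *fmaps):
        # Route the outputs to the list that belongs to this stream.
        results[stream_idx].append(fmaps)

    return input_callback, output_callback, results


# Two stand-in sources; each returns None once its frames run out.
cam0 = partial(next, iter(["a0", "a1"]), None)
cam1 = partial(next, iter(["b0"]), None)
input_cb, output_cb, results = make_stream_callbacks([cam0, cam1])
# With real hardware: accl.connect_streams(input_cb, output_cb, 2)

# Simulate one round trip on stream 1.
frame = input_cb(1)
output_cb(1, frame.upper())
```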
- stop()
Sends a signal to stop each of the models running on the accelerator. This call blocks until each of the models stops and cleans up its resources.
- wait()
Blocks the application thread until the accelerator finishes executing all models.
- Raises:
- RuntimeError: If any model’s inputs or outputs are left unconnected