Python Benchmark

Note

The Benchmark API provides a simple way to characterize neural network model performance on MX3 chips. For integration, it is advised to use the Accelerator API instead.

class Benchmark(verbose=0, dfp='model.dfp', use_model_shape=False, device_ids=[0], local_mode=True, manager_addr='/run/mxa_manager/', manager_port=10000, **kwargs)

MemryX Benchmark.

The interface to the MemryX Benchmark.

This class wraps SyncAccl to provide a user-friendly experience with MXA(s) connected via PCIe/USB.

Parameters:

verbose (int): How verbose the benchmark will be.
dfp (string or pathlib.Path or Dfp): Path to a DFP file, or a Dfp object.

Examples

from memryx import Benchmark

# Run inference with 1000 frames of random data
with Benchmark(dfp='mobilenet.dfp') as accl:
    outputs,_,fps = accl.run(frames=1000)
    print("FPS {:.2f}".format(fps))

run(inputs=None, frames=500, threading=True, model_idx=0)

Run inference on the benchmark.

Perform inference using the configured DFP on the connected MXA with the given inputs or random data if no inputs are specified.

Parameters:

inputs (np.array() or list of np.array()): The array shape should be [N + input_shape], where N is the number of frames to run. If the DFP has multiple inputs (e.g. multi-model), pass a list of appropriately shaped numpy arrays.
frames (int): Number of frames to run (defaults to 500). The number of frames in the inputs argument overrides this value.

Warning

Having too few frames may give unreliable results. Try to use a number that will cause your model to run for at least a few seconds.
threading (bool): Use threading to send / receive frames. This allows frames to be pipelined on the accelerator, which enables higher FPS. Otherwise, a blocking scheme is used to send / receive frames.

Note

You must use threading=False in order to measure latency. On the other hand, use threading=True to get the best FPS.
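
The [N + input_shape] layout described for inputs can be illustrated with plain NumPy. The following is a minimal sketch, not part of the Benchmark API: the helper name stack_frames is hypothetical, and the 1x224x224x3 input shape is an assumption borrowed from the examples in this section.

```python
import numpy as np

def stack_frames(frames, input_shape):
    """Stack per-frame arrays into one array of shape [N] + input_shape.

    Hypothetical helper for illustration; not provided by the memryx package.
    """
    input_shape = tuple(input_shape)
    frames = [np.asarray(f) for f in frames]
    for i, f in enumerate(frames):
        # Check each frame before stacking so the error names the bad frame.
        if f.shape != input_shape:
            raise ValueError(
                f"frame {i} has shape {f.shape}, expected {input_shape}"
            )
    # The new leading axis is N, the number of frames.
    return np.stack(frames, axis=0)

# Four frames for an assumed 1x224x224x3 model input.
batch = stack_frames([np.zeros((1, 224, 224, 3))] * 4, (1, 224, 224, 3))
print(batch.shape)  # (4, 1, 224, 224, 3)
```

An array built this way could then be passed as the inputs argument of run(), in which case its leading dimension determines the number of frames.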

Returns:

outputs, latency, fps (np.array() or list of np.array(), float, float): Produces the NN output (only when inputs is provided) and reports the latency and/or FPS. The output data is returned as a list of np.array(). Each output array has the shape [N + output_shape], where N is the number of inference frames. When inputs is None, outputs is also None, since random inputs lead to random output feature maps.

Note

The latency is timed from the moment the first data of the input is consumed until the last data of the output is produced. The FPS is calculated from the time between output frames.
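
To make the two measurements concrete, here is a rough sketch of how such figures could be derived from timestamps. It is an illustration only and not the MemryX implementation: the function names, the use of seconds as the unit, and the sample timestamps are all assumptions.

```python
def latency_seconds(first_input_time, last_output_time):
    """Elapsed time from first input data consumed to last output data produced."""
    return last_output_time - first_input_time

def fps_from_output_times(output_times):
    """Average frames per second, based on the time between output frames."""
    if len(output_times) < 2:
        raise ValueError("need at least two output timestamps")
    elapsed = output_times[-1] - output_times[0]
    if elapsed <= 0:
        raise ValueError("output timestamps must be increasing")
    # N timestamps span N - 1 intervals between consecutive output frames.
    return (len(output_times) - 1) / elapsed

# Made-up timestamps (in seconds) for five output frames.
print(fps_from_output_times([0.0, 0.25, 0.5, 0.75, 1.0]))  # 4.0
print(latency_seconds(1.0, 1.5))  # 0.5
```

The sketch also suggests why the two numbers are measured separately: with pipelined frames, the interval between outputs can be much shorter than the end-to-end time of a single frame.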

Raises:

ValueError: When the input is incorrectly configured.

Examples

Single model example:

import numpy as np
from memryx import Benchmark

with Benchmark(dfp='mobilenet.dfp') as accl:
    # 1000 frames, get FPS
    outputs,_,fps = accl.run(frames=1000, model_idx=0)

    # single frame, get latency
    outputs,latency,_ = accl.run(threading=False, model_idx=0)

    # four frames of a numpy array's data (assume 1x224x224x3 model input)
    inputs = np.zeros([4,1,224,224,3])
    outputs,latency,fps = accl.run(inputs=inputs, model_idx=0)

Multi-model example:

import numpy as np
from memryx import Benchmark

# model 0's input is 4 frames of 1x224x224x3
# model 1's input is 5 frames of 1x300x300x3
inputs = [np.random.random([4,1,224,224,3]), np.random.random([5,1,300,300,3])]
num_models = len(inputs)  # one input array per model

with Benchmark(dfp='models.dfp') as accl:
    for model_idx in range(num_models):
        # 1000 frames, get FPS
        outputs,_,fps = accl.run(frames=1000, model_idx=model_idx)

        # single frame, get latency
        outputs,latency,_ = accl.run(threading=False, model_idx=model_idx)

        # frames taken from this model's numpy input array
        outputs,latency,fps = accl.run(inputs=inputs[model_idx], model_idx=model_idx)
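
The warning on the frames parameter recommends running long enough for the measurement to be reliable. One possible way to pick a frame count is to scale a rough FPS estimate by a target duration. The sketch below assumes a hypothetical helper name, frames_for_duration, and a 3-second default that is only one reading of "a few seconds".

```python
import math

def frames_for_duration(pilot_fps, min_seconds=3.0):
    """Return a frame count expected to keep the run going for min_seconds.

    Hypothetical helper for illustration; not provided by the memryx package.
    pilot_fps is a rough FPS estimate, e.g. from a short preliminary run.
    """
    if pilot_fps <= 0 or min_seconds <= 0:
        raise ValueError("pilot_fps and min_seconds must be positive")
    # Round up so the run is never shorter than requested.
    return max(1, math.ceil(pilot_fps * min_seconds))

print(frames_for_duration(850.0))  # 2550
```

In practice, pilot_fps could come from the fps value returned by a short run() call, and the result could be passed as the frames argument of a second, longer run.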