Python Benchmark

Note

The Benchmark API provides a simple way to characterize neural network model performance on MX3 chips. To integrate a model into an application, use the Accelerator API instead.

class Benchmark(verbose=0, dfp='model.dfp', group=0, chip_gen=3.1, mxserver_addr='localhost', mxserver_port=10000, **kwargs)

MemryX Benchmark.

The interface to the MemryX Benchmark. This class wraps driver calls to provide a user-friendly experience when using MXAs connected via PCIe or USB.

Parameters:
verbose : int

How verbose the benchmark will be.

dfp : str or pathlib.Path or Dfp

Path to a DFP file, or a Dfp object.

Examples

from memryx import Benchmark

# Run inference with 1000 frames of random data
with Benchmark(dfp='mobilenet.dfp') as accl:
    outputs, _, fps = accl.run(frames=1000)
    print("FPS {:.2f}".format(fps))
run(inputs=None, frames=500, threading=True)

Run inference on the benchmark.

Perform inference using the configured DFP on the connected MXA, with the given inputs, or with random data if no inputs are specified.

Parameters:
inputs : np.array() or list of np.array()

The array shape should be [N + input_shape], where N is the number of frames to run (for example, [4, 224, 224, 3] for four frames of a 224x224x3 input). If the DFP has multiple inputs (e.g., multi-model), pass a list of appropriately shaped numpy arrays, one per input.

frames : int

Number of frames to run (defaults to 500). If inputs is provided, the number of frames in inputs overrides this value.

Warning

Too few frames may give unreliable results. Use enough frames for the model to run for at least a few seconds.

threading : bool

Use threading to send and receive frames. This allows frames to be pipelined on the accelerator, which enables higher FPS. Otherwise, a blocking scheme is used to send and receive frames.

Note

You must use threading=False to measure latency. Use threading=True to get the best FPS.
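To see why pipelining raises FPS while latency has to be measured in blocking mode, it can help to reason with a simplified model. The sketch below assumes, purely for illustration, that each frame passes through a fixed sequence of stages with known, constant times; the function names and stage times are hypothetical, not MemryX APIs or measured MX3 figures. With one frame in flight (blocking), throughput is bounded by the full per-frame latency; in an ideal pipeline, the slowest stage sets the rate. The same estimate gives a rough frame count for the few-seconds guideline in the warning above.

```python
import math


def blocking_fps(stage_times_s):
    """Throughput with one frame in flight (analogous to threading=False).

    The next frame starts only after the previous one has passed through
    every stage, so throughput is 1 / (total per-frame latency).
    """
    return 1.0 / sum(stage_times_s)


def pipelined_fps(stage_times_s):
    """Ideal throughput when stages overlap (analogous to threading=True).

    Assumes an ideal pipeline: each stage works on a different frame at
    the same time, so the slowest stage sets the output rate.
    """
    return 1.0 / max(stage_times_s)


def frames_for_duration(expected_fps, seconds=5.0):
    """Frames needed so a run lasts roughly `seconds` at `expected_fps`."""
    return math.ceil(expected_fps * seconds)


# Hypothetical stage times in seconds: send input, compute, receive output.
stages = [0.002, 0.003, 0.001]
print(f"blocking:  {blocking_fps(stages):.1f} FPS")
print(f"pipelined: {pipelined_fps(stages):.1f} FPS")
print(f"frames for ~5 s: {frames_for_duration(pipelined_fps(stages))}")
```

Real runs include host-side and transfer overheads that this model ignores, so treat the numbers as a planning aid and rely on the benchmark for the actual FPS.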

Returns:
outputs, latency, fps : np.array() or list of np.array(), float, float

Produces the NN output (only when inputs is provided) and reports the latency and/or FPS. The output data is returned as a list of np.array(). The arrays have the shape [N + output_shape] for each output, where N is the number of inference frames. When inputs is None, outputs is also None, since random inputs lead to random output feature maps.

Note

The latency is timed from the moment the first input data is consumed until the last output data is produced. The FPS is derived from the time between consecutive output frames.
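The two definitions can be made concrete with timestamps. This is a minimal sketch of the arithmetic only, not the library's internal timing code; the helper names and timestamps are hypothetical, and reading "the time between output frames" as the mean gap between consecutive outputs is an assumption.

```python
def latency_s(first_input_consumed, last_output_produced):
    """Latency: first input data consumed -> last output data produced."""
    if last_output_produced < first_input_consumed:
        raise ValueError("output timestamp precedes input timestamp")
    return last_output_produced - first_input_consumed


def fps_from_output_times(output_times):
    """FPS as the reciprocal of the mean gap between consecutive outputs."""
    if len(output_times) < 2:
        raise ValueError("need at least two output frames to measure FPS")
    span = output_times[-1] - output_times[0]
    mean_gap = span / (len(output_times) - 1)
    return 1.0 / mean_gap


# Hypothetical timestamps in seconds, e.g. collected with time.perf_counter().
outputs_done = [0.010, 0.012, 0.014, 0.016, 0.018]
print(f"FPS: {fps_from_output_times(outputs_done):.1f}")
print(f"latency of one frame: {latency_s(0.000, 0.010) * 1000:.1f} ms")
```

Note that FPS depends only on the spacing of outputs, while latency spans a frame's whole trip, which is why a pipelined run can report a high FPS without any single frame finishing faster.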


Raises:
ValueError

When the input is incorrectly configured.
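The shape rules for inputs and frames can be checked on the caller's side before calling run(). The sketch below is a hypothetical pre-flight helper, not part of the memryx package: it takes the shapes of the arrays you plan to pass (e.g. arr.shape), checks each against the expected per-frame input shape, and returns the frame count N that would override frames. Beyond what is documented above, it assumes that every input of a multi-input DFP carries the same N. It raises ValueError in the same spirit as run(), though the library's own checks may differ.

```python
def effective_frames(input_shapes, expected_shapes, frames=500):
    """Hypothetical pre-flight check mirroring the documented run() rules.

    input_shapes:    shapes of the arrays you plan to pass, e.g.
                     [arr.shape for arr in inputs], or None for random data.
    expected_shapes: per-frame input shape of each DFP input.
    frames:          the value you would pass as run(frames=...).
    Returns the frame count that would actually be used.
    """
    if input_shapes is None:
        return frames  # random data: `frames` is used as-is
    if len(input_shapes) != len(expected_shapes):
        raise ValueError(
            f"expected {len(expected_shapes)} inputs, got {len(input_shapes)}")
    counts = set()
    for i, (got, want) in enumerate(zip(input_shapes, expected_shapes)):
        got, want = tuple(got), tuple(want)
        # Each array must be [N + input_shape]: a leading frame axis N,
        # followed by exactly the model's per-frame input shape.
        if len(got) != len(want) + 1 or got[1:] != want:
            raise ValueError(f"input {i}: shape {got} is not [N + {list(want)}]")
        counts.add(got[0])
    # Assumption: all inputs of a multi-input DFP carry the same N.
    if len(counts) != 1:
        raise ValueError(f"inputs disagree on frame count N: {sorted(counts)}")
    return counts.pop()  # N from the inputs overrides `frames`


# Four 224x224x3 frames for a single-input model, as in the examples:
print(effective_frames([(4, 224, 224, 3)], [(224, 224, 3)]))
```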

Examples

import numpy as np
from memryx import Benchmark

with Benchmark(dfp='mobilenet.dfp') as accl:
    # 1000 frames of random data, get FPS
    outputs, _, fps = accl.run(frames=1000)

    # blocking mode (threading=False), get latency
    outputs, latency, _ = accl.run(threading=False)

    # four frames of a numpy array's data (assume 224x224x3 model input)
    inputs = np.zeros([4, 224, 224, 3])
    outputs, _, fps = accl.run(inputs=inputs)
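Because outputs may come back as a single np.array() or as a list of them, and as None when no inputs were given, downstream code often normalizes it before use. The hypothetical helper below is not part of the memryx package: it handles those three forms and yields one tuple per frame, relying only on len() and indexing along the leading N axis, both of which np.array supports. Plain nested lists stand in for arrays in the usage lines so the sketch runs without a device.

```python
def iter_frames(outputs):
    """Yield one tuple per frame, holding that frame's slice of every output.

    Accepts None, a single array, or a list of arrays, each shaped
    [N + output_shape]. A plain list is treated as a list of outputs.
    """
    if outputs is None:
        return  # random-data runs produce no outputs
    if not isinstance(outputs, list):
        outputs = [outputs]  # single-output model
    if len(outputs) == 0:
        return
    n = len(outputs[0])
    if any(len(out) != n for out in outputs):
        raise ValueError("outputs disagree on frame count N")
    for i in range(n):
        yield tuple(out[i] for out in outputs)


# Stand-in for a two-output model run on 2 frames (nested lists, not arrays).
fake_outputs = [[[0.1, 0.9], [0.8, 0.2]], [[1], [0]]]
for frame_idx, (scores, label) in enumerate(iter_frames(fake_outputs)):
    print(frame_idx, scores, label)
```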