Python Benchmark
Note
The Benchmark API provides a simple way to characterize neural network model performance on MX3 chips. For integration into an application, using the Accelerator API is advised.
- class Benchmark(verbose=0, dfp='model.dfp', group=0, chip_gen=3.1, **kwargs)
MemryX Benchmark.
The interface to the MemryX Benchmark. This class wraps driver calls to provide a user-friendly experience when using one or more MXAs connected via PCIe or USB.
- Parameters:
- verbose : int
How verbose the benchmark will be.
- dfp : string or pathlib.Path or Dfp
Path to dfp or a Dfp object
Examples
from memryx import Benchmark

# Run inference with 1000 frames of random data
with Benchmark(dfp='mobilenet.dfp') as accl:
    outputs, _, fps = accl.run(frames=1000)
    print("FPS {:.2f}".format(fps))
- run(inputs=None, frames=500, threading=True)
Run inference on the benchmark.
Perform inference using the configured DFP on the connected MXA with the given inputs or random data if no inputs are specified.
- Parameters:
- inputs : np.array() or list of np.array()
The array shape should be [N + input_shape], where N is the number of frames to run. If the DFP has multiple inputs (e.g. multi-model), pass a list of appropriately shaped numpy arrays.
- frames : int
Number of frames to run (defaults to 500). The number of frames in ‘inputs’, when provided, overrides this value.
Warning
Having too few frames may give unreliable results. Try to use a number that will cause your model to run for at least a few seconds.
- threading : bool
Use threading to send and receive frames. This allows frames to be pipelined on the accelerator, which enables higher FPS. Otherwise, a blocking scheme is used to send and receive frames.
Note
You must use threading=False in order to measure latency. On the other hand, use threading=True to get the best FPS.
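The [N + input_shape] convention for inputs can be checked before calling run. The following is a minimal sketch and not part of the MemryX API: count_frames is a hypothetical helper, the per-frame shapes passed to it are assumed to be known by the caller (for example from the model definition), and it assumes every input carries the same number of frames. It relies only on each array exposing a .shape attribute, as NumPy arrays do.

```python
def count_frames(inputs, input_shapes):
    """Return N after checking that every array has shape [N + input_shape].

    Hypothetical helper (not a memryx function). `inputs` is one array or a
    list of arrays; `input_shapes` lists the per-frame shape of each model
    input and is supplied by the caller.
    """
    # Treat a single array the same as a one-element list.
    arrays = inputs if isinstance(inputs, (list, tuple)) else [inputs]
    if len(arrays) != len(input_shapes):
        raise ValueError(
            f"expected {len(input_shapes)} input arrays, got {len(arrays)}"
        )

    frame_counts = set()
    for index, (array, per_frame) in enumerate(zip(arrays, input_shapes)):
        shape = tuple(array.shape)
        per_frame = tuple(per_frame)
        # The leading axis is N; the remaining axes must match the model input.
        if len(shape) != len(per_frame) + 1 or shape[1:] != per_frame:
            raise ValueError(
                f"input {index}: expected (N, *{per_frame}), got {shape}"
            )
        frame_counts.add(shape[0])

    # Assumption: all inputs must agree on the number of frames.
    if len(frame_counts) != 1:
        raise ValueError(
            f"inputs disagree on the number of frames: {sorted(frame_counts)}"
        )
    return frame_counts.pop()
```

With NumPy, count_frames(np.zeros([4, 224, 224, 3]), [(224, 224, 3)]) returns 4, which is the frame count that takes precedence over the frames argument.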
- Returns:
- outputs, latency, fps : np.array() or list of np.array(), float, float
Produces the NN output (only when inputs is provided) and reports the latency and/or FPS. The output data is returned as a list of np.array(). The arrays will have the shape [N + output_shape] for each output where N is the number of inference frames. When inputs is None, outputs will also be None since random inputs lead to random output feature maps.
Note
The latency is timed from the moment the first data of the input is consumed until the last data of the output is produced. The FPS is calculated from the time between output frames.
- Raises:
- ValueError
When the input is incorrectly configured.
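Because latency requires threading=False and the best FPS requires threading=True, reporting both figures takes two runs. The following is a minimal sketch that uses only the run signature documented above. measure_latency_and_fps is a hypothetical helper name, and bench is assumed to be an open Benchmark (or any object with a compatible run method). This section does not state the unit of the latency value, so the sketch passes it through unchanged.

```python
def measure_latency_and_fps(bench, frames=1000):
    """Run the benchmark twice and return (latency, fps).

    Hypothetical helper built on the documented signature:
    run(inputs=None, frames=500, threading=True) -> (outputs, latency, fps)
    """
    # Blocking send/receive: the documented way to obtain a latency figure.
    _, latency, _ = bench.run(frames=frames, threading=False)
    # Pipelined send/receive: the documented way to obtain the best FPS.
    _, _, fps = bench.run(frames=frames, threading=True)
    return latency, fps


# Possible usage, assuming the mobilenet.dfp file from the examples:
#
#     from memryx import Benchmark
#     with Benchmark(dfp='mobilenet.dfp') as accl:
#         latency, fps = measure_latency_and_fps(accl)
```

Keeping the two runs separate avoids reading a latency value from a pipelined run, where the documentation says it is not meaningful.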
Examples
import numpy as np
from memryx import Benchmark

with Benchmark(dfp='mobilenet.dfp') as accl:
    # 1000 frames of random data, get FPS
    outputs, _, fps = accl.run(frames=1000)

    # blocking send/receive (threading=False), get latency
    outputs, latency, _ = accl.run(threading=False)

    # four frames of a numpy array's data (assume 224x224x3 model input)
    inputs = np.zeros([4, 224, 224, 3])
    outputs, latency, fps = accl.run(inputs=inputs)
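The warning about using too few frames can be turned into a small calculation: run a short pilot, read the reported FPS, and size the real run so that it lasts a few seconds. This is a sketch under stated assumptions. frames_for_duration and run_for_at_least are hypothetical helpers, the 3-second target and 100-frame pilot are arbitrary illustrative values, and bench is assumed to expose the run method documented above.

```python
import math


def frames_for_duration(fps, min_seconds=3.0, floor=500):
    """Frames needed to keep a run going for at least `min_seconds` at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Never go below `floor` (500 mirrors the documented default for frames).
    return max(floor, math.ceil(fps * min_seconds))


def run_for_at_least(bench, min_seconds=3.0, pilot_frames=100):
    """Estimate FPS with a short pilot run, then do a full run sized from it.

    Returns (frames_used, fps) for the full run. Hypothetical helper; only
    bench.run(frames=...) from the documented API is called.
    """
    # Pilot run with random data; only the FPS estimate is kept.
    _, _, pilot_fps = bench.run(frames=pilot_frames)
    frames = frames_for_duration(pilot_fps, min_seconds)
    # Full run, long enough for the FPS figure to be more stable.
    _, _, fps = bench.run(frames=frames)
    return frames, fps
```

The pilot estimate is itself noisy, so the resulting frame count is a rough lower bound, not an exact duration guarantee.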