C++ Benchmark#
The `acclBench` command-line interface tool provides an easy way to measure FPS for single- and multi-stream scenarios using the high-performance MxAccl C++ API.
After successful installation of the driver and runtime libraries, the CLI tool can be run using the `acclBench` command.
```
acclBench -h
Options:
-d | --dfp filename     DFP model file to test, such as 'model/single_ssd_mobilenet_300_MX3.dfp'
-m | --multistream      Run accl bench for multistream
-n | --numstreams       Number of streams to run multistream accl bench, default=1 for singlestream, 2 if multistream is chosen
-c | --convert_threads  Number of feature map format conversion threads, default=1
-h | --help             Print this message
-g | --group            Accelerator group ID, default=0
-f | --frames           Number of frames for testing inference performance, default=1000
-v | --verbose          Print all the required logs
--max_fps               Maximum allowed FPS per stream
--iw                    Number of input pre-processing workers per model
--ow                    Number of output post-processing workers per model
--device_ids            MXA device IDs used to run the benchmark in multi-device use cases; takes a comma-separated list of device IDs
--ls                    Allow lenient setup in multi-device use cases; uses available devices if some of the passed IDs are not available
--mt                    Run the benchmark tool with the Manual Threading model of the C++ API
```
Usage#
The benchmark requires a compiled DFP, which is generated by the Neural Compiler. For a quick start using the neural compiler, please refer to Hello, Mobilenet!.
Arguments#
| Option | Description |
|---|---|
| `-h, --help` | Show this help message and exit |
| `-v, --verbose` | Print all log messages |
| `-d, --dfp` | Filename of the DFP model file to test; required argument with no default value |
| `-m, --multistream` | Run accl bench with multistream input; if given, runs the benchmark with 2 input streams by default |
| `-n, --numstreams` | Number of input streams for the multistream benchmark; default = 1 for singlestream, 2 if multistream is chosen with the `-m` option |
| `-f, --frames` | Number of frames for testing inference performance; default = 1000 frames |
| `-g, --group` | Accelerator group or device ID, used when multiple MXA devices are connected to a host |
| `--iw` | Number of input pre-processing workers per model. If not specified, the default is set by the C++ API: either the number of streams or half the number of CPUs on the system |
| `--ow` | Number of output post-processing workers per model. If not specified, the default is set by the C++ API: either the number of streams or half the number of CPUs on the system |
| `--device_ids` | MXA device IDs used to run the benchmark in multi-device use cases; takes a comma-separated list of device IDs |
| `--ls` | Allow lenient setup in multi-device use cases; uses available devices if some of the passed IDs are not available |
| `--mt` | Run the benchmark tool with the Manual Threading model of the C++ API |
Note

The DFP filename is a required argument; if a DFP file is not specified, `acclBench` will exit and print the help message.
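Several of the options above can be combined in a single run. The following invocation is a sketch of tuning the worker counts and stream rate explicitly when the C++ API defaults are not a good fit for the host CPU (`model.dfp` is a placeholder DFP, and all numeric values are illustrative):

```shell
# Run a 4-stream benchmark with explicit pre/post-processing worker counts,
# capping each stream at 30 FPS. model.dfp and all values are placeholders.
acclBench -d model.dfp -m -n 4 --iw 4 --ow 4 --max_fps 30
```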
Benchmark with random input data#
You can get a quick estimate of the FPS of your model by letting the benchmark use randomly generated data (of the correct size) to run inference. This saves you the hassle of generating feature maps for the accelerator to consume.
For a single input stream benchmark:

```
acclBench -d model.dfp -f 1000
```

For a multiple input stream benchmark, here using 2 streams:

```
acclBench -d model.dfp -f 1000 -n 2
```

For a Manual Threading mode benchmark:

```
acclBench -d model.dfp -f 1000 --mt
```

For a benchmark across multiple MXA devices:

```
acclBench -d model.dfp -f 1000 --device_ids 0,1,2
```
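When some of the requested devices might be unavailable, the lenient-setup flag lets the benchmark proceed with whichever devices it can find. A sketch (`model.dfp` is a placeholder):

```shell
# Request devices 0-2 but allow the run to continue with the available subset
acclBench -d model.dfp -f 1000 --device_ids 0,1,2 --ls
```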