C++ Benchmark#

The acclBench command-line tool provides an easy way to measure FPS in single-stream and multi-stream scenarios, using the high-performance MxAccl C++ API.

After the driver and runtime libraries have been installed successfully, the CLI tool can be run with the acclBench command. To print the available options:

acclBench -h

Options:
-d | --dfp filename    DFP model file to test, such as 'model/single_ssd_mobilenet_300_MX3.dfp'
-m | --multistream     Run accl bench for multistream
-n | --numstreams      Number of streams to run multistream accl bench, default= 1 for singlestream 2 if multistream is chosen
-c | --convert_threads Number of feature map format conversion threads, default= 1
-h | --help            Print this message
-g | --group           Accerator group ID, default=0
-f | --frames          Number of frame for testing inference performance, default=1000 secs
-v | --verbose         print all the required logs
--max_fps              maximum allowed FPS per stream
--iw                   number of input pre-processing workers per model
--ow                   number of output post-processing workers per model
--device_ids           MXA device IDs to be used to run benchmark, used in cases of multi device use cases. Takes in a comma separated list of device IDs
--ls                   Allows lenient setup in multi device use cases, uses available devices in case if some of the passed IDs are not available.
--mt                   Runs benchmark tool with Manual Threading model of c++ API.

Usage#

The benchmark requires a compiled DFP file, which is generated by the Neural Compiler. For a quick start with the Neural Compiler, please refer to "Hello, Mobilenet!".

Arguments#

Option              Description
-h, --help          Show the help message and exit.
-v, --verbose       Print all log messages.
-d, --dfp           Filename of the DFP model file to test. Required argument; no default value is set.
-m, --multistream   Run the benchmark with multistream input. If given, the benchmark runs with 2 input streams by default.
-n, --numstreams    Number of input streams for the multistream benchmark. Default is 1 for single stream, or 2 if multistream is chosen with the -m option.
-f, --frames        Number of frames used to test inference performance. Default is 1000 frames.
-g, --group         Accelerator group or device ID, used when multiple MXA devices are connected to a host.
--iw                Number of input pre-processing workers per model. If not specified, the default is set by the C++ API: either the number of streams or half the number of CPUs on the system.
--ow                Number of output post-processing workers per model. If not specified, the default is set by the C++ API: either the number of streams or half the number of CPUs on the system.
--device_ids        MXA device IDs to run the benchmark on, for multi-device use cases. Takes a comma-separated list of device IDs.
--ls                Allow lenient setup in multi-device use cases: if some of the passed IDs are not available, the available devices are used.
--mt                Run the benchmark tool with the Manual Threading model of the C++ API.

Note

  • The DFP filename is a required argument. If no DFP file is specified, acclBench prints the help message and exits.
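Since the DFP file is mandatory, a wrapper script can check the path before launching a run. The following is a minimal sketch and not part of the tool: check_dfp and the DFP environment variable are hypothetical names chosen for illustration, and only the -d and -f options documented above are used.

```shell
#!/bin/sh
# Sketch: verify the DFP path before calling acclBench.
# check_dfp and the DFP variable are illustrative names, not part of acclBench.

# Return 0 if $1 names a readable file, otherwise report the problem and return 1.
check_dfp() {
  if [ -z "${1:-}" ]; then
    echo "error: no DFP file given" >&2
    return 1
  fi
  if [ ! -r "$1" ]; then
    echo "error: DFP file not readable: $1" >&2
    return 1
  fi
  return 0
}

# Launch the benchmark only when the check passes.
if check_dfp "${DFP:-}"; then
  acclBench -d "$DFP" -f 1000
fi
```

Saved under a name of your choice (bench.sh here is only an example), it could be invoked as `DFP=model.dfp sh bench.sh`.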

Benchmark with random input data#

You can get a quick estimate of your model's FPS by letting the benchmark run inference on randomly generated data of the correct size. This saves you the hassle of generating feature maps for the accelerator to consume.

For a single input stream benchmark:

acclBench -d model.dfp -f 1000

For a multiple input stream benchmark, here using 2 streams:

acclBench -d model.dfp -f 1000 -n 2
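To see how throughput changes with the number of streams, the same command can be repeated for several values of -n. The sketch below is one possible way to do that. The variable names (DFP, FRAMES, STREAMS, DRY_RUN), the run_bench helper and the stream counts 1, 2 and 4 are assumptions made for illustration. By default it only prints the commands it would run; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: sweep the number of input streams using the documented -d, -f and -n options.
# DFP, FRAMES, STREAMS, DRY_RUN and run_bench are illustrative names, not part of acclBench.
set -e

DFP="${DFP:-model.dfp}"
FRAMES="${FRAMES:-1000}"
STREAMS="${STREAMS:-1 2 4}"   # stream counts to try (example values)
DRY_RUN="${DRY_RUN:-1}"       # 1 = print the commands only, 0 = run them

# Run (or, in dry-run mode, print) one benchmark with $1 input streams.
run_bench() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "acclBench -d $DFP -f $FRAMES -n $1"
  else
    acclBench -d "$DFP" -f "$FRAMES" -n "$1"
  fi
}

for n in $STREAMS; do
  run_bench "$n"
done
```

Keeping the frame count fixed across runs makes the reported numbers easier to compare.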

For a Manual Threading mode benchmark:

acclBench -d model.dfp -f 1000 --mt

For a benchmark across multiple MXA devices:

acclBench -d model.dfp -f 1000 --device_ids 0,1,2
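When the device IDs come from a script rather than being typed by hand, the comma-separated list that --device_ids expects can be assembled with a small helper. This is a sketch under stated assumptions: join_ids is a hypothetical helper, the IDs 0, 1 and 2 are example values, and --ls is assumed to be a plain switch that takes no value, as its help text suggests. The script prints the resulting command rather than running it.

```shell
#!/bin/sh
# Sketch: build the comma-separated argument for --device_ids.
# join_ids is an illustrative helper, not part of acclBench.

# Join all arguments with commas, e.g. "join_ids 0 1 2" prints 0,1,2.
# The body runs in a subshell so the IFS change does not leak to the caller.
join_ids() (
  IFS=,
  printf '%s\n' "$*"
)

DFP="${DFP:-model.dfp}"
IDS="$(join_ids 0 1 2)"

# Print the command; remove "echo" to actually run the benchmark.
# --ls (assumed to take no value) lets the run continue with the devices that are available.
echo acclBench -d "$DFP" -f 1000 --device_ids "$IDS" --ls
```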