C++ Benchmark#

acclBench is a command-line tool that measures inference throughput (FPS) in single-stream and multi-stream scenarios, using the high-performance MxAccl C++ API.

After the driver and runtime libraries are installed, the tool can be run with the acclBench command. To list all options:

acclBench -h

Options:
-h | --help            Print this message
-H | --hello           Check connection to MXA devices and get device info
-d | --dfp filename    DFP model file to test, such as 'model/single_ssd_mobilenet_300_MX3.dfp'
-n | --numstreams      Number of parallel streams to benchmark, default=1
-m | --multistream     Run accl bench for multistream (equivalent to -n 2)
-c | --convert_threads Number of feature map format conversion threads, default=1
-g | --group           Accelerator group ID, default=0. Advanced use only.
-f | --frames          Number of frame for testing inference performance, default=1000 secs
-s | --server_addr     Address to mxa_manager (can be local or remote), default=localhost
-p | --server_port     Base port for mxa_manager connection, default=10000
-r | --shared_mode     Use Shared Mode (run DFP on mxa_manager instead of directly accessing hardware)
-v | --verbose         print all the required logs
--max_fps              maximum allowed FPS per stream
--iw                   number of input pre-processing workers per model
--ow                   number of output post-processing workers per model
--device_ids           MXA device IDs to be used to run benchmark, used in cases of multi device use cases. Takes in a comma separated list of device IDss
--mt                   Runs benchmark tool with Manual Threading model of C++ API
--no_copy              Use no-copy mode, which can improve performance on low-end systems

Usage#

The benchmark requires a compiled DFP, which is generated by the Neural Compiler. For a quick start with the Neural Compiler, see the "Hello, Mobilenet!" tutorial.

All arguments appear in the --help output above; the following are worth highlighting.

Arguments#

Option	Description
-h, --help	Show the help message and exit
-H, --hello	Check connections to MXA devices and print info
-v, --verbose	Print all log messages during benchmarking
-d, --dfp	Filename of the DFP file to test. Required; there is no default value
-n, --numstreams	Number of input streams to benchmark. Default: 1 (single stream), or 2 if multistream is selected with the -m option
-f, --frames	Number of frames for testing inference performance, default = 1000 frames
-s, --server_addr	Address to mxa_manager (can be local or network IP/hostname), default=localhost
-p, --server_port	Base port for mxa_manager connection, default=10000
-r, --shared_mode	Use Shared Mode, which runs the DFP through mxa_manager instead of directly accessing hardware, allowing for multiple parallel instances of acclBench
--iw	Set the number of input pre-processing workers per model. Default: the number of streams or half the number of CPUs on the system, whichever is smaller
--ow	Set the number of output post-processing workers per model. Default: the number of streams or half the number of CPUs on the system, whichever is smaller
--device_ids	Set the MXA device IDs to be used to run benchmark. Takes in a comma separated list of device IDs. Default: 0
--no_copy	Use no-copy mode, which can increase performance if your application is written to support it
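
The default for --iw and --ow described in the table amounts to min(streams, CPUs / 2). The sketch below spells that rule out in shell so you can predict the worker count on a given machine. `default_workers` is a hypothetical helper, not part of acclBench, and the integer division and minimum of 1 are assumptions, since the table does not say how odd or very small CPU counts are rounded.

```shell
# Hypothetical helper (not part of acclBench): estimate the default
# --iw / --ow worker count as min(streams, cpus / 2).
# Assumptions: integer division, and never fewer than 1 worker.
default_workers() {
    streams=$1
    cpus=$2
    half=$((cpus / 2))
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    if [ "$streams" -lt "$half" ]; then
        echo "$streams"
    else
        echo "$half"
    fi
}

# Example: 4 streams on an 8-CPU machine.
default_workers 4 8
```

On Linux, the second argument can be taken from `nproc`. Passing --iw or --ow explicitly overrides the default.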

Note

The DFP filename (-d) is a required argument. If no DFP file is specified, acclBench prints the help message and exits.
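
Because a missing -d only produces the help text, a script that calls acclBench may want to check the path first and fail with a clearer message. The following is a minimal sketch of such a check. `run_bench` and its exit code 2 are hypothetical choices for illustration, not behavior of the tool.

```shell
# Hypothetical wrapper: refuse to start unless the first argument is an
# existing DFP file, then pass the remaining arguments through to acclBench.
run_bench() {
    dfp=$1
    if [ -z "$dfp" ] || [ ! -f "$dfp" ]; then
        echo "DFP file not found: ${dfp:-<none>}" >&2
        return 2
    fi
    shift
    acclBench -d "$dfp" "$@"
}
```

For example, `run_bench model.dfp -f 1000 -n 2` runs the two-stream benchmark only if model.dfp exists.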

Examples#

acclBench gives a quick estimate of your model's FPS by running inference on randomly generated input data of the correct size.

The following are usage examples for different scenarios.

Single input stream (typical use case):

acclBench -d model.dfp -f 1000

Multiple input streams, here using 2 streams:

acclBench -d model.dfp -f 1000 -n 2
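
To see how throughput changes with the number of streams, the same command can be repeated with different -n values. The sketch below only prints the command lines so they can be reviewed first. `sweep_commands` is a hypothetical helper, and the stream counts 1, 2, and 4 are arbitrary examples.

```shell
# Hypothetical helper: print one acclBench command line per stream count.
# Usage: sweep_commands <dfp file> <n1> [<n2> ...]
sweep_commands() {
    dfp=$1
    shift
    for n in "$@"; do
        echo "acclBench -d $dfp -f 1000 -n $n"
    done
}

# Dry run: show the commands for 1, 2 and 4 streams.
sweep_commands model.dfp 1 2 4
```

Piping the output into `sh` runs the benchmarks one after another.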

Multiple MXA devices with automatic load balancing:

acclBench -d model.dfp -f 1000 --device_ids 0,1,2
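
The --device_ids option takes a single comma-separated list. When the IDs come from a script, they usually arrive as separate words, so a small join step is needed. This is a sketch under that assumption; `join_ids` is a hypothetical helper, and the IDs 0, 1, 2 simply mirror the example above.

```shell
# Hypothetical helper: join its arguments with commas, e.g. "0 1 2" -> "0,1,2".
# "$*" joins the arguments using the first character of IFS.
join_ids() {
    old_ifs=$IFS
    IFS=,
    echo "$*"
    IFS=$old_ifs
}

ids=$(join_ids 0 1 2)
# Print the resulting command line rather than running it.
echo "acclBench -d model.dfp -f 1000 --device_ids $ids"
```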

Shared Mode:

acclBench -d model.dfp -f 1000 -r
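
Since Shared Mode allows multiple parallel instances of acclBench, several benchmarks can be started in the background and awaited together. The sketch below assumes mxa_manager is already running and reachable at the default address. `run_shared_parallel` and the `BENCH` variable are hypothetical names introduced here; `BENCH` exists only so the function can be dry-run with `BENCH=echo`.

```shell
# BENCH is a hypothetical override for dry runs, e.g. BENCH=echo.
BENCH=${BENCH:-acclBench}

# Start one Shared Mode (-r) benchmark per DFP file in the background,
# then wait for all of them to finish.
run_shared_parallel() {
    for dfp in "$@"; do
        "$BENCH" -d "$dfp" -f 1000 -r &
    done
    wait
}
```

For example, `run_shared_parallel model_a.dfp model_b.dfp` launches two instances at once, where the two filenames are placeholders for your own DFP files.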

No-Copy mode:

acclBench -d model.dfp -f 1000 --no_copy