C++ Benchmark

The acclBench command-line tool provides an easy way to measure FPS in single-stream and multi-stream scenarios, using the high-performance MxAccl C++ API.

Once the driver and runtime libraries are installed, the tool can be run with the acclBench command.

acclBench -h

Options:
-h | --help            Print this message
-H | --hello           Check connection to MXA devices and get device info
-d | --dfp filename    DFP model file to test, such as 'model/single_ssd_mobilenet_300_MX3.dfp'
-n | --numstreams      Number of parallel streams to benchmark, default=1
-m | --multistream     Run accl bench for multistream (equivalent to -n 2)
-c | --convert_threads Number of feature map format conversion threads, default=1
-g | --group           Accelerator group ID, default=0. Advanced use only.
-f | --frames          Number of frames for testing inference performance, default=1000
-s | --server_addr     Address to mx_server (can be local or remote), default=localhost
-p | --server_port     Base port for mx_server connection, default=10000
-r | --shared_mode     Use Shared Mode (run DFP on mx_server instead of directly accessing hardware)
-v | --verbose         print all the required logs
--max_fps              maximum allowed FPS per stream
--iw                   number of input pre-processing workers per model
--ow                   number of output post-processing workers per model
--device_ids           MXA device IDs to be used to run benchmark, used in cases of multi device use cases. Takes in a comma separated list of device IDs
--mt                   Runs benchmark tool with Manual Threading model of C++ API
--no_copy              Use no-copy mode, which can improve performance on low-end systems

Usage

The benchmark requires a compiled DFP, which is generated by the Neural Compiler. For a quick start with the Neural Compiler, refer to "Hello, Mobilenet!".

All arguments are included in the --help output above, and the following, in particular, are worth highlighting.

Arguments

Option               Description
-h, --help           Show the help message and exit
-H, --hello          Check connections to MXA devices and print device info
-v, --verbose        Print all log messages during benchmarking
-d, --dfp            Filename of the DFP file to test. Required; there is no default
-n, --numstreams     Number of input streams to benchmark. Default: 1 (single stream), or 2 if multistream is selected with the -m option
-f, --frames         Number of frames used to test inference performance. Default: 1000 frames
-s, --server_addr    Address of mx_server (local, or a network IP/hostname). Default: localhost
-p, --server_port    Base port for the mx_server connection. Default: 10000
-r, --shared_mode    Use Shared Mode, which runs the DFP through mx_server instead of accessing the hardware directly, allowing multiple parallel instances of acclBench
--iw                 Number of input pre-processing workers per model. Default: the number of streams or half the number of CPUs on the system, whichever is smaller
--ow                 Number of output post-processing workers per model. Default: the number of streams or half the number of CPUs on the system, whichever is smaller
--device_ids         MXA device IDs used to run the benchmark, given as a comma-separated list. Default: 0
--no_copy            Use no-copy mode, which can increase performance if your application is written to support it
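
The default described for --iw and --ow (the smaller of the stream count and half the CPU count) can be written out as a small function. This is only a sketch of the documented rule, not acclBench's actual implementation. The name default_workers is hypothetical, and clamping the result to at least 1 on single-CPU hosts is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Documented default for --iw / --ow: the smaller of the number of streams
// and half the number of CPUs. Clamping half_cpus to at least 1 is an
// assumption made here so that the result is never zero.
unsigned default_workers(unsigned num_streams, unsigned num_cpus) {
    assert(num_streams > 0 && "at least one stream is expected");
    const unsigned half_cpus = std::max(1u, num_cpus / 2);
    return std::min(num_streams, half_cpus);
}

// Convenience overload that queries the host CPU count.
// std::thread::hardware_concurrency() may return 0 when the count is
// unknown; the clamp above then yields a single worker.
unsigned default_workers(unsigned num_streams) {
    return default_workers(num_streams, std::thread::hardware_concurrency());
}
```

Under this reading, a 16-CPU host running 2 streams would default to 2 workers, while a 4-CPU host running 8 streams would default to 2. Passing --iw or --ow explicitly overrides the default.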

Note

  • The DFP filename (-d) is a required argument. If no DFP file is specified, acclBench prints the help message and exits.

Examples

You can get a quick estimate of the FPS of your model by letting the acclBench benchmark tool use randomly generated data (of the correct size) to run inference.
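
As a rough illustration of what such an estimate involves, the sketch below times a fixed number of calls on one randomly filled buffer and divides the frame count by the elapsed time. It does not use the MxAccl API. The `infer` callback is a stand-in for whatever runs the model, and the names fps_from, random_frame, and measure_fps are hypothetical. How acclBench measures FPS internally is not described here, so treat this as an assumption-laden approximation.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// FPS is the number of frames processed divided by elapsed wall-clock seconds.
double fps_from(std::size_t frames, double elapsed_seconds) {
    return elapsed_seconds > 0.0
               ? static_cast<double>(frames) / elapsed_seconds
               : 0.0;
}

// Fill a buffer of the requested size with random values in [0, 1).
// The float element type is an assumption; a real model defines its own input format.
std::vector<float> random_frame(std::size_t num_elements, unsigned seed = 0) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> frame(num_elements);
    for (float& value : frame) {
        value = dist(rng);
    }
    return frame;
}

// Run `infer` on the same random frame `frames` times and report the FPS.
// `infer` is a placeholder callback, not an MxAccl function.
double measure_fps(const std::function<void(const std::vector<float>&)>& infer,
                   std::size_t num_elements, std::size_t frames) {
    const std::vector<float> frame = random_frame(num_elements);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < frames; ++i) {
        infer(frame);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return fps_from(frames, elapsed.count());
}
```

A monotonic clock (std::chrono::steady_clock) is used so that system clock adjustments during the run do not distort the result.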

The following are usage examples for different scenarios.

Single input stream (typical use case):

acclBench -d model.dfp -f 1000

Multiple input streams, here using 2 streams:

acclBench -d model.dfp -f 1000 -n 2

Multiple MXA devices with automatic load balancing:

acclBench -d model.dfp -f 1000 --device_ids 0,1,2

Shared Mode:

acclBench -d model.dfp -f 1000 -r

No-Copy mode:

acclBench -d model.dfp -f 1000 --no_copy
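
To compare several stream counts, the single-stream and multi-stream commands above can be generated programmatically. The following sketch only builds the command strings, using the documented -d, -f, and -n options. The helper names bench_command and stream_sweep are hypothetical, and the DFP path is assumed to need no shell quoting.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: format one acclBench invocation from the documented
// -d (DFP file), -f (frame count) and -n (stream count) options.
// The path is inserted as-is, so it is assumed to contain no spaces or
// shell metacharacters.
std::string bench_command(const std::string& dfp, unsigned frames,
                          unsigned streams) {
    return "acclBench -d " + dfp + " -f " + std::to_string(frames) +
           " -n " + std::to_string(streams);
}

// Build one command per stream count from 1 up to max_streams inclusive.
std::vector<std::string> stream_sweep(const std::string& dfp, unsigned frames,
                                      unsigned max_streams) {
    std::vector<std::string> commands;
    for (unsigned n = 1; n <= max_streams; ++n) {
        commands.push_back(bench_command(dfp, frames, n));
    }
    return commands;
}
```

Each resulting string can be written into a shell script or passed to std::system, so that the reported FPS can be compared across stream counts.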