C++ Benchmark
The acclBench command-line tool provides an easy way to measure FPS for single-stream and multi-stream scenarios, using the high-performance MxAccl C++ API.
After the driver and runtime libraries have been installed successfully, the tool can be run using the acclBench command.
acclBench -h
Options:
-h | --help Print this message
-H | --hello Check connection to MXA devices and get device info
-d | --dfp filename DFP model file to test, such as 'model/single_ssd_mobilenet_300_MX3.dfp'
-n | --numstreams Number of parallel streams to benchmark, default=1
-m | --multistream Run accl bench for multistream (equivalent to -n 2)
-c | --convert_threads Number of feature map format conversion threads, default=1
-g | --group Accelerator group ID, default=0. Advanced use only.
-f | --frames Number of frame for testing inference performance, default=1000 secs
-s | --server_addr Address to mx_server (can be local or remote), default=localhost
-p | --server_port Base port for mx_server connection, default=10000
-r | --shared_mode Use Shared Mode (run DFP on mx_server instead of directly accessing hardware)
-v | --verbose print all the required logs
--max_fps maximum allowed FPS per stream
--iw number of input pre-processing workers per model
--ow number of output post-processing workers per model
--device_ids MXA device IDs to be used to run benchmark, used in cases of multi device use cases. Takes in a comma separated list of device IDss
--mt Runs benchmark tool with Manual Threading model of C++ API
--no_copy Use no-copy mode, which can improve performance on low-end systems
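Before benchmarking, it can help to confirm that the tool is installed and that the devices respond. The following is a minimal sketch, assuming a POSIX shell. The helper name `tool_on_path` is illustrative and not part of acclBench, and since the exit status of `-H` is not described here, the script does not interpret it.

```shell
#!/bin/sh
# Pre-flight check before running a benchmark.
# tool_on_path is a hypothetical helper: it succeeds if the named
# command can be found on PATH, and fails otherwise.
tool_on_path() {
    command -v "$1" >/dev/null 2>&1
}

if tool_on_path acclBench; then
    # -H / --hello: check connection to MXA devices and get device info
    acclBench -H
else
    echo "acclBench not found on PATH; check the driver and runtime installation"
fi
```

If the tool is found, the device information printed by `-H` can be read before moving on to a full run with `-d`.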
Usage
The benchmark requires a compiled DFP, which is generated by the Neural Compiler. For a quick start with the Neural Compiler, please refer to "Hello, Mobilenet!".
All arguments are listed in the --help output above; the following are worth highlighting.
Arguments
Option | Description |
---|---|
-h, --help | Show the help message and exit |
-H, --hello | Check connections to MXA devices and print device info |
-v, --verbose | Print all log messages during benchmarking |
-d, --dfp | Filename of the DFP file to test. Required; there is no default value |
-n, --numstreams | Number of input streams to benchmark. Default: 1 (single stream), or 2 if multistream is selected with the -m option |
-f, --frames | Number of frames used to test inference performance. Default: 1000 frames |
-s, --server_addr | Address of mx_server (local, or a network IP/hostname). Default: localhost |
-p, --server_port | Base port for the mx_server connection. Default: 10000 |
-r, --shared_mode | Use Shared Mode, which runs the DFP through mx_server instead of accessing the hardware directly, allowing multiple parallel instances of acclBench |
--iw | Number of input pre-processing workers per model. Default: the number of streams or half the number of CPUs on the system, whichever is smaller |
--ow | Number of output post-processing workers per model. Default: the number of streams or half the number of CPUs on the system, whichever is smaller |
--device_ids | MXA device IDs to use for the benchmark, given as a comma-separated list. Default: 0 |
--no_copy | Use no-copy mode, which can increase performance if your application is written to support it |
Note
The DFP filename (-d) is a required argument. If no DFP file is specified, acclBench prints the help message and exits.
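The default for --iw and --ow described in the table above (the smaller of the stream count and half the CPU count) can be sketched as a small shell function. This is an illustration only: `default_workers` is a hypothetical name, and the use of integer division for "half" is an assumption, since the tool's rounding for odd CPU counts is not stated here.

```shell
#!/bin/sh
# Sketch of the documented default for --iw / --ow:
# the smaller of (number of streams) and (half the CPU count).
# Assumption: "half" is computed with integer division.
default_workers() {
    streams=$1
    cpus=$2
    half=$((cpus / 2))
    if [ "$streams" -lt "$half" ]; then
        echo "$streams"
    else
        echo "$half"
    fi
}

default_workers 1 8     # 1 stream, 8 CPUs: prints 1
default_workers 16 8    # 16 streams, 8 CPUs: prints 4
```

When the defaults are not suitable, both values can be set explicitly on the command line with --iw and --ow.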
Examples
You can get a quick estimate of the FPS of your model by letting the acclBench benchmark tool use randomly generated data (of the correct size) to run inference.
The following are usage examples for different scenarios.
Single input stream (typical use case):
acclBench -d model.dfp -f 1000
Multiple input streams, here using 2 streams:
acclBench -d model.dfp -f 1000 -n 2
Multiple MXA devices with automatic load balancing:
acclBench -d model.dfp -f 1000 --device_ids 0,1,2
Shared Mode:
acclBench -d model.dfp -f 1000 -r
No-Copy mode:
acclBench -d model.dfp -f 1000 --no_copy
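To compare FPS across several stream counts, the multi-stream example above can be wrapped in a loop. The following is a minimal sketch, assuming a POSIX shell. The function name `sweep_streams` and the `DFP`, `FRAMES`, and `DRY_RUN` variables are illustrative and not part of acclBench, and `model.dfp` is a placeholder path. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sweep acclBench over several stream counts.
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually run the benchmark.
DFP="${DFP:-model.dfp}"        # placeholder DFP path
FRAMES="${FRAMES:-1000}"       # frames per run (-f)
DRY_RUN="${DRY_RUN:-1}"

sweep_streams() {
    for n in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "acclBench -d $DFP -f $FRAMES -n $n"
        else
            acclBench -d "$DFP" -f "$FRAMES" -n "$n"
        fi
    done
}

sweep_streams 1 2 4
```

Running it with `DRY_RUN=0 DFP=path/to/your.dfp` executes one benchmark per stream count in sequence, so the reported FPS values can be compared side by side.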