C++ Benchmark
The acclBench command-line tool provides an easy way to measure FPS in single-stream and multi-stream scenarios, using the high-performance MxAccl C++ API.
After the driver and runtime libraries are installed, run the tool with the acclBench command.
acclBench -h
Usage: acclBench [options]
Options:
-h | --help Print this message
-H | --hello Check connection to MXA devices and get device info
-d | --dfp filename DFP model file to test, such as model/single_ssd_mobilenet_300_MX3.dfp
-m | --multistream Run accl bench for multistream
-n | --numstreams Number of streams to run multistream accl bench, default= 1 for singlestream 2 if multistream is chosen
-c | --convert_threads Number of feature map format conversion threads, default= 1
-g | --group Accerator group ID, default=0
-f | --frames Number of frame for testing inference performance, default=1000 secs
-s | --shared_mode Use Shared Mode (run DFP on mxa_manager instead of directly accessing hardware)
-a | --server_addr Address to mxa_manager (can be local or remote), default=/run/mxa_manager/
-p | --server_port Base port for mxa_manager connection, default=10000
-u | --use_model_shape Include any transposes/reshapes needed to match the model shapes. May impact performance.
-v | --verbose print all the required logs
--max_fps maximum allowed FPS per stream
--no_latency skip latency measurement step
--power get power consumption data during benchmark (on supported SKUs)
--pressure show Pressure level (flow utilization) during benchmark
--iw number of input pre-processing workers per model
--ow number of output post-processing workers per model
--device_ids MXA device IDs to be used to run benchmark, used in cases of multi device use cases. Takes in a comma separated list of device IDs
--mt Runs benchmark tool with Manual Threading model of c++ API
--set_freq Force use of this MX3 frequency for this command, options = {200,300,400,450,500,600,700,750,800,850} MHz
--frame_limit Number of frames for SchedulerOptions. Default is -1, which means all frames will be processed.
Usage
The benchmark requires a compiled DFP, which is generated by the Neural Compiler. For a quick start with the Neural Compiler, see "Hello, Mobilenet!".
All arguments appear in the --help output above; the following are worth highlighting.
Arguments
Common options:
| Option | Description |
|---|---|
| -h, --help | Shows the help message and exits |
| -H, --hello | Checks connections to MXA devices and prints device info |
| -d, --dfp | Filename of the DFP file to test. Required; no default value is set |
| -f, --frames | Number of frames for testing inference performance, default = 1000 frames |
| -n, --numstreams | Number of input streams for the multistream benchmark, default = 1 |
| --device_ids | MXA device IDs to use for the benchmark, given as a comma-separated list. Default: 0 |
| -s, --shared_mode | Use Shared Mode, which runs the DFP through mxa_manager, allowing for multi-process and multi-DFP support |
Note
The DFP filename (-d) is a required argument. If no DFP file is specified, acclBench exits and prints the help message.
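Because a missing DFP argument only produces the help text, a small pre-flight check can make failures easier to spot in scripts. The following is a minimal sketch, not part of acclBench: `check_dfp` is a hypothetical helper name, `model.dfp` is a placeholder path, and the only acclBench flags used are `-d` and `-f` as documented above.

```shell
#!/bin/sh
# Hypothetical helper (not part of acclBench): succeeds only if the
# given path is an existing, readable regular file.
check_dfp() {
    if [ ! -f "$1" ] || [ ! -r "$1" ]; then
        echo "error: cannot read DFP file '$1'" >&2
        return 1
    fi
    return 0
}

# Placeholder path; pass the real DFP as the first script argument.
DFP="${1:-model.dfp}"

# Run the benchmark only when the file check passes and acclBench is on PATH.
if check_dfp "$DFP" && command -v acclBench >/dev/null 2>&1; then
    acclBench -d "$DFP" -f 1000
fi
```

The check runs before the tool is invoked, so a wrong path is reported as a file error rather than as a help message.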
Advanced options:
| Option | Description |
|---|---|
| -u, --use_model_shape | Include any transposes/reshapes needed to match the model’s shapes. Gives a more accurate indication of full-application performance, which may be lower than raw DFP inference performance on some hosts. |
| -a, --server_addr | Address of mxa_manager, either a socket-file folder path or an IP address. Default: /run/mxa_manager/ (Linux) or 127.0.0.1 (Windows) |
| -p, --server_port | Base port for the mxa_manager connection, default = 10000 |
| --pressure | Show the pressure level (flow utilization) during the benchmark. The result is “low”, “medium”, “high”, or “full”. |
| --power | If supported (preproduction SKUs only): get power consumption data during the benchmark. |
| --max_fps | Set the maximum allowed FPS per stream. Useful for testing specific model FPS targets and observing temperature, power, and pressure. |
| --set_freq | Force use of this MX3 frequency for this command, options = {200,300,400,450,500,600,700,750,800,850} MHz. Helps with performance/power/temperature characterization at different frequencies. |
| --no_latency | Skip the latency measurement step. For very slow models, latency measurement can add a long “warm up” period to the start of the benchmark. Use this if you only want to see FPS. |
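As an illustration of how --set_freq and --no_latency might be combined for frequency characterization, here is a minimal sketch of a sweep script. It is an assumption about a reasonable workflow, not an official recipe: `run`, `sweep_freq`, `DRY_RUN`, and the `model.dfp` path are hypothetical names introduced for this example, and the chosen frequencies are simply three values from the documented option list. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Placeholder DFP path; override with DFP=/path/to/model.dfp.
DFP="${DFP:-model.dfp}"
# Hypothetical switch for this script: 1 prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One benchmark per frequency (MHz), skipping the latency step so each
# run reports FPS only.
sweep_freq() {
    for freq in "$@"; do
        run acclBench -d "$DFP" -f 1000 --set_freq "$freq" --no_latency
    done
}

sweep_freq 200 400 600
```

Setting DRY_RUN=0 runs the commands on a machine where acclBench is installed; adding --power or --pressure to the command line inside `sweep_freq` would follow the same pattern.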
Examples
You can get a quick estimate of your model's FPS by letting acclBench run inference on randomly generated data of the correct size.
The following are usage examples for different scenarios.
Single input stream (typical use case):
acclBench -d model.dfp -f 1000
Multiple input streams, here using 2 streams:
acclBench -d model.dfp -f 1000 -n 2
Multiple MXA devices with automatic load balancing:
acclBench -d model.dfp -f 1000 --device_ids 0,1,2
Shared Mode:
acclBench -d model.dfp -f 1000 -s