C++ Benchmark#

The acclBench command-line tool provides an easy way to measure FPS in single-stream and multi-stream scenarios, using the high-performance MxAccl C++ API.

After the driver and runtime libraries have been installed successfully, the tool can be run with the acclBench command.

acclBench -h

Usage: acclBench [options]

Options:
-h | --help            Print this message
-H | --hello           Check connection to MXA devices and get device info
-d | --dfp filename    DFP model file to test, such as model/single_ssd_mobilenet_300_MX3.dfp
-m | --multistream     Run accl bench for multistream
-n | --numstreams      Number of streams to run multistream accl bench, default= 1 for singlestream 2 if multistream is chosen
-c | --convert_threads Number of feature map format conversion threads, default= 1
-g | --group           Accerator group ID, default=0
-f | --frames          Number of frame for testing inference performance, default=1000 secs
-s | --shared_mode     Use Shared Mode (run DFP on mxa_manager instead of directly accessing hardware)
-a | --server_addr     Address to mxa_manager (can be local or remote), default=/run/mxa_manager/
-p | --server_port     Base port for mxa_manager connection, default=10000
-u | --use_model_shape Include any transposes/reshapes needed to match the model shapes. May impact performance.
-v | --verbose         print all the required logs
--max_fps              maximum allowed FPS per stream
--no_latency           skip latency measurement step
--power                get power consumption data during benchmark (on supported SKUs)
--pressure             show Pressure level (flow utilization) during benchmark
--iw                   number of input pre-processing workers per model
--ow                   number of output post-processing workers per model
--device_ids           MXA device IDs to be used to run benchmark, used in cases of multi device use cases. Takes in a comma separated list of device IDs
--mt                   Runs benchmark tool with Manual Threading model of c++ API
--set_freq             Force use of this MX3 frequency for this command, options = {200,300,400,450,500,600,700,750,800,850} MHz
--frame_limit          Number of frames for SchedulerOptions. Default is -1, which means all frames will be processed.

Usage#

The benchmark requires a compiled DFP, which is generated by the Neural Compiler. For a quick start with the Neural Compiler, see Hello, Mobilenet!

All arguments are included in the --help output above, and the following, in particular, are worth highlighting.

Arguments#

Common options:

Option               Description
-h, --help           Show the help message and exit.
-H, --hello          Check connections to MXA devices and print device info.
-d, --dfp            Filename of the DFP file to test. Required; there is no default value.
-f, --frames         Number of frames for testing inference performance. Default: 1000 frames.
-n, --numstreams     Number of input streams for the multistream benchmark. Default: 1.
--device_ids         MXA device IDs to use for the benchmark, given as a comma-separated list. Default: 0.
-s, --shared_mode    Use Shared Mode, which runs the DFP through mxa_manager, allowing for multi-process and multi-DFP support.

Note

  • The DFP filename (-d) is a required argument. If no DFP file is specified, acclBench exits and prints the help message.
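Because a missing DFP only produces the help message, a script that wraps acclBench may want to check the path itself and report a clearer error. The following is a minimal sketch; `require_dfp` is a hypothetical helper name, not part of acclBench.

```shell
# Hypothetical helper (not part of acclBench): succeed only if a DFP path
# was given and the file exists on disk.
require_dfp() {
    if [ -z "${1:-}" ]; then
        echo "error: no DFP file given (acclBench requires -d)" >&2
        return 2
    fi
    if [ ! -f "$1" ]; then
        echo "error: DFP file not found: $1" >&2
        return 1
    fi
}
```

A wrapper script could then run `require_dfp model.dfp && acclBench -d model.dfp -f 1000`, so the benchmark only starts when the file is present.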

Advanced options:

Option                   Description
-u, --use_model_shape    Include any transposes/reshapes needed to match the model’s shapes. Gives a more accurate indication of full-application performance, which may be lower than raw DFP inference performance on some hosts.
-a, --server_addr        Address to mxa_manager, either a socketfile folder path or IP address, default=/run/mxa_manager/ (Linux) or default=127.0.0.1 (Windows)
-p, --server_port        Base port for mxa_manager connection, default=10000
--pressure               Show Pressure level (flow utilization) during benchmark. Result will be “low”, “medium”, “high”, or “full”.
--power                  IF SUPPORTED (preproduction SKUs only): Get power consumption data during benchmark.
--max_fps                Set the maximum allowed FPS per stream. Useful for testing specific model FPS targets and observing temperature, power, and pressure.
--set_freq               Force use of this MX3 frequency for this command, options = {200,300,400,450,500,600,700,750,800,850} MHz. Helps with performance/power/temp characterization at different frequencies.
--no_latency             Skip latency measurement step. For very slow models, latency measurement might add a long “warm up” period to the start of the benchmark tool. Use this if you just want to see FPS.
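For frequency characterization, it can help to generate the commands for a sweep and reject values outside the documented --set_freq list before anything runs. The sketch below only prints commands. `freq_cmd` is a hypothetical helper, model.dfp is a placeholder, and passing the frequency as a separate space-delimited argument is an assumption based on how the other options are used.

```shell
# Documented --set_freq values, in MHz.
VALID_FREQS="200 300 400 450 500 600 700 750 800 850"

# Hypothetical helper: print an acclBench command for one frequency,
# or fail if the frequency is not in the documented list.
freq_cmd() {
    dfp="$1"
    freq="$2"
    case " $VALID_FREQS " in
        *" $freq "*)
            printf 'acclBench -d %s --set_freq %s\n' "$dfp" "$freq"
            ;;
        *)
            echo "unsupported frequency: $freq MHz" >&2
            return 1
            ;;
    esac
}

# Print the commands for a three-point sweep (placeholder DFP name).
for f in 400 600 800; do
    freq_cmd model.dfp "$f"
done
```

The printed lines can be reviewed first and then executed one at a time, optionally with --max_fps or --pressure appended as described in the table above.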

Examples#

You can get a quick estimate of the FPS of your model by letting the acclBench benchmark tool use randomly generated data (of the correct size) to run inference.

The following are usage examples for different scenarios.

Single input stream (typical use case):

acclBench -d model.dfp -f 1000

Multiple input streams, here using 2 streams:

acclBench -d model.dfp -f 1000 -n 2
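To see how FPS changes with the number of streams, the same command can be repeated with different -n values. The following sketch prints one command per stream count rather than running anything. `stream_sweep` is a hypothetical helper, and the DFP name, frame count, and stream counts are placeholder values.

```shell
# Hypothetical helper: print one acclBench command per stream count.
# Arguments: DFP file, frame count, then any number of stream counts.
stream_sweep() {
    dfp="$1"
    frames="$2"
    shift 2
    for n in "$@"; do
        printf 'acclBench -d %s -f %s -n %s\n' "$dfp" "$frames" "$n"
    done
}

# Placeholder values: 1000 frames at 1, 2, and 4 streams.
stream_sweep model.dfp 1000 1 2 4
```

Piping the output to a shell (for example `stream_sweep model.dfp 1000 1 2 4 | sh`) would run the benchmarks in sequence on a machine where acclBench is installed.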

Multiple MXA devices with automatic load balancing:

acclBench -d model.dfp -f 1000 --device_ids 0,1,2

Shared Mode:

acclBench -d model.dfp -f 1000 -s
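Shared Mode can also point at an mxa_manager on another address using -a and -p from the options above. The sketch below assembles such a command without running it. `shared_cmd` is a hypothetical helper, and 192.0.2.10 is a reserved documentation address standing in for a real host.

```shell
# Hypothetical helper: print a Shared Mode acclBench command.
# Arguments: DFP file, optional mxa_manager address, optional base port
# (falls back to the documented default of 10000).
shared_cmd() {
    dfp="$1"
    addr="${2:-}"
    port="${3:-10000}"
    cmd="acclBench -d $dfp -s"
    if [ -n "$addr" ]; then
        cmd="$cmd -a $addr -p $port"
    fi
    printf '%s\n' "$cmd"
}

# Default manager address, then an explicit (placeholder) remote address.
shared_cmd model.dfp
shared_cmd model.dfp 192.0.2.10 10000
```

When no address is given, the helper leaves -a and -p off so acclBench uses its own defaults.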