Multi-DFP#
Starting in SDK 2.0, the MemryX runtime supports using multiple DFPs in a single application, or across multiple applications.
Though the hardware can only execute one DFP at a time, the MXA-Manager is able to queue up input/output data for multiple DFPs and “swap” the executing DFP.
The end result is that applications can appear to run simultaneously, even though the hardware is only executing one DFP at a time.
Important
If your models are able to co-map into a single DFP, you should always try to do so for best performance.
Co-mapped DFPs can even support applications accessing contained models separately, in full parallel.
Scheduling multiple DFPs is for scenarios where co-mapping models in advance is not possible.
With the runtime APIs, using multiple DFPs is very simple; however, extracting the best performance requires some manual tuning.
Hint
An upcoming SDK release will include a new mxa-manager feature: “smart” scheduling, which will automatically optimize DFP scheduling config during runtime.
Using Multiple DFPs#
To use multiple DFPs, you simply create multiple Accelerator objects, each with its own DFP file.
Same Application#
from memryx import AsyncAccl

# Create Accelerator objects for each DFP
accl1 = AsyncAccl("model1.dfp")
accl2 = AsyncAccl("model2.dfp")
# Connect callbacks as needed
accl1.connect_input(input_callback1)
accl1.connect_output(output_callback1)
accl2.connect_input(input_callback2)
accl2.connect_output(output_callback2)
// Create Accelerator objects for each DFP
MxAccl accl1("model1.dfp");
MxAccl accl2("model2.dfp");
// Connect callbacks as needed
accl1.connect_stream(&input_callback1, &output_callback1);
accl2.connect_stream(&input_callback2, &output_callback2);
Multiple Applications#
To use different DFPs in different applications and have them run simultaneously, you can simply start multiple processes, each with its own Accelerator object.
As long as the default args are used (Shared Mode), the MXA-Manager will handle the scheduling of these DFPs across the available hardware.
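The multi-process pattern can be sketched as follows. This is a minimal illustration only: `app1.py` and `app2.py` are hypothetical client scripts, stood in for here by `-c` one-liners; a real client would create its own `AsyncAccl("modelN.dfp")` in Shared Mode and connect callbacks as shown above.

```python
import subprocess
import sys

# Hypothetical stand-ins for two client apps. A real app1.py / app2.py
# would each create their own Accelerator (e.g. AsyncAccl("model1.dfp"))
# in the default Shared Mode and connect input/output callbacks.
client1 = "print('client 1: would create AsyncAccl for model1.dfp')"
client2 = "print('client 2: would create AsyncAccl for model2.dfp')"

# Launch each client as its own OS process; the MXA-Manager handles
# scheduling their DFPs across the available hardware.
procs = [
    subprocess.Popen([sys.executable, "-c", client1]),
    subprocess.Popen([sys.executable, "-c", client2]),
]
exit_codes = [p.wait() for p in procs]
```

Each process is fully independent; stopping one client does not affect the others' scheduling.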
Improving Performance#
Although the default parameters will work, getting the best results from the scheduler config requires trying different values and finding what works best for your specific application.
The following parameters can be adjusted for your DFPs.
DFP Scheduler Options#
When submitting a DFP to the MXA-Manager (i.e., creating an Accelerator object), you can specify a scheduler config to control how the DFP is executed.
These options are:
frame_limit: The number of frames to process through this DFP (including all submodels) before yielding control back for another DFP to be scheduled. Default is 600
time_limit: If this many milliseconds pass without a new input for the running DFP, it will yield control back so another DFP can be run. Default is 0 (disabled, always run until frame_limit)
stop_on_empty: Whether to immediately swap out if the ifmap queue is empty. This is functionally equivalent to setting time_limit to a very small number. Default is False
ifmap_queue_size: Number of entries in the input feature map queue, shared among all client processes. Default is 16
ofmap_queue_size: Number of entries in the output feature map queues, private for each client process. Default is 12
Important
The first application (or Accelerator object) to submit a DFP will set its config options. Subsequent applications submitting the same DFP (identified by SHA512 hash) will have their config options ignored.
Client Options#
DFP config options are shared across all client applications using the same DFP, but each client app also has unique options that can differ per-client.
Frame Smoothing#
When the MXA-Manager has multiple DFPs to run, it will accumulate input frames from clients for non-executing DFPs, and then swap in the next DFP when the current one yields control.
From a client app’s perspective, this will look like the DFP is stopped for a short time, then a rapid burst of frames are processed, then the DFP is stopped again.
While this may be acceptable for many applications, it can lead to a “jittery” frame rate, which may not be desirable for real-time applications.
This is where per-client options for “smoothing” come into play. These options are:
smoothing: If True, the frame rate will be smoothed to the specified FPS. Default is False (no smoothing)
fps_target: The target FPS to smooth to. Must be specified when smoothing is enabled; there is no default value.
Usage#
from memryx import AsyncAccl
from memryx.accl import ClientOptions, SchedulerOptions
# Swap after 24 frames, or after 50ms of no input
my_dfp_opts = SchedulerOptions(
frame_limit=24,
time_limit=50,
stop_on_empty=False,
ifmap_queue_size=16,
ofmap_queue_size=12
)
# Target 30 FPS
my_client_opts = ClientOptions(
smoothing=True,
fps_target=30
)
# Create Accelerator object
accl = AsyncAccl("model1.dfp",
scheduler_options=my_dfp_opts,
client_options=my_client_opts)
// Swap after 24 frames, or after 50ms of no input,
// and target 30 FPS
MxAccl accl("model1.dfp",
{0}, // MXA Device ID to use (0 for first device)
{true, true}, // use_model_shape for {input, output}
false, // local_mode = false (use shared mode)
{24, 50, false, 16, 12}, // SchedulerOptions
{true, 30}); // ClientOptions
See the C++ API reference for more details on the constructor parameters.
Tuning Advice#
While your particular combination of DFPs and FPS requirements will determine the best config options, here are some general tips:
- Use a frame_limit that is just slightly under your cumulative FPS target for that DFP. For example, two clients at 30 FPS could use a frame_limit of ~50.
- If the DFP is already low FPS (as determined by using a benchmark tool), use a smaller frame_limit to allow other DFPs in the schedule to run more often.
- Large DFPs (size in MB) with lots of model weights will take longer to swap in and out, so use higher frame_limit values to recover FPS.
- Bigger ifmap_queue_size and ofmap_queue_size values will help smooth out frame rate jitter, but will gradually increase latency.
- Increase ifmap_queue_size if there are many clients for this same DFP.
- Don’t set a _queue_size larger than frame_limit * (number of clients). This may cause large latencies.
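The first two tips above amount to simple arithmetic. The sketch below uses `suggest_frame_limit`, a hypothetical helper written for illustration (it is not part of the SDK), with an assumed ~15% margin standing in for “slightly under” the cumulative target:

```python
def suggest_frame_limit(client_fps_targets, dfp_benchmark_fps=None):
    """Hypothetical helper: pick a starting frame_limit for one DFP.

    client_fps_targets: list of per-client FPS targets for this DFP.
    dfp_benchmark_fps: standalone FPS measured with a benchmark tool, if known.
    """
    cumulative = sum(client_fps_targets)
    # "Slightly under" the cumulative FPS target (~15% margin, an assumption)
    limit = round(cumulative * 0.85)
    if dfp_benchmark_fps is not None and dfp_benchmark_fps < cumulative:
        # Low-FPS DFP: shrink the limit so other DFPs get scheduled more often
        limit = min(limit, round(dfp_benchmark_fps * 0.5))
    return max(limit, 1)
```

For two clients at 30 FPS this suggests a frame_limit of ~51, in line with the “~50” example above; treat the result as a starting point for tuning, not a final value.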
Hint
If you’re only using a single DFP (with either 1 or more client apps), just use frame_limit=0 and time_limit=5000. The DFP will run as long as you don’t have >5 seconds without any input.
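Using the SchedulerOptions class from the Usage section above, the single-DFP config in this hint could be written as the following sketch (assuming frame_limit=0 disables the frame limit and the remaining fields keep their defaults):

```python
from memryx.accl import SchedulerOptions

# Single-DFP case: no frame limit, yield only after 5 s without input
single_dfp_opts = SchedulerOptions(
    frame_limit=0,
    time_limit=5000,
)
```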
Important
Each combination of DFPs and clients will likely need different tuning for peak performance. So before deploying your application, be sure to test it alongside all other expected DFPs and apps.
In this release, you can’t adjust the DFP config options after the DFP is started. So to adjust on-the-fly, you’ll need to first stop all clients using that DFP, then restart them with the new config options.