Multi-DFP#
Starting in SDK 2.0, the MemryX runtime supports using multiple DFPs in a single application, or across multiple applications.
Though the hardware can only execute one DFP at a time, the MXA-Manager is able to queue up input/output data for multiple DFPs and “swap” the executing DFP.
The end result is that applications can appear to run simultaneously, even though the hardware is only executing one DFP at a time.
Important
If your models are able to co-map into a single DFP, you should always try to do so for best performance.
Co-mapped DFPs can even support applications accessing contained models separately, in full parallel.
Scheduling multiple DFPs is for scenarios where co-mapping models in advance is not possible.
With the runtime APIs, using multiple DFPs is very simple; however, extracting the best performance requires some manual tuning.
Hint
An upcoming SDK release will include a new mxa-manager feature: “smart” scheduling, which will automatically optimize DFP scheduling config during runtime.
Using Multiple DFPs#
To use multiple DFPs, you simply create multiple Accelerator objects, each with its own DFP file.
Same Application#
from memryx import AsyncAccl

# Create Accelerator objects for each DFP
accl1 = AsyncAccl("model1.dfp")
accl2 = AsyncAccl("model2.dfp")
# Connect callbacks as needed
accl1.connect_input(input_callback1)
accl1.connect_output(output_callback1)
accl2.connect_input(input_callback2)
accl2.connect_output(output_callback2)
// Create Accelerator objects for each DFP
MxAccl accl1("model1.dfp");
MxAccl accl2("model2.dfp");
// Connect callbacks as needed
accl1.connect_stream(&input_callback1, &output_callback1);
accl2.connect_stream(&input_callback2, &output_callback2);
Multiple Applications#
To use different DFPs in different applications and have them run simultaneously, you can simply start multiple processes, each with its own Accelerator object.
As long as the default args are used (Shared Mode), the MXA-Manager will handle the scheduling of these DFPs across the available hardware.
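The multi-process pattern can be sketched as follows. This is a minimal illustration only: `app1.py` and `app2.py` are hypothetical client scripts, stood in for here by `-c` one-liners; a real client would create its own `AsyncAccl("modelN.dfp")` in Shared Mode and connect callbacks as shown above.

```python
import subprocess
import sys

# Hypothetical stand-ins for two client apps. A real app1.py / app2.py
# would each create their own Accelerator (e.g. AsyncAccl("model1.dfp"))
# in the default Shared Mode and connect input/output callbacks.
client1 = "print('client 1: would create AsyncAccl for model1.dfp')"
client2 = "print('client 2: would create AsyncAccl for model2.dfp')"

# Launch each client as its own OS process; the MXA-Manager handles
# scheduling their DFPs across the available hardware.
procs = [
    subprocess.Popen([sys.executable, "-c", client1]),
    subprocess.Popen([sys.executable, "-c", client2]),
]
exit_codes = [p.wait() for p in procs]
```

Each process is fully independent; stopping one client does not affect the others' scheduling.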
Improving Performance#
Although the default parameters will work, getting the best results from the scheduler config requires trying different values and finding what works best for your specific application.
The following parameters can be adjusted for your DFPs.
DFP Scheduler Options#
When submitting a DFP to the MXA-Manager (i.e., creating an Accelerator object), you can specify a scheduler config to control how the DFP is executed.
These options are:
frame_limit: The number of frames to process through this DFP (including all submodels) before yielding control back for another DFP to be scheduled. Default is 600
time_limit: If this many milliseconds pass without a new input for the running DFP, it will yield control back so another DFP can be run. Default is 0 (disabled, always run until frame_limit)
stop_on_empty: Whether to immediately swap out if the ifmap queue is empty. This is functionally equivalent to setting time_limit to a very small number. Default is False
ifmap_queue_size: Number of entries in the input feature map queue, shared among all client processes. Default is 16
ofmap_queue_size: Number of entries in the output feature map queues, private for each client process. Default is 12
Important
The first application (or Accelerator object) to submit a DFP will set its config options. Subsequent applications submitting the same DFP (identified by SHA512 hash) will have their config options ignored.
Client Options#
DFP config options are shared across all client applications using the same DFP, but each client app also has unique options that can differ per-client.
Frame Smoothing#
When the MXA-Manager has multiple DFPs to run, it will accumulate input frames from clients for non-executing DFPs, and then swap in the next DFP when the current one yields control.
From a client app’s perspective, this will look like the DFP is stopped for a short time, then a rapid burst of frames are processed, then the DFP is stopped again.
While this may be acceptable for many applications, it can lead to a “jittery” frame rate, which may not be desirable for real-time applications.
This is where per-client options for “smoothing” come into play. These options are:
smoothing: If True, the frame rate will be smoothed to the specified FPS. Default is False (no smoothing)
fps_target: The target FPS to smooth to. Must be specified when smoothing is enabled; there is no default value.
Usage#
from memryx import AsyncAccl
from memryx.accl import ClientOptions, SchedulerOptions
# Swap after 24 frames, or after 50ms of no input
my_dfp_opts = SchedulerOptions(
frame_limit=24,
time_limit=50,
stop_on_empty=False,
ifmap_queue_size=16,
ofmap_queue_size=12
)
# Target 30 FPS
my_client_opts = ClientOptions(
smoothing=True,
fps_target=30
)
# Create Accelerator object
accl = AsyncAccl("model1.dfp",
scheduler_options=my_dfp_opts,
client_options=my_client_opts)
// Swap after 24 frames, or after 50ms of no input,
// and target 30 FPS
MxAccl accl("model1.dfp",
{0}, // MXA Device ID to use (0 for first device)
{true, true}, // use_model_shape for {input, output}
false, // local_mode = false (use shared mode)
{24, 50, false, 16, 12}, // SchedulerOptions
{true, 30}); // ClientOptions
See the C++ API reference for more details on the constructor parameters.
Tuning Advice#
While your particular combination of DFPs and FPS requirements will determine the best config options, here are some general tips:
- Use a frame_limit that is just slightly under your cumulative FPS target for that DFP. For example, two clients at 30 FPS could use a frame_limit of ~50.
- If the DFP is already low FPS (as determined by using a benchmark tool), use a smaller frame_limit to allow other DFPs in the schedule to run more often.
- Large DFPs (size in MB) with lots of model weights will take longer to swap in and out, so use higher frame_limit values to recover FPS.
- Bigger ifmap_queue_size and ofmap_queue_size values will help smooth out frame rate jitter, but will gradually increase latency.
- Increase ifmap_queue_size if there are many clients for this same DFP.
- Don’t set a _queue_size larger than frame_limit * (number of clients). This may cause large latencies.
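The first two tips above amount to simple arithmetic. The sketch below uses `suggest_frame_limit`, a hypothetical helper written for illustration (it is not part of the SDK), with an assumed ~15% margin standing in for “slightly under” the cumulative target:

```python
def suggest_frame_limit(client_fps_targets, dfp_benchmark_fps=None):
    """Hypothetical helper: pick a starting frame_limit for one DFP.

    client_fps_targets: list of per-client FPS targets for this DFP.
    dfp_benchmark_fps: standalone FPS measured with a benchmark tool, if known.
    """
    cumulative = sum(client_fps_targets)
    # "Slightly under" the cumulative FPS target (~15% margin, an assumption)
    limit = round(cumulative * 0.85)
    if dfp_benchmark_fps is not None and dfp_benchmark_fps < cumulative:
        # Low-FPS DFP: shrink the limit so other DFPs get scheduled more often
        limit = min(limit, round(dfp_benchmark_fps * 0.5))
    return max(limit, 1)
```

For two clients at 30 FPS this suggests a frame_limit of ~51, in line with the “~50” example above; treat the result as a starting point for tuning, not a final value.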
Hint
If you’re only using a single DFP (with either 1 or more client apps), just use frame_limit=0 and time_limit=5000. The DFP will run as long as you don’t have >5 seconds without any input.
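Using the SchedulerOptions class from the Usage section above, the single-DFP config in this hint could be written as the following sketch (assuming frame_limit=0 disables the frame limit and the remaining fields keep their defaults):

```python
from memryx.accl import SchedulerOptions

# Single-DFP case: no frame limit, yield only after 5 s without input
single_dfp_opts = SchedulerOptions(
    frame_limit=0,
    time_limit=5000,
)
```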
Important
Each combination of DFPs and clients will likely need different tuning for peak performance. So before deploying your application, be sure to test it alongside all other expected DFPs and apps.
In this release, you can’t adjust the DFP config options after the DFP is started. So to adjust on-the-fly, you’ll need to first stop all clients using that DFP, then restart them with the new config options.