Multiple MXA Modules

When a system has multiple MXA modules (multiple M.2 or PCIe cards), you can balance DFP execution across them by changing config parameters.

Note

Each accelerator module (M.2 card) has multiple MXA chips on it, but is always treated as one accelerator device. Individual MXA chips are not directly addressable.

For example, the typical M.2 2280 card has 4 MX3 chips; the MemryX SDK treats this as “an accelerator module of size 4”. Likewise, a card with 2 or 8 chips is treated as an accelerator module of “size 2” or “size 8”.

This guide is for using multiple modules.

Config Usage

By default, runtimes use Device 0 (the first MXA module). To use a different device, or multiple devices, pass a list of device IDs to the device IDs parameter (device_ids in Python, the second constructor argument in C++).

Python

from memryx import AsyncAccl

# Use devices 0, 1, and 2 (auto load balancing)
accl012 = AsyncAccl("my_boosted_model.dfp", device_ids=[0, 1, 2])

# Use only device 3 (only one device)
accl3 = AsyncAccl("my_separate_model.dfp", device_ids=[3])

C++

// Use devices 0, 1, and 2 (auto load balancing)
MxAccl accl012("my_boosted_model.dfp", {0, 1, 2});

// Use only device 3 (only one device)
MxAccl accl3("my_separate_model.dfp", {3});

And that’s it! The runtime automatically splits the workload across the specified devices.
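When the device list comes from a config file or command-line argument rather than a literal, it can help to sanity-check it before handing it to the runtime. The following is a minimal sketch, not part of the MemryX SDK: the helper name validate_device_ids and its num_modules argument are hypothetical, and the module count is supplied by the caller rather than queried from hardware.

```python
def validate_device_ids(device_ids, num_modules):
    """Return device_ids as a list, or raise ValueError if it looks wrong.

    Hypothetical helper: num_modules is whatever the caller knows about the
    system; this sketch does not talk to the hardware or the SDK.
    """
    ids = list(device_ids)
    if not ids:
        raise ValueError("device_ids must contain at least one ID")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device IDs: {ids}")
    for i in ids:
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(i, int) or isinstance(i, bool) or not 0 <= i < num_modules:
            raise ValueError(f"device ID {i!r} is outside 0..{num_modules - 1}")
    return ids


# Assumed system with 4 modules; use the first three for one model.
ids = validate_device_ids([0, 1, 2], num_modules=4)
# The checked list would then be passed exactly as in the examples above:
# accl012 = AsyncAccl("my_boosted_model.dfp", device_ids=ids)
```

Failing early with a clear message tends to be easier to debug than an error raised later, once the runtime tries to open a device that is not present.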

Hint

Streams are automatically balanced as well. There’s no need to manually assign streams to devices.

Performance Notes

When using multiple devices, you should generally expect FPS to scale roughly linearly with the number of devices in use, as long as the rest of the application can keep up.
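As a rough back-of-the-envelope illustration of that scaling, the sketch below multiplies a single-module FPS figure by the number of modules and caps the result at whatever the host side can sustain. The function name estimate_fps, the host_limit_fps parameter, and all numbers are hypothetical and chosen only for illustration; they are not measurements or SDK features.

```python
def estimate_fps(single_device_fps, num_devices, host_limit_fps=None):
    """Idealized estimate: linear scaling, capped by the host-side limit.

    host_limit_fps stands for the slowest non-accelerator stage (decoding,
    pre/post-processing, ...). None means "assume no host-side limit".
    """
    ideal = single_device_fps * num_devices
    if host_limit_fps is None:
        return ideal
    return min(ideal, host_limit_fps)


# Made-up numbers: 100 FPS per module, host pipeline tops out at 250 FPS.
for n in (1, 2, 3, 4):
    print(n, "module(s):", estimate_fps(100.0, n, host_limit_fps=250.0), "FPS")
```

With these made-up numbers the estimate stops growing at three modules, which is the pattern to look for when added hardware does not raise measured FPS.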

However, note that performance bottlenecks may arise elsewhere in your application, such as:

Video Decoding
Pre & Post Processing
Stream Workers and CPU Load
DFP Scheduler Options
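One simple way to check whether a host-side stage from the list above is the limiting factor is to time it in isolation and compare its throughput with the FPS you expect from the accelerators. The sketch below uses only the Python standard library; measure_fps and fake_preprocess are hypothetical names, and the stand-in stage should be replaced with your real decoding or pre/post-processing function.

```python
import time


def measure_fps(stage, frames):
    """Run `stage` once per frame and return the observed frames per second."""
    start = time.perf_counter()
    for frame in frames:
        stage(frame)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def fake_preprocess(frame):
    """Stand-in for real pre-processing: scale every value in the frame."""
    return [x * 0.5 for x in frame]


# Synthetic "frames": 200 lists of 1000 values each.
frames = [list(range(1000)) for _ in range(200)]
fps = measure_fps(fake_preprocess, frames)
print(f"pre-processing alone: {fps:.0f} FPS")
```

If a stage measured this way is slower than the combined throughput of the selected modules, adding more modules is unlikely to help until that stage is sped up or parallelized.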