Accelerator API (C++)#

class MxAccl : public MX::Runtime::MxAcclBase#

Public Functions

inline MxAccl(const std::filesystem::path &dfp_path, std::vector<int> device_ids_to_use = {0}, std::array<bool, 2> use_model_shape = {true, true}, bool local_mode = false, SchedulerOptions sched_options = {600, 0, false, 16, 12}, ClientOptions client_options = {false, 0}, std::string server_addr = "/run/mxa_manager/", unsigned int server_port_base = 10000, bool ignore_server_ = false)#

All-in-one constructor that initializes the MxAccl object, loads the provided DFP, and applies runtime configuration options.

This constructor streamlines the setup of a MemryX accelerator instance by combining DFP loading, device selection, scheduler setup, and client/server configuration into a single step.

Warning

Setting ignore_server_ to true disables coordination with other processes and may result in device conflicts if multiple clients or containers access the same device concurrently. Use with caution.

Parameters:
  • dfp_path – Path to the compiled DFP (Dataflow Program) file. Accepts std::filesystem::path, const char*, or std::string.

  • device_ids_to_use – List of MXA device IDs to use for execution. Specify {-1} to indicate that all available devices should be used. Default is {0}.

  • use_model_shape – A pair of boolean flags {input, output} indicating whether to preserve the original model input/output shapes (true), or to use the shape as interpreted by the MXA runtime (false). Default is {true, true}.

  • local_mode – If set to true, the DFP runs in local (non-shared) mode. This mode may improve performance for single-process applications but does not support multi-process or multi-DFP usage. Default is false.

  • sched_options – Runtime scheduler configuration used when operating in shared mode. Includes frame limits, time limits, and queue sizes. Default is {frame_limit = 600, time_limit = 0, stop_on_empty = false, ifmap_queue_size = 16, ofmap_queue_size = 12}. See MX::RPC::SchedulerOptions for detailed field descriptions.

  • client_options – Client-specific execution options such as FPS smoothing and target frame rate. Default is {smoothing = false, fps_target = 0}. See MX::RPC::ClientOptions for detailed field descriptions.

  • server_addr – Server address or UNIX socket file path for manager communication. On Linux, the default is "/run/mxa_manager/" (UNIX socket). On Windows or remote setups, use an IP address (e.g., "localhost").

  • server_port_base – Base port number used for socket or IP-based communication. Applies to both socket filenames and IP addresses. Default is 10000.

  • ignore_server_ – (Advanced) If set to true, the MxAccl instance will ignore the manager server and operate in local mode only. Default is false.
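The {-1} convention for device_ids_to_use can be pictured with a small helper. This is only a sketch: resolve_device_ids is a hypothetical function, not part of the MemryX API, and it assumes that devices are numbered 0 to N-1 and that only the exact list {-1} means "all devices".

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the MemryX API): expands the
// documented {-1} shorthand into an explicit list of device IDs.
// Assumption: devices are numbered 0 .. available - 1.
inline std::vector<int> resolve_device_ids(const std::vector<int>& requested,
                                           int available) {
    assert(available >= 0);
    if (requested.size() == 1 && requested[0] == -1) {
        std::vector<int> all(static_cast<std::size_t>(available));
        std::iota(all.begin(), all.end(), 0);  // fills 0, 1, ..., available - 1
        return all;
    }
    return requested;  // an explicit list is passed through unchanged
}
```

With three devices present, resolve_device_ids({-1}, 3) yields {0, 1, 2}, while resolve_device_ids({0}, 3) stays {0}, matching the constructor default.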

inline MxAccl(uint8_t *dfp_bytes, std::vector<int> device_ids_to_use = {0}, std::array<bool, 2> use_model_shape = {true, true}, bool local_mode = false, SchedulerOptions sched_options = {600, 0, false, 16, 12}, ClientOptions client_options = {false, 0}, std::string server_addr = "/run/mxa_manager/", unsigned int server_port_base = 10000, bool ignore_server_ = false)#

All-in-one constructor that initializes the MxAccl object, loads the provided DFP, and applies runtime configuration options.

This constructor streamlines the setup of a MemryX accelerator instance by combining DFP loading, device selection, scheduler setup, and client/server configuration into a single step.

Warning

Setting ignore_server_ to true disables coordination with other processes and may result in device conflicts if multiple clients or containers access the same device concurrently. Use with caution.

Parameters:
  • dfp_bytes – Raw uint8_t* pointer to memory containing the already-loaded DFP file data.

  • device_ids_to_use – List of MXA device IDs to use for execution. Specify {-1} to indicate that all available devices should be used. Default is {0}.

  • use_model_shape – A pair of boolean flags {input, output} indicating whether to preserve the original model input/output shapes (true), or to use the shape as interpreted by the MXA runtime (false). Default is {true, true}.

  • local_mode – If set to true, the DFP runs in local (non-shared) mode. This mode may improve performance for single-process applications but does not support multi-process or multi-DFP usage. Default is false.

  • sched_options – Runtime scheduler configuration used when operating in shared mode. Includes frame limits, time limits, and queue sizes. Default is {frame_limit = 600, time_limit = 0, stop_on_empty = false, ifmap_queue_size = 16, ofmap_queue_size = 12}. See MX::RPC::SchedulerOptions for detailed field descriptions.

  • client_options – Client-specific execution options such as FPS smoothing and target frame rate. Default is {smoothing = false, fps_target = 0}. See MX::RPC::ClientOptions for detailed field descriptions.

  • server_addr – Server address or UNIX socket file path for manager communication. On Linux, the default is "/run/mxa_manager/" (UNIX socket). On Windows or remote setups, use an IP address (e.g., "localhost").

  • server_port_base – Base port number used for socket or IP-based communication. Applies to both socket filenames and IP addresses. Default is 10000.

  • ignore_server_ – (Advanced) If set to true, the MxAccl instance will ignore the manager server and operate in local mode only. Default is false.

void connect_stream(float_callback_t in_cb, float_callback_t out_cb, int stream_id, int model_id = 0)#

Connects a stream to a model using the specified input and output callback functions.

This method registers a data stream for the given model and binds it to both input and output callback functions. Streams are uniquely identified using stream_id. This function must be called before start() or after stop().

  • float_callback_t is a std::function type with the following signature: bool foo(std::vector<const MX::Types::FeatureMap*>, int stream_id);

  • If the input callback (in_cb) returns false, the corresponding stream is stopped. When all registered streams have stopped, wait() is invoked automatically.

Parameters:
  • in_cb – Input callback function that supplies data to the model. Must match float_callback_t signature.

  • out_cb – Output callback function invoked with the model’s output feature maps. Must match float_callback_t signature.

  • stream_id – Unique identifier for this stream. It is passed to the callback functions to distinguish streams.

  • model_id – Index of the model to which this stream is connected. Default is 0.

void start(int model_id = -1)#

Starts execution for the specified model or models.

This function initiates processing for the given model ID. All streams associated with the model must be connected beforehand via connect_stream(). Use -1 to indicate that all available DFPs and models should be started.

Parameters:

model_id – Index of the model to start. Use -1 to start all models. Default is -1.

Pre:

connect_stream() must be called before this method.

void wait(int model_id = -1)#

Blocks until all streams for the specified model(s) have completed execution.

This function waits until all active input callbacks for the given model have returned false, indicating the end of their respective streams. It should only be called after start() has been invoked for the target model(s).

Use -1 to wait for all models and DFPs currently in execution.

Parameters:

model_id – Index of the model to wait on. Use -1 to wait on all active models. Default is -1.

void stop(int model_id = -1)#

Stops execution of the specified model or models.

This function halts all active streams and processing associated with the given model. It should only be called after start() has been invoked. Use -1 to stop all active models and DFPs.

Parameters:

model_id – Index of the model to stop. Use -1 to stop all models. Default is -1.

void set_num_workers(int input_num_workers, int output_num_workers, int model_id = 0)#

Configures the number of worker threads for input and output streams for a given model.

By default, the number of input and output workers is equal to the number of connected streams, which typically yields maximum performance. This method allows overriding that behavior to fine-tune parallelism.

This method should be called after all required streams have been connected, and before start() is invoked. If not explicitly set, the accelerator will operate using default worker counts.

Parameters:
  • input_num_workers – Number of worker threads to assign for input processing.

  • output_num_workers – Number of worker threads to assign for output processing.

  • model_id – Index of the model to apply the worker configuration to. Default is 0.

Pre:

connect_stream() must be called before this method.

int get_num_streams(int model_id = 0)#

Returns the number of streams currently connected to the specified model.

This method can be used to query how many streams have been registered for a given model using connect_stream(). It is useful for debugging, monitoring, or configuring worker allocation.

Parameters:

model_id – Index of the model to query. Default is 0.

Returns:

Number of streams connected to the specified model.

Private Types

typedef std::function<bool(std::vector<const MX::Types::FeatureMap*>, int stream_id)> float_callback_t#

Type definition for a callback function that processes input and output feature maps.

This callback type is used for both input and output. In an input callback, write your input data (a 1D float array, float*) into the provided feature map objects with the set_data() method.

In an output callback, the feature maps contain the results of the model execution, which you read into a 1D float array (float*) with get_data().

The stream_id parameter allows you to distinguish between different streams, which is useful when multiple streams are connected to the same model. The stream ID is passed to your function from the runtime’s internal scheduler, and it is the same ID you provided when calling connect_stream().
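The callback contract described above can be tried out without hardware. The sketch below is an assumption-heavy simulation, not the MemryX runtime: sketch::FeatureMap and sketch::MiniAccl are hypothetical stand-ins, the "model" is the identity function, and everything runs on the calling thread, whereas the real runtime drives the callbacks from its own workers between start() and wait().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

namespace sketch {

// Stand-in for MX::Types::FeatureMap: a fixed-size float buffer.
class FeatureMap {
public:
    explicit FeatureMap(std::size_t n) : buf_(n, 0.0f) {}
    // Copies caller data into the internal buffer (mirrors set_data()).
    void set_data(const float* in) const { std::copy(in, in + buf_.size(), buf_.begin()); }
    // Copies the internal buffer out to the caller (mirrors get_data()).
    void get_data(float* out) const { std::copy(buf_.begin(), buf_.end(), out); }
    std::size_t size() const { return buf_.size(); }
private:
    mutable std::vector<float> buf_;  // mutable: the documented methods are const
};

using float_callback_t = std::function<bool(std::vector<const FeatureMap*>, int)>;

// Stand-in accelerator whose single model copies input to output.
class MiniAccl {
public:
    explicit MiniAccl(std::size_t fmap_size) : fmap_(fmap_size) {}
    void connect_stream(float_callback_t in_cb, float_callback_t out_cb, int stream_id) {
        streams_.push_back({std::move(in_cb), std::move(out_cb), stream_id});
    }
    // Runs every stream to completion on the calling thread.
    void run_all() {
        for (auto& s : streams_) {
            std::vector<const FeatureMap*> maps{&fmap_};
            while (s.in_cb(maps, s.id)) {  // a false return ends this stream
                s.out_cb(maps, s.id);      // identity model: output == input
            }
        }
    }
private:
    struct Stream { float_callback_t in_cb; float_callback_t out_cb; int id; };
    FeatureMap fmap_;
    std::vector<Stream> streams_;
};

// Sends `frames` frames filled with `value` and returns the sum of all outputs.
inline float run_demo(int frames, float value) {
    MiniAccl accl(4);
    int sent = 0;
    float total = 0.0f;
    auto in_cb = [&](std::vector<const FeatureMap*> dst, int /*stream_id*/) {
        assert(!dst.empty());
        if (sent == frames) return false;  // no more input: stop the stream
        std::vector<float> frame(dst[0]->size(), value);
        dst[0]->set_data(frame.data());
        ++sent;
        return true;
    };
    auto out_cb = [&](std::vector<const FeatureMap*> src, int /*stream_id*/) {
        std::vector<float> result(src[0]->size());
        src[0]->get_data(result.data());
        for (float v : result) total += v;
        return true;
    };
    accl.connect_stream(in_cb, out_cb, 0);
    accl.run_all();
    return total;
}

}  // namespace sketch
```

The part that carries over to real code is the shape of the two callbacks: the input callback fills the feature maps with set_data() and returns false once it has nothing left to send, and the output callback drains them with get_data().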

class MxAcclMT : public MX::Runtime::MxAcclBase#

Public Functions

inline MxAcclMT(const std::filesystem::path &dfp_path, std::vector<int> device_ids_to_use = {0}, std::array<bool, 2> use_model_shape = {true, true}, bool local_mode = false, SchedulerOptions sched_options = {600, 0, false, 16, 12}, ClientOptions client_options = {false, 0}, std::string server_addr = "/run/mxa_manager/", unsigned int server_port_base = 10000, bool ignore_server_ = false)#

All-in-one constructor that initializes the MxAcclMT object in manual threading mode, connects the provided DFP, and applies runtime configuration options.

This constructor sets up the accelerator by loading the specified DFP, configuring device assignments, scheduling behavior, and communication options. It builds on the functionality of the base class MxAcclBase.

Warning

Setting ignore_server_ to true disables coordination with other processes and may result in device conflicts if multiple clients or containers access the same device concurrently. Use with caution.

Parameters:
  • dfp_path – Path to the compiled DFP (Dataflow Program) file. Accepts std::filesystem::path, const char*, or std::string.

  • device_ids_to_use – List of MXA device IDs to use for execution. Specify {-1} to indicate that all available devices should be used. Default is {0}.

  • use_model_shape – A pair of boolean flags {input, output} indicating whether to preserve the original model input/output shapes (true) or use the shape interpreted by the MXA runtime (false). Default is {true, true}.

  • local_mode – If true, the DFP will be executed in local (non-shared) mode. This may offer better performance in single-process environments, but it disables multi-process and multi-DFP coordination. Default is false.

  • sched_options – Scheduler options used during shared execution mode. Configures frame limits, time limits, and queue sizes. Default is {frame_limit = 600, time_limit = 0, stop_on_empty = false, ifmap_queue_size = 16, ofmap_queue_size = 12}. See MX::RPC::SchedulerOptions for detailed field descriptions.

  • client_options – Client-side behavior options such as FPS smoothing and throttling. Default is {smoothing = false, fps_target = 0}. See MX::RPC::ClientOptions for detailed field descriptions.

  • server_addr – Address of the manager server. On Linux, this is typically a UNIX socket path (e.g., "/run/mxa_manager/"). On Windows or remote setups, this should be an IP address such as "localhost".

  • server_port_base – Base port number used by the manager server for communication. Applies to both socket filenames and IP-based addressing. Default is 10000.

  • ignore_server_ – (Advanced) If set to true, the object will bypass the manager server and run the DFP in local mode. Default is false.

inline MxAcclMT(uint8_t *dfp_bytes, std::vector<int> device_ids_to_use = {0}, std::array<bool, 2> use_model_shape = {true, true}, bool local_mode = false, SchedulerOptions sched_options = {600, 0, false, 16, 12}, ClientOptions client_options = {false, 0}, std::string server_addr = "/run/mxa_manager/", unsigned int server_port_base = 10000, bool ignore_server_ = false)#

All-in-one constructor that initializes the MxAcclMT object in manual threading mode, connects the provided DFP from a raw byte array, and applies runtime configuration options.

This constructor is intended for cases where the DFP (Dataflow Program) is already loaded in memory and represented as a raw byte buffer. It sets up the accelerator by configuring device usage, scheduling options, and client/server communication in a single step.

Warning

Setting ignore_server_ to true disables coordination with other processes and may result in device conflicts if multiple clients or containers access the same device concurrently. Use with caution.

Warning

The memory pointed to by dfp_bytes must remain valid throughout object construction. Passing a dangling or temporary buffer may lead to undefined behavior.

Parameters:
  • dfp_bytes – Pointer to the raw, in-memory representation of the compiled DFP (uint8_t*). This buffer must remain valid and accessible throughout the initialization phase.

  • device_ids_to_use – List of MXA device IDs to use for execution. Specify {-1} to indicate that all available devices should be used. Default is {0}.

  • use_model_shape – A pair of boolean flags {input, output} indicating whether to preserve the original model input/output shapes (true) or use the shape interpreted by the MXA runtime (false). Default is {true, true}.

  • local_mode – If true, the DFP will be executed in local (non-shared) mode. This may offer better performance in single-process environments but disables multi-process and multi-DFP coordination. Default is false.

  • sched_options – Scheduler options used during shared execution mode. Configures frame limits, time limits, and queue sizes. Default is {frame_limit = 600, time_limit = 0, stop_on_empty = false, ifmap_queue_size = 16, ofmap_queue_size = 12}. See MX::RPC::SchedulerOptions for detailed field descriptions.

  • client_options – Client-side behavior options such as FPS smoothing and throttling. Default is {smoothing = false, fps_target = 0}. See MX::RPC::ClientOptions for detailed field descriptions.

  • server_addr – Address of the manager server. On Linux, this is typically a UNIX socket path (e.g., "/run/mxa_manager/"). On Windows or remote setups, this should be an IP address such as "localhost".

  • server_port_base – Base port number used by the manager server for communication. Applies to both socket filenames and IP-based addressing. Default is 10000.

  • ignore_server_ – (Advanced) If set to true, the object will bypass the manager server and run the DFP in local mode. Default is false.

bool send_input(std::vector<float*> in_data, int model_id, int stream_id, int32_t timeout = 0)#

Sends input data to the accelerator in user-threaded (manual threading) mode.

This method submits input data to a specific model and stream instance using user-managed threading. It optionally waits for buffer availability based on the provided timeout.

Parameters:
  • in_data – A vector of pointers to input data buffers. Each float* corresponds to a model input port.

  • model_id – Index of the model the input data is targeted to.

  • stream_id – Index of the stream associated with this input. This must match the stream ID used during stream setup.

  • timeout – Maximum time to wait (in milliseconds) for the input to be accepted. A value of 0 indicates that the call will block indefinitely until space is available. Default is 0.

Returns:

Returns true if the input was accepted and inference started successfully. Returns false if the operation timed out before data could be submitted.
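Because a non-zero timeout makes send_input() return false instead of blocking, callers usually need some policy for the timed-out case. The following is a minimal sketch of one such policy, a bounded retry loop. send_with_retries is a hypothetical helper, and try_send stands in for a call such as send_input(in_data, model_id, stream_id, timeout_ms) so that the example compiles without the runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper (not part of the MemryX API).
// try_send(timeout_ms) stands in for a send_input() call with that timeout
// and must return true when the input was accepted.
// Returns the 1-based attempt that succeeded, or 0 if every attempt timed out.
inline int send_with_retries(const std::function<bool(std::int32_t)>& try_send,
                             std::int32_t timeout_ms, int max_attempts) {
    // A timeout of 0 blocks indefinitely, so retrying would never happen.
    assert(timeout_ms > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (try_send(timeout_ms)) {
            return attempt;  // accepted on this attempt
        }
    }
    return 0;  // all attempts timed out
}
```

In an application the lambda passed as try_send would capture the MxAcclMT object and the input pointers. A return value of 0 is the point at which the caller might drop the frame or shut the stream down.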

bool receive_output(std::vector<float*> &out_data, int model_id, int stream_id, int32_t timeout = 0)#

Receives output data from the accelerator in user-threaded (manual threading) mode.

This method retrieves output feature maps from the accelerator for a given model and stream. It is intended for use in MxAcclMT mode, where the user manages threading and stream lifecycle manually. The call will block until output is available or the specified timeout is reached.

Parameters:
  • out_data – A vector of pointers where the output data will be written. Each float* must point to a preallocated buffer matching the size of the corresponding model output.

  • model_id – Index of the model from which output data is to be retrieved.

  • stream_id – Index of the stream associated with this output. It can be any user-defined stream ID that the application uses to manage context.

  • timeout – Maximum time to wait (in milliseconds) for output data to become available. A value of 0 indicates that the call will block indefinitely until data is ready. Default is 0.

Returns:

Returns true if output was successfully received. Returns false if the operation timed out before data was available.

bool run(std::vector<float*> in_data, std::vector<float*> &out_data, int model_id, int stream_id, int32_t timeout = 0)#

Runs inference on a batch of inputs in user-threaded (manual threading) mode.

This method sends input data to the accelerator and waits for the corresponding outputs in a single call. It is designed for use in MxAcclMT mode, where the user manages threading and stream assignment manually. The call blocks until inference completes or the specified timeout expires.

Note

This method combines send_input() and receive_output() into one operation, but sending to and receiving from the accelerator are still performed asynchronously internally.

Parameters:
  • in_data – A vector of pointers to input data buffers. Each float* corresponds to one input port of the model.

  • out_data – A vector of pointers where the output data will be written. Each float* must point to a preallocated buffer matching the expected size of the corresponding model output.

  • model_id – Index of the model on which inference should be performed.

  • stream_id – Index of the stream used to run inference. This can be any user-defined ID that the application uses to track context.

  • timeout – Maximum wait time (in milliseconds) for inference to complete. A value of 0 indicates that the function blocks indefinitely until results are ready. Default is 0.

Returns:

Returns true if inference was successful and output was received. Returns false if the operation timed out.
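Both receive_output() and run() expect every float* in out_data to point at a preallocated buffer. One way to arrange that is sketched below. OutputBuffers and make_output_buffers are hypothetical names, and the sketch assumes the sizes passed in are element counts (number of floats), for example taken from MxModelInfo::out_featuremap_sizes after confirming the unit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder (not part of the MemryX API): owns one float buffer
// per model output and exposes the raw pointers in the same order.
struct OutputBuffers {
    OutputBuffers() = default;
    OutputBuffers(OutputBuffers&&) = default;
    OutputBuffers& operator=(OutputBuffers&&) = default;
    // Copying would leave ptrs aimed at the source object's storage.
    OutputBuffers(const OutputBuffers&) = delete;
    OutputBuffers& operator=(const OutputBuffers&) = delete;

    std::vector<std::vector<float>> storage;  // owns the memory
    std::vector<float*> ptrs;                 // what out_data would be set to
};

// Assumption: each entry of sizes_in_floats is an element count, not bytes.
inline OutputBuffers make_output_buffers(const std::vector<std::size_t>& sizes_in_floats) {
    OutputBuffers out;
    out.storage.reserve(sizes_in_floats.size());
    for (std::size_t n : sizes_in_floats) {
        out.storage.emplace_back(n, 0.0f);  // zero-initialized buffer of n floats
    }
    for (auto& buf : out.storage) {
        out.ptrs.push_back(buf.data());
    }
    return out;
}
```

Keeping the owning vectors and the pointer list in one move-only object means the pointers stay valid for as long as the object lives, which is the lifetime the out_data argument needs.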

class MxAcclBase#

MxAcclBase is the base class for MxAccl and MxAcclMT. It provides the basic functionality to connect to a DFP and is not intended to be used directly.

Subclassed by MX::Runtime::MxAccl, MX::Runtime::MxAcclMT

Public Functions

int get_num_models()#

Gets the number of models in the compiled DFP.

Returns:

Number of models

int get_dfp_num_chips()#

Gets the number of chips the DFP is compiled for.

Returns:

Number of chips

MX::Types::MxModelInfo get_model_info(int model_id = 0) const#

Gets information about a particular model, such as the number of input and output feature maps and the input and output layer names.

Parameters:

model_id – ID (index) of the model to query. Default is 0.

Returns:

An MxModelInfo with the requested information if model_id is valid. Throws a runtime error if model_id is invalid.

MX::Types::MxModelInfo get_pre_model_info(int model_id = 0) const#

Gets information about the pre-processing model attached to a particular model, such as the number of input and output feature maps and their sizes and shapes.

Parameters:

model_id – ID (index) of the model to query. Default is 0.

Returns:

An MxModelInfo with the requested information if model_id is valid. Throws a runtime error if model_id is invalid.

MX::Types::MxModelInfo get_post_model_info(int model_id = 0) const#

Gets information about the post-processing model attached to a particular model, such as the number of input and output feature maps and their sizes and shapes.

Parameters:

model_id – ID (index) of the model to query. Default is 0.

Returns:

An MxModelInfo with the requested information if model_id is valid. Throws a runtime error if model_id is invalid.

void connect_post_model(std::filesystem::path post_model_path, int model_id = 0, const std::vector<size_t> &post_size_list = {})#

Connect the information of the post-processing model that has been cropped by the neural compiler.

Parameters:
  • post_model_path – Absolute path of the post-processing model (can be ONNX, TFLite, etc.).

  • model_id – Index of the model to which the post-processing model is connected. Default is 0.

  • post_size_list – If the output of the post-processing model has a variable size, or if the output sizes cannot be deduced, pass the maximum possible output sizes here. Default is an empty vector.

void connect_pre_model(std::filesystem::path pre_model_path, int model_id = 0)#

Connect the information of the pre-processing model that has been cropped by the neural compiler.

Parameters:
  • pre_model_path – Absolute path of the pre-processing model (can be ONNX, TFLite, etc.).

  • model_id – Index of the model to which the pre-processing model is connected. Default is 0.

void set_parallel_fmap_convert(int num_threads, int model_id = 0)#

Configures multi-threaded FeatureMap data conversion using the given number of threads. Conversion multithreading is mainly intended for high-FPS single-stream scenarios or user-threaded (manual threading) mode. In multi-stream auto-threading scenarios this option should not be necessary, and it may even degrade performance due to increased CPU load.

Parameters:
  • num_threads – Number of worker threads for FeatureMap conversion. Values of 2 or more enable multi-threaded conversion; values below 2 disable it.

  • model_id – Index of the model for which to enable the feature. Default is 0.

bool can_get_power_consumption(int device_id = 0)#

Checks if power consumption data can be retrieved for the connected modules.

Returns:

true if power consumption data is available, false otherwise.

float get_power(int device_id = 0)#

Retrieves the current power consumption of the specified device.

Returns:

The current power consumption value (in milliwatts) of the device.

float get_max_temperature(int device_id = 0)#

Retrieves the current maximum temperature of the specified device.

Returns:

The current maximum temperature value (in degrees Celsius) of the device.

std::vector<float> get_chip_temperatures(int device_id = 0)#

Retrieves temperatures of each chip on the specified device.

Returns:

A vector containing the current temperature values (in degrees Celsius) of each chip on the device.
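The telemetry getters return plain numbers, so a little post-processing is often useful. The sketch below shows two hypothetical helpers that are not part of the MemryX API: one converts a get_power() style reading from milliwatts to watts, and one finds the hottest entry in a per-chip vector such as the one get_chip_temperatures() returns. Whether get_max_temperature() is derived the same way is not stated in this reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: which chip is hottest and how hot it is.
struct ChipReading {
    std::size_t chip_index;
    float celsius;
};

// Returns the hottest chip in a non-empty per-chip temperature list.
inline ChipReading hottest_chip(const std::vector<float>& chip_temps_c) {
    assert(!chip_temps_c.empty());
    auto it = std::max_element(chip_temps_c.begin(), chip_temps_c.end());
    return {static_cast<std::size_t>(it - chip_temps_c.begin()), *it};
}

// Converts a power reading in milliwatts to watts.
inline float milliwatts_to_watts(float milliwatts) {
    return milliwatts / 1000.0f;
}
```

A monitoring loop could call these on each poll and log the result, guarded by can_get_power_consumption() for the power reading.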

bool set_operating_frequency(int device_id, MX::Types::MxFrequencyOption freq_option = MX::Types::MxFrequencyOption::FREQ_600MHz, bool two_chip_mode_on_four_chip = false)#

Sets the operating frequency of the device.

Note

This function must be called before invoking connect_dfp(). Calling it after connect_dfp() throws a runtime error.

Parameters:

freq_option – The desired frequency option. Defaults to 600 MHz if not specified.

Returns:

true if the frequency was successfully set, false otherwise.

struct SchedulerOptions#

Configures scheduling behavior for a DFP (Dataflow Program) instance.

The SchedulerOptions struct defines runtime scheduling policies for a DFP when operating in shared mode. These options determine when and how the DFP instance should be swapped out based on input availability, queue capacities, or time constraints. They provide fine-grained control over input/output buffering and processing efficiency.

Public Members

uint32_t frame_limit#

The maximum number of frames that can be enqueued before the associated DFP is swapped out.

uint32_t time_limit#

The maximum time duration (in milliseconds) the DFP is allowed to run before being swapped out.

bool stop_on_empty#

If set to true, the DFP is swapped out immediately when the input queue becomes empty. This is useful for reducing idle resource usage in multi-client scenarios.

uint32_t ifmap_queue_size#

Capacity of the input feature map (ifmap) queue used by the DFP. This queue is shared across all clients of the DFP.

uint32_t ofmap_queue_size#

Capacity of the per-client output feature map (ofmap) queues.
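The constructor signatures pass the scheduler defaults positionally as {600, 0, false, 16, 12}. The sketch below spells out how those values would line up with the members listed above. It uses a local stand-in struct rather than the real MX::RPC::SchedulerOptions, and it assumes the members are declared in the order shown in this reference.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in mirroring the documented members (assumed declaration order).
struct SchedulerOptions {
    std::uint32_t frame_limit;       // frames enqueued before the DFP is swapped out
    std::uint32_t time_limit;        // milliseconds the DFP may run before swap-out
    bool stop_on_empty;              // swap out as soon as the input queue is empty
    std::uint32_t ifmap_queue_size;  // shared input queue capacity
    std::uint32_t ofmap_queue_size;  // per-client output queue capacity
};

// The positional defaults used by the MxAccl and MxAcclMT constructors.
inline SchedulerOptions default_scheduler_options() {
    return {600, 0, false, 16, 12};
}
```

Starting from the defaults and changing a single named member, for example setting stop_on_empty to true, tends to be easier to review than a five-value positional list.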

struct ClientOptions#

Configures client-side execution behavior, such as FPS smoothing and pacing.

The ClientOptions struct defines optional runtime behaviors for individual clients, including frame rate smoothing and target pacing. These settings help regulate how frequently input frames are submitted, which can be important for latency-sensitive or performance-constrained applications.

Public Members

bool smoothing#

If true, enables frame rate smoothing to reduce variability in submission timing. This can improve consistency in visual or temporal output.

float fps_target#

Target frames per second for the client. A delay of 1 / fps_target seconds will be enforced between input submissions. A value of 0 disables pacing.
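The pacing rule for fps_target is simple arithmetic, sketched here as a standalone function. pacing_delay is a hypothetical helper that only restates the rule above (a delay of 1 / fps_target seconds, with 0 meaning no pacing) and says nothing about how the runtime schedules the delay internally.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: delay between input submissions for a given fps_target.
// A target of 0 disables pacing, which is modeled here as a zero delay.
inline std::chrono::duration<double> pacing_delay(float fps_target) {
    assert(fps_target >= 0.0f);
    if (fps_target == 0.0f) {
        return std::chrono::duration<double>::zero();
    }
    return std::chrono::duration<double>(1.0 / static_cast<double>(fps_target));
}
```

For instance, a target of 4 FPS gives a delay of 0.25 seconds between submissions, and a target of 30 FPS gives roughly 33 milliseconds.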

class FeatureMap#

Encapsulates data buffers used by MxAccl for input and output feature maps.

The FeatureMap class represents a data container used internally by MxAccl to manage tensor data across the accelerator interface. It provides methods to safely send (set_data) and retrieve (get_data) data while abstracting low-level memory management.

Public Functions

MX::Utils::MX_status get_data(float *out_data) const#

Retrieves the data from the accelerator’s output buffer into the given destination.

This method copies data from the internal feature map buffer into the user-provided memory pointed to by out_data.

Parameters:

out_data – Pointer to the destination buffer for the output data.

Returns:

MX_status indicating success or failure.

MX::Utils::MX_status set_data(float *in_data) const#

Sets the input data to be sent to the accelerator.

Copies data from the user-provided in_data pointer into the feature map’s internal buffer.

Parameters:

in_data – Pointer to the source input data.

Returns:

MX_status indicating success or failure.

struct MxModelInfo#

Holds metadata and configuration details for a compiled model.

The MxModelInfo struct contains essential information about a model compiled for execution, including the number of input and output feature maps, their shapes and sizes, and the associated layer names. This metadata is typically used during runtime setup, validation, or for constructing input/output buffers.

Public Members

int model_index#

Unique index identifying the model instance.

int num_in_featuremaps#

Number of input feature maps required by the model.

int num_out_featuremaps#

Number of output feature maps produced by the model.

std::vector<std::string> input_layer_names#

Names of the model’s input layers, listed in the order expected by the runtime.

std::vector<std::string> output_layer_names#

Names of the model’s output layers, listed in the order produced by the model.

std::vector<MX::Types::ShapeVector> in_featuremap_shapes#

Shapes of the input feature maps, represented as a vector of ShapeVector objects.

std::vector<MX::Types::ShapeVector> out_featuremap_shapes#

Shapes of the output feature maps, represented as a vector of ShapeVector objects.

std::vector<size_t> in_featuremap_sizes#

Sizes (in bytes or elements, depending on context) of each input feature map.

std::vector<size_t> out_featuremap_sizes#

Sizes (in bytes or elements, depending on context) of each output feature map.

class ShapeVector#

Represents the shape of a tensor with flexible dimension ordering.

The ShapeVector class encapsulates tensor shape information using a fixed-size or dynamic vector. It provides convenience methods for interpreting and converting between channel-first and channel-last formats. Internally, the shape is represented using four components by default: height (h), width (w), batch (z), and channel (c), but custom-sized shapes are also supported.

Public Functions

ShapeVector()#

Default constructor. Initializes the shape with 4 dimensions, all set to 0.

ShapeVector(int64_t h, int64_t w, int64_t z, int64_t c)#

Construct a ShapeVector using explicit dimensions.

Parameters:
  • h – Height.

  • w – Width.

  • z – Batch size.

  • c – Channel count.

ShapeVector(int size)#

Construct a ShapeVector of custom size with all dimensions initialized to 1.

Parameters:

size – Number of dimensions.

std::vector<int64_t> chfirst_shape()#

Returns the shape in channel-first format (e.g., [C, H, W, Z]).

Returns:

A vector of dimensions in channel-first order.

std::vector<int64_t> chlast_shape()#

Returns the shape in channel-last format (e.g., [H, W, Z, C]).

Returns:

A vector of dimensions in channel-last order.
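The two orderings can be illustrated with a small stand-in type. Hwzc, chfirst, chlast, and element_count below are hypothetical names, not the real ShapeVector implementation. They only reproduce the [C, H, W, Z] and [H, W, Z, C] layouts given above for the default four-component shape.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for a four-component shape: height, width, z, channels.
struct Hwzc {
    std::int64_t h, w, z, c;
};

// Channel-last ordering: [H, W, Z, C].
inline std::vector<std::int64_t> chlast(const Hwzc& s) {
    return {s.h, s.w, s.z, s.c};
}

// Channel-first ordering: [C, H, W, Z].
inline std::vector<std::int64_t> chfirst(const Hwzc& s) {
    return {s.c, s.h, s.w, s.z};
}

// Total number of elements, which is the same in either ordering.
inline std::int64_t element_count(const Hwzc& s) {
    return s.h * s.w * s.z * s.c;
}
```

For a 224 x 224 image with z = 1 and 3 channels, chfirst gives {3, 224, 224, 1}, chlast gives {224, 224, 1, 3}, and both describe 150528 elements.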

int64_t *data()#

Returns a pointer to the raw shape data.

Returns:

Pointer to the first element of the shape vector.

int64_t size() const#

Returns the number of dimensions in the shape.

Returns:

Number of dimensions.