Accelerator API (C++)

The C++ API has two modes of operation: auto-threading mode, in which the library creates and manages the send and receive threads automatically, and manual-threading mode, in which the user is responsible for creating and managing threads and calling the respective send and receive functions.

Auto-Threading Mode

class MxAccl

Public Functions

MxAccl(bool use_shared_mode = false, std::string server_ip = "127.0.0.1", unsigned int server_port_base = 10000)

MxAccl constructor.

Parameters:

use_shared_mode – This flag is ‘false’ by default, giving the MxAccl object direct control of the MXA. When set to ‘true’, MxAccl runs in Shared mode, which lets multiple processes on local (or remote) machines share the MXA, at some potential performance cost.
server_ip – IP address of the server to connect to, as a string. The default, 127.0.0.1, is localhost.
server_port_base – Starting port number (unsigned int), default 10000. The server uses this port, port+1 and port+2; for example, 10000, 10001 and 10002.
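The port rule above is simple arithmetic. The helper below is a hypothetical sketch (shared_mode_ports is not part of MxAccl) that computes the three ports for a given base and rejects any base that would run past the highest valid port, 65535.

```cpp
#include <array>
#include <stdexcept>

// Hypothetical helper, not part of MxAccl: returns the three consecutive
// ports (base, base+1, base+2) that a Shared-mode server uses.
std::array<unsigned int, 3> shared_mode_ports(unsigned int server_port_base = 10000) {
    // 65535 is the highest valid TCP/UDP port, so base+2 must not exceed it.
    if (server_port_base > 65535u - 2u) {
        throw std::out_of_range("server_port_base leaves no room for port+2");
    }
    return {server_port_base, server_port_base + 1, server_port_base + 2};
}
```

If you choose a non-default server_port_base, check that nothing else on the host is already listening on any of the three resulting ports.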

int connect_dfp(const std::filesystem::path dfp_path, std::vector<int> &device_ids_to_use)

Connect a DFP to the MxAccl object. Currently only one connect_dfp() call per MxAccl object is allowed.

Parameters:

dfp_path – Absolute path of the DFP file. char* and std::string values can also be passed.
device_ids_to_use – IDs of the MXA devices this process intends to use. An error is returned if the vector is empty.

Returns:

The dfp_id, which is later passed to connect_stream() to attach a stream to this DFP.

int connect_dfp(const std::filesystem::path dfp_path, int group_id = 0)

Connect a DFP to the MxAccl object. Currently only one connect_dfp() call per MxAccl object is allowed.

Parameters:

dfp_path – Absolute path of the DFP file. char* and std::string values can also be passed.
group_id – Group ID of the MPU this application intends to use. Defaults to 0; it must be provided to use any other group.

Returns:

The dfp_id, which is later passed to connect_stream() to attach a stream to this DFP.

int connect_dfp(const uint8_t *dfp_bytes, std::vector<int> &device_ids_to_use)

Connect a DFP held in memory as raw bytes to the MxAccl object. Currently only one connect_dfp() call per MxAccl object is allowed.

Parameters:

dfp_bytes – Raw uint8_t* pointer to the DFP data.
device_ids_to_use – IDs of the MXA devices this process intends to use. An error is returned if the vector is empty.

Returns:

The dfp_id, which is later passed to connect_stream() to attach a stream to this DFP.

int connect_dfp(const uint8_t *dfp_bytes, int group_id = 0)

Connect a DFP held in memory as raw bytes to the MxAccl object. Currently only one connect_dfp() call per MxAccl object is allowed.

Parameters:

dfp_bytes – Raw uint8_t* pointer to the DFP data.
group_id – Group ID of the MPU this application intends to use. Defaults to 0; it must be provided to use any other group.

Returns:

The dfp_id, which is later passed to connect_stream() to attach a stream to this DFP.

void start(): Start running inference. All streams must be connected before calling this function.

void stop(): Stop running inference. Must not be called before start().

void wait(): Wait for all streams to finish. Blocks until every started stream's input callback has returned false. Must not be called before start().

int get_num_models()

Get the number of models in the compiled DFP.

Returns: Number of models

int get_num_streams()

Get the number of streams connected to the object.

Returns: Number of streams

int get_dfp_num_chips()

Get the number of chips the DFP is compiled for.

Returns: Number of chips

void connect_stream(float_callback_t in_cb, float_callback_t out_cb, int stream_id, int model_id = 0, int dfp_id = 0)

Connect a stream to a model.

float_callback_t is a callback with the signature bool foo(std::vector<const MX::Types::FeatureMap<float>*>, int).
When the input callback returns false, its stream stops; once all streams have stopped, wait() returns.
connect_stream() must be called before start() or after stop().

Parameters:

in_cb – Input callback function used by this stream.
out_cb – Output callback function used by this stream.
stream_id – Unique ID of this stream, which can later be used in the corresponding callback functions.
model_id – Index of the model this stream connects to.
dfp_id – ID of the DFP returned by connect_dfp().
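The following library-free sketch mimics the stream contract described above: each connected stream runs on its own thread and keeps calling its input callback until that callback returns false, and wait() returns once every stream has stopped. MockAccl, InCallback and OutCallback are hypothetical stand-ins, not the MxAccl API; the real callbacks receive std::vector<const MX::Types::FeatureMap<float>*> rather than a plain std::vector<float>, and the doubling step merely stands in for inference. Because the example's output callback updates a total shared by all streams, it takes a mutex.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for float_callback_t; the real callbacks receive
// std::vector<const MX::Types::FeatureMap<float>*>, not a float vector.
using InCallback  = std::function<bool(std::vector<float>&, int)>;
using OutCallback = std::function<bool(const std::vector<float>&, int)>;

// Library-free mock of the auto-threading contract (not the real MxAccl).
class MockAccl {
public:
    void connect_stream(InCallback in_cb, OutCallback out_cb, int stream_id) {
        streams_.push_back({std::move(in_cb), std::move(out_cb), stream_id});
    }
    // One thread per stream, looping until its input callback returns false.
    void start() {
        for (const Stream& s : streams_) {
            threads_.emplace_back([s] {
                std::vector<float> frame;
                while (s.in_cb(frame, s.stream_id)) {
                    for (float& v : frame) v *= 2.0f;  // stand-in for inference
                    s.out_cb(frame, s.stream_id);
                }
            });
        }
    }
    // Returns once every stream has stopped.
    void wait() {
        for (std::thread& t : threads_) t.join();
        threads_.clear();
    }
private:
    struct Stream { InCallback in_cb; OutCallback out_cb; int stream_id; };
    std::vector<Stream> streams_;
    std::vector<std::thread> threads_;
};

// Runs num_streams streams of frames_per_stream frames each and returns the
// sum of every output value seen by the shared output callback.
float run_streams(int num_streams, int frames_per_stream) {
    MockAccl accl;
    std::vector<int> sent(num_streams, 0);  // slot i is touched only by stream i
    std::mutex total_mutex;                 // guards `total`, shared by all streams
    float total = 0.0f;

    for (int id = 0; id < num_streams; ++id) {
        accl.connect_stream(
            [&sent, frames_per_stream](std::vector<float>& frame, int stream_id) {
                if (sent[stream_id] == frames_per_stream) return false;  // stop stream
                ++sent[stream_id];
                frame.assign(4, 1.0f);  // four input values per frame
                return true;
            },
            [&total_mutex, &total](const std::vector<float>& out, int) {
                std::lock_guard<std::mutex> lock(total_mutex);
                for (float v : out) total += v;
                return true;
            },
            id);
    }
    accl.start();
    accl.wait();
    return total;
}
```

With three streams of five frames, each frame contributes 4 × 2.0 = 8, so run_streams(3, 5) returns 120.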

MX::Types::MxModelInfo get_model_info(int model_id) const

Get information about a particular model, such as the number of input and output FeatureMaps and the input and output layer names.

Parameters: model_id – Model ID (index) of the model to query.
Returns: An MxModelInfo with the model's information. Throws a runtime error if model_id is invalid.

MX::Types::MxModelInfo get_pre_model_info(int model_id) const

Get information about the pre-processing model attached to a particular model, such as the number of input and output FeatureMaps and their sizes and shapes.

Parameters: model_id – Model ID (index) of the model to query.
Returns: An MxModelInfo with the pre-processing model's information. Throws a runtime error if model_id is invalid.

MX::Types::MxModelInfo get_post_model_info(int model_id) const

Get information about the post-processing model attached to a particular model, such as the number of input and output FeatureMaps and their sizes and shapes.

Parameters: model_id – Model ID (index) of the model to query.
Returns: An MxModelInfo with the post-processing model's information. Throws a runtime error if model_id is invalid.

void set_num_workers(int input_num_workers, int output_num_workers, int model_idx = 0)

Set the number of input and output workers. By default, both equal the number of connected streams, as that gives maximum performance. If this method is not called before start(), the defaults are used. Call it after connecting all required streams.

Parameters:

input_num_workers – Number of input workers.
output_num_workers – Number of output workers.
model_idx – Index of the model to assign the workers to. Defaults to 0.

void connect_post_model(std::filesystem::path post_model_path, int model_idx = 0, const std::vector<size_t> &post_size_list = {})

Connect the post-processing model that was cropped by the neural compiler.

Parameters:

post_model_path – Absolute path of the post-processing model (for example, ONNX or TFLite).
model_idx – Index of the model to which the post-processing model is connected.
post_size_list – If the post-processing outputs have variable sizes, or their sizes cannot be deduced, pass the maximum possible size of each output. Defaults to an empty vector.

void connect_pre_model(std::filesystem::path pre_model_path, int model_idx = 0)

Connect the pre-processing model that was cropped by the neural compiler.

Parameters:

pre_model_path – Absolute path of the pre-processing model (for example, ONNX or TFLite).
model_idx – Index of the model to which the pre-processing model is connected.

void set_parallel_fmap_convert(int num_threads, int model_idx = 0)

Enable multi-threaded FeatureMap data conversion using the given number of threads. Conversion multithreading is mainly intended for high-FPS single-stream scenarios or manual-threading mode. In multi-stream auto-threading scenarios it should not be necessary, and it may even degrade performance due to increased CPU load.

Parameters:

num_threads – Number of worker threads for FeatureMap conversion. Use >= 2 to enable; values < 2 disable it.
model_idx – Index of the model to enable the feature for. Defaults to 0.
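To make the trade-off concrete, here is a library-independent sketch of splitting an element-wise FeatureMap conversion across threads. The conversion itself (truncating a float to its bfloat16 bit pattern) and the names to_bf16_bits and convert_fmap are assumptions chosen only for illustration; this document does not describe MxAccl's internal data format or how it threads the conversion. The sketch mirrors the documented rule that values below 2 disable multithreading.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Stand-in element conversion, for illustration only: keep the upper
// 16 bits of an IEEE-754 float (bfloat16 truncation).
inline std::uint16_t to_bf16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<std::uint16_t>(bits >> 16);
}

// Converts `in` into `out`, splitting the index range across num_threads
// workers. As documented for set_parallel_fmap_convert, values < 2 run serially.
void convert_fmap(const std::vector<float>& in, std::vector<std::uint16_t>& out,
                  int num_threads) {
    out.resize(in.size());
    auto work = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = to_bf16_bits(in[i]);
    };
    if (num_threads < 2) {
        work(0, in.size());
        return;
    }
    const std::size_t n = in.size();
    const std::size_t k = static_cast<std::size_t>(num_threads);
    const std::size_t chunk = (n + k - 1) / k;  // ceiling division
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < k; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin < end) workers.emplace_back(work, begin, end);  // disjoint ranges
    }
    for (std::thread& w : workers) w.join();
}
```

For small FeatureMaps, the cost of starting threads can outweigh the conversion itself, which is consistent with the advice above that the option can hurt when many streams already keep the CPU busy.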

Manual-Threading Mode

class MxAcclMT

Public Functions

MxAcclMT(bool use_shared_mode = false, std::string server_ip = "127.0.0.1", unsigned int server_port_base = 10000)

MxAcclMT constructor.

Parameters:

use_shared_mode – This flag is ‘false’ by default, giving the MxAcclMT object direct control of the MXA. When set to ‘true’, MxAcclMT runs in Shared mode, which lets multiple processes on local (or remote) machines share the MXA, at some potential performance cost.
server_ip – IP address of the server to connect to, as a string. The default, 127.0.0.1, is localhost.
server_port_base – Starting port number (unsigned int), default 10000. The server uses this port, port+1 and port+2; for example, 10000, 10001 and 10002.

int connect_dfp(const std::filesystem::path dfp_path, std::vector<int> &device_ids_to_use)

Connect a DFP to the MxAcclMT object. Currently only one connect_dfp() call per MxAcclMT object is allowed.

Parameters:

dfp_path – Absolute path of the DFP file. char* and std::string values can also be passed.
device_ids_to_use – IDs of the MXA devices this process intends to use. An error is returned if the vector is empty.

Returns:

The dfp_id, which is later passed to send_input(), receive_output() or run() to select this DFP.

int connect_dfp(const std::filesystem::path dfp_path, int group_id = 0)

Connect a DFP to the MxAcclMT object. Currently only one connect_dfp() call per MxAcclMT object is allowed.

Parameters:

dfp_path – Absolute path of the DFP file. char* and std::string values can also be passed.
group_id – Group ID of the MPU this application intends to use. Defaults to 0; it must be provided to use any other group.

Returns:

The dfp_id, which is later passed to send_input(), receive_output() or run() to select this DFP.

int connect_dfp(const uint8_t *dfp_bytes, std::vector<int> &device_ids_to_use)

Connect a DFP held in memory as raw bytes to the MxAcclMT object. Currently only one connect_dfp() call per MxAcclMT object is allowed.

Parameters:

dfp_bytes – Raw uint8_t* pointer to the DFP data.
device_ids_to_use – IDs of the MXA devices this process intends to use. An error is returned if the vector is empty.

Returns:

The dfp_id, which is later passed to send_input(), receive_output() or run() to select this DFP.

int connect_dfp(const uint8_t *dfp_bytes, int group_id = 0)

Connect a DFP held in memory as raw bytes to the MxAcclMT object. Currently only one connect_dfp() call per MxAcclMT object is allowed.

Parameters:

dfp_bytes – Raw uint8_t* pointer to the DFP data.
group_id – Group ID of the MPU this application intends to use. Defaults to 0; it must be provided to use any other group.

Returns:

The dfp_id, which is later passed to send_input(), receive_output() or run() to select this DFP.

int get_num_models()

Get the number of models in the compiled DFP.

Returns: Number of models

int get_dfp_num_chips()

Get the number of chips the DFP is compiled for.

Returns: Number of chips

MX::Types::MxModelInfo get_model_info(int model_id) const

Get information about a particular model, such as the number of input and output FeatureMaps and the input and output layer names.

Parameters: model_id – Model ID (index) of the model to query.
Returns: An MxModelInfo with the model's information. Throws a runtime error if model_id is invalid.

void connect_post_model(std::filesystem::path post_model_path, int model_idx = 0, const std::vector<size_t> &post_size_list = {})

Connect the post-processing model that was cropped by the neural compiler.

Parameters:

post_model_path – Absolute path of the post-processing model (for example, ONNX or TFLite).
model_idx – Index of the model to which the post-processing model is connected.
post_size_list – If the post-processing outputs have variable sizes, or their sizes cannot be deduced, pass the maximum possible size of each output. Defaults to an empty vector.

void connect_pre_model(std::filesystem::path pre_model_path, int model_idx = 0)

Connect the pre-processing model that was cropped by the neural compiler.

Parameters:

pre_model_path – Absolute path of the pre-processing model (for example, ONNX or TFLite).
model_idx – Index of the model to which the pre-processing model is connected.

MX::Types::MxModelInfo get_pre_model_info(int model_id) const

Get information about the pre-processing model attached to a particular model, such as the number of input and output FeatureMaps and their sizes and shapes.

Parameters: model_id – Model ID (index) of the model to query.
Returns: An MxModelInfo with the pre-processing model's information. Throws a runtime error if model_id is invalid.

MX::Types::MxModelInfo get_post_model_info(int model_id) const

Get information about the post-processing model attached to a particular model, such as the number of input and output FeatureMaps and their sizes and shapes.

Parameters: model_id – Model ID (index) of the model to query.
Returns: An MxModelInfo with the post-processing model's information. Throws a runtime error if model_id is invalid.

bool send_input(std::vector<float*> in_data, int model_id, int stream_id, int dfp_id = 0, bool channel_first = false, int32_t timeout = 0)

Send input to the accelerator in manual-threading mode.

Parameters:

in_data – Vector of input data for the model.
model_id – Index of the model the data is targeted to.
stream_id – Index of the stream the input data belongs to.
dfp_id – ID of the DFP returned by connect_dfp().
channel_first – Whether the input data is in channel-first or channel-last format. Defaults to false (channel-last).
timeout – Time in milliseconds to wait for the call to succeed. The default, 0, means the call never times out.

Returns:

true if the call succeeds, false if a timeout occurs.
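send_input(), receive_output() and run() document the same timeout convention: 0 blocks until the call can complete, while a positive value gives up after that many milliseconds and returns false. The hypothetical TimedQueue below sketches that convention with std::condition_variable; it makes no claim about how MxAccl buffers data internally, which this document does not describe.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical bounded queue illustrating the timeout convention:
// timeout_ms == 0 waits indefinitely; otherwise give up after timeout_ms.
class TimedQueue {
public:
    explicit TimedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Returns true if `value` was enqueued, false on timeout (cf. send_input).
    bool push(float value, std::int32_t timeout_ms = 0) {
        std::unique_lock<std::mutex> lock(mtx_);
        auto has_room = [this] { return items_.size() < capacity_; };
        if (timeout_ms == 0) {
            not_full_.wait(lock, has_room);
        } else if (!not_full_.wait_for(lock, std::chrono::milliseconds(timeout_ms), has_room)) {
            return false;  // the queue stayed full for the whole timeout
        }
        items_.push_back(value);
        not_empty_.notify_one();
        return true;
    }

    // Returns true and fills `out` if a value arrived in time (cf. receive_output).
    bool pop(float& out, std::int32_t timeout_ms = 0) {
        std::unique_lock<std::mutex> lock(mtx_);
        auto has_item = [this] { return !items_.empty(); };
        if (timeout_ms == 0) {
            not_empty_.wait(lock, has_item);
        } else if (!not_empty_.wait_for(lock, std::chrono::milliseconds(timeout_ms), has_item)) {
            return false;  // nothing arrived within the timeout
        }
        out = items_.front();
        items_.pop_front();
        not_full_.notify_one();
        return true;
    }

private:
    std::size_t capacity_;
    std::deque<float> items_;
    std::mutex mtx_;
    std::condition_variable not_full_, not_empty_;
};
```

A false return is a normal outcome here, not an error: the caller decides whether to retry, drop the frame or shut the stream down.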

bool receive_output(std::vector<float*> &out_data, int model_id, int stream_id, int dfp_id = 0, bool channel_first = false, int32_t timeout = 0)

Receive output from the accelerator in manual-threading mode.

Parameters:

out_data – Vector of output data from the model.
model_id – Index of the model the data is expected to come from.
stream_id – Index of the stream the output data belongs to.
dfp_id – ID of the DFP returned by connect_dfp().
channel_first – Whether the output data is copied in channel-first or channel-last format. Defaults to false (channel-last).
timeout – Time in milliseconds to wait for the call to succeed. The default, 0, means the call never times out.

Returns:

true if the call succeeds, false if a timeout occurs.
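In manual-threading mode, a common pattern is one sender thread and one receiver thread per stream, with stream_id keeping each stream's results apart. The sketch below uses a hypothetical, library-free MockAcclMT: it assumes a single input and output per model, keeps one FIFO per stream_id, "infers" by adding 1 to each value, and omits dfp_id, channel_first and timeout for brevity. It is not the real MxAcclMT.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for MxAcclMT (not the real class). Assumes one input
// and one output FeatureMap per model; "inference" adds 1.0 to the value.
class MockAcclMT {
public:
    bool send_input(std::vector<float*> in_data, int /*model_id*/, int stream_id) {
        std::lock_guard<std::mutex> lock(mtx_);
        queues_[stream_id].push(*in_data[0] + 1.0f);
        cv_.notify_all();
        return true;
    }
    bool receive_output(std::vector<float*>& out_data, int /*model_id*/, int stream_id) {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [&] { return !queues_[stream_id].empty(); });
        *out_data[0] = queues_[stream_id].front();
        queues_[stream_id].pop();
        return true;
    }
private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::map<int, std::queue<float>> queues_;  // one FIFO per stream_id
};

// One sender and one receiver thread per stream. Returns outputs[stream][frame].
std::vector<std::vector<float>> run_manual(int num_streams, int frames) {
    MockAcclMT accl;
    std::vector<std::vector<float>> outputs(num_streams);
    std::vector<std::thread> threads;
    for (int s = 0; s < num_streams; ++s) {
        threads.emplace_back([&accl, s, frames] {
            for (int f = 0; f < frames; ++f) {
                float in = static_cast<float>(100 * s + f);
                accl.send_input({&in}, /*model_id=*/0, s);
            }
        });
        threads.emplace_back([&accl, &outputs, s, frames] {
            for (int f = 0; f < frames; ++f) {
                float out = 0.0f;
                std::vector<float*> out_data{&out};
                accl.receive_output(out_data, /*model_id=*/0, s);
                outputs[s].push_back(out);  // each receiver owns outputs[s]
            }
        });
    }
    for (std::thread& t : threads) t.join();
    return outputs;
}
```

Because each stream has exactly one sender and one receiver, outputs for a given stream_id come back in the order they were sent.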

bool run(std::vector<float*> in_data, std::vector<float*> &out_data, int pmodel_id, int pstream_id, int dfp_id = 0, bool in_channel_first = false, bool out_channel_first = false, int32_t timeout = 0)

Run inference on the accelerator in manual-threading mode.

Parameters:

in_data – Vector of input data for the model.
out_data – Vector of output data from the model.
pmodel_id – Index of the model to run.
pstream_id – Index of the stream the data belongs to.
dfp_id – ID of the DFP returned by connect_dfp().
in_channel_first – Whether the input data is in channel-first or channel-last format. Defaults to false (channel-last).
out_channel_first – Whether the output data is copied in channel-first or channel-last format. Defaults to false (channel-last).
timeout – Time in milliseconds to wait for the call to succeed. The default, 0, means the call never times out.

Returns:

true if inference succeeds, false if a timeout occurs.

void set_parallel_fmap_convert(int num_threads, int model_idx = 0)

Enable multi-threaded FeatureMap data conversion using the given number of threads. Conversion multithreading is mainly intended for high-FPS single-stream scenarios or manual-threading mode. In multi-stream auto-threading scenarios it should not be necessary, and it may even degrade performance due to increased CPU load.

Parameters:

num_threads – Number of worker threads for FeatureMap conversion. Use >= 2 to enable; values < 2 disable it.
model_idx – Index of the model to enable the feature for. Defaults to 0.

template<typename T> class FeatureMap

The FeatureMap class.

FeatureMaps are objects used internally by MxAccl to hold and manipulate data. They provide methods such as set_data() and get_data() so users can safely send data to and receive data from MxAccl.

Public Functions

MX::Utils::MX_status get_data(T *out_data, bool channel_first = false) const

Get output from the accelerator. Copies the output data from the FeatureMap to the given pointer.

Parameters:

out_data – Pointer to the destination buffer that the output data is copied to.
channel_first – Whether the output data is copied in channel-first or channel-last format. Defaults to false (channel-last).

Returns:

MX_Status Success if the copy succeeds.

MX::Utils::MX_status get_data_no_copy(T *&out_data, bool channel_first = false) const

Get output from the accelerator without copying. Instead of copying the data to the passed pointer, sets out_data to point at the FeatureMap's internal buffer. That buffer is owned by the FeatureMap, so the user must never delete out_data, and the pointer is valid only until the callback returns. This function is intended for embedded systems with slow DRAM access; otherwise, prefer get_data().

Parameters:

out_data – Pointer that is set to the FeatureMap's internal output buffer.
channel_first – Selects channel-first or channel-last format. Defaults to false (channel-last).

Returns:

MX_Status Success if the call succeeds.
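The lifetime rule for get_data_no_copy() is easiest to see with a buffer that gets reused. MockFeatureMap below is a hypothetical stand-in, not the real FeatureMap class: it owns a single buffer that is overwritten when the next frame arrives, which is an assumption made only to show why a copied result survives the callback while a borrowed pointer does not.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for FeatureMap<float> (not the real class): it owns
// one buffer that is overwritten with each new frame.
class MockFeatureMap {
public:
    explicit MockFeatureMap(std::size_t n) : buf_(n) {}
    void fill(float v) { std::fill(buf_.begin(), buf_.end(), v); }  // new frame arrives
    std::size_t size() const { return buf_.size(); }

    // Copying access: the caller owns the destination and may keep it.
    void get_data(float* out) const { std::copy(buf_.begin(), buf_.end(), out); }

    // Zero-copy access: `out` aliases the internal buffer. Never delete it,
    // and stop using it once the callback that obtained it returns.
    void get_data_no_copy(float*& out) { out = buf_.data(); }

private:
    std::vector<float> buf_;
};

// Two successive "callbacks" sharing one FeatureMap. Returns the value kept
// via get_data() and the value later seen through the no-copy pointer.
std::pair<float, float> simulate_two_frames() {
    MockFeatureMap fmap(4);
    std::vector<float> kept(fmap.size());
    float* view = nullptr;

    fmap.fill(1.0f);              // frame 1, first callback
    fmap.get_data(kept.data());   // private copy survives the callback
    fmap.get_data_no_copy(view);  // valid only during this callback

    fmap.fill(2.0f);              // frame 2 reuses the same buffer
    // Reading `view` here is defined only because this is a mock; it now shows
    // frame 2's data, not frame 1's. With the real library, don't do this.
    return {kept[0], view[0]};
}
```

In other words, anything needed after the output callback returns should be copied out, either with get_data() or manually, before returning.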

MX::Utils::MX_status set_data(T *in_data, bool channel_first = false) const

Set input data for the accelerator. Copies data from the given input pointer into the FeatureMap.

Parameters:

in_data – Pointer to the source buffer that the input data is copied from.
channel_first – Whether the input data is in channel-first or channel-last format. Defaults to false (channel-last).

Returns:

MX_Status Success if the copy succeeds.
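The channel_first flag on set_data() and get_data() only describes memory layout: in channel-last (HWC) data the channel index varies fastest, while in channel-first (CHW) data each channel is a contiguous plane. The sketch below shows the rearrangement between the two for a single image, assuming a batch (Z) of 1; hwc_to_chw is an illustrative helper, not part of MxAccl. If your data is already channel-first, passing channel_first = true describes it as such, so no manual transposition is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative helper (not part of MxAccl): rearranges one H x W x C image
// stored channel-last (HWC) into channel-first (CHW) order. Batch (Z) = 1.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc,
                              std::size_t h, std::size_t w, std::size_t c) {
    assert(hwc.size() == h * w * c);
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            for (std::size_t ch = 0; ch < c; ++ch)
                // HWC: channel varies fastest. CHW: x varies fastest.
                chw[(ch * h + y) * w + x] = hwc[(y * w + x) * c + ch];
    return chw;
}
```

For a 2 × 2 image with 3 channels stored as 0..11 in HWC order, channel 0 holds 0, 3, 6, 9, so the CHW result is 0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11.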

struct MxModelInfo

Struct holding the information needed about a model.

Public Members

int model_index: Index identifying the model.

int num_in_featuremaps: Number of input FeatureMaps.

int num_out_featuremaps: Number of output FeatureMaps.

std::vector<std::string> input_layer_names: Input layer names.

std::vector<std::string> output_layer_names: Output layer names.

std::vector<MX::Types::ShapeVector> in_featuremap_shapes: Shapes of the input FeatureMaps, as ShapeVectors.

std::vector<MX::Types::ShapeVector> out_featuremap_shapes: Shapes of the output FeatureMaps, as ShapeVectors.

std::vector<size_t> in_featuremap_sizes: Sizes of the input FeatureMaps.

std::vector<size_t> out_featuremap_sizes: Sizes of the output FeatureMaps.
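One natural use of MxModelInfo is sizing the std::vector<float*> buffers passed to send_input(), receive_output() and run(). The sketch below assumes the featuremap sizes are element counts (this document does not state the unit), and ModelInfoLite and FmapBuffers are hypothetical names: ModelInfoLite copies just the two size members of MxModelInfo so the example compiles without the MemryX headers.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in holding two of MxModelInfo's documented members.
struct ModelInfoLite {
    std::vector<std::size_t> in_featuremap_sizes;
    std::vector<std::size_t> out_featuremap_sizes;
};

// Owns one float buffer per FeatureMap and exposes the std::vector<float*>
// form taken by send_input(), receive_output() and run().
// Assumption: each size is an element count, not a byte count.
struct FmapBuffers {
    std::vector<std::vector<float>> storage;
    std::vector<float*> ptrs;

    explicit FmapBuffers(const std::vector<std::size_t>& sizes) {
        storage.reserve(sizes.size());
        for (std::size_t n : sizes) storage.emplace_back(n, 0.0f);
        for (std::vector<float>& buf : storage) ptrs.push_back(buf.data());
    }
    // Copying would leave `ptrs` pointing into the original's storage.
    FmapBuffers(const FmapBuffers&) = delete;
    FmapBuffers& operator=(const FmapBuffers&) = delete;
};
```

With the real API, the sizes would come from get_model_info(model_id), and the ptrs member is what would be handed to the send and receive calls.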

class ShapeVector

Public Functions

ShapeVector(): Construct a new ShapeVector object.

ShapeVector(int64_t h, int64_t w, int64_t z, int64_t c)

Construct a new ShapeVector object.

Parameters:

h – Height
w – Width
z – Batch
c – Channel

ShapeVector(int size): Construct a new ShapeVector object.

std::vector<int64_t> chfirst_shape(): Returns the shape as a vector in channel-first order.

std::vector<int64_t> chlast_shape(): Returns the shape as a vector in channel-last order.

int64_t *data(): Returns a pointer to the shape vector's data.

int64_t size() const: Returns the size of the shape vector.

Notes

In auto-threading mode, all callback functions run in separate threads, so make sure they are thread-safe.
In manual-threading mode, send_input and receive_output can be called from separate threads; with multiple streams, use stream_id to keep track of which data belongs to which stream.
Prefer auto-threading whenever possible, as it is optimized for the accelerator. Manual threading can be used where auto-threading is too complicated or cannot be used.
By default, the number of input and output workers equals the number of connected streams, as this ensures maximum performance.
In some situations the number of workers should be reduced, for example because of limited CPU capacity or because the application performs well with fewer workers. Always tune the worker counts for the application and host system.
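As a starting point for that tuning, the heuristic below begins from the documented default of one worker per stream and caps it at the number of hardware threads the host reports. suggested_workers is a hypothetical helper and the cap is an assumption, not an MxAccl rule.

```cpp
#include <algorithm>
#include <thread>

// Illustrative heuristic (an assumption, not an MxAccl rule): start from the
// documented default of one worker per stream, but cap it at the number of
// hardware threads the host reports. Always returns at least 1.
int suggested_workers(int num_streams,
                      unsigned int hw_threads = std::thread::hardware_concurrency()) {
    if (num_streams < 1) return 1;
    if (hw_threads == 0) return num_streams;  // core count unknown: keep the default
    return std::min(num_streams, static_cast<int>(hw_threads));
}
```

The result could then be passed as set_num_workers(w, w) after all streams are connected and before start(). Measuring throughput with a few candidate values on the target host is still the reliable way to choose.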