MxPrepost Usage#

MxPrepost is a high-performance pre/post-processing library for YOLO models on MemryX systems. It is designed to work alongside the MemryX Runtime callbacks and provides optimized implementations for common operations such as preprocessing/formatting, output decoding, NMS, and drawing results on frames.

The library supports common YOLO tasks (detection, segmentation, and pose), and is intended to simplify deployment while maintaining high throughput.

Note

MxPrepost is designed to work hand-in-hand with MxAccl callback-based runtime flows.
It is an optimized replacement for the cropped post-model code (for example, the *_post.onnx file generated with --autocrop), running an optimized CPU implementation directly inside your callback code.

See also

For a runnable app using MxPrepost, see this YOLO11 tutorial.

Install MxPrepost#

Install and Verify

Important

MxPrepost requires MemryX Runtime / SDK 2.2+.
If MemryX tools are not installed yet, follow Install Tools first.

Install MxPrepost in the same Python environment as the MemryX SDK.

1. Verify SDK version

mx_nc --version

Optional: upgrade the SDK tools (only if the SDK is below 2.2):

pip3 install --extra-index-url https://developer.memryx.com/pip --upgrade memryx

2. Install MxPrepost

pip3 install --extra-index-url https://developer.memryx.com/pip mxprepost

3. Confirm installation

python3 -m pip show mxprepost
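If you prefer checking from Python, a small standard-library helper can report the installed version without shelling out to pip (a sketch; it simply queries package metadata for the name mxprepost):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version of `pkg`, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

print(installed_version("mxprepost"))  # e.g. a version string, or None if missing
```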

Typical Integration Pattern#

In Python, the recommended flow is:

  1. Create an MxAccl runtime object.

  2. Create one mxprepost.MxPrepost object (per model/task).

  3. In the input callback, call preprocess(frame) and return the result to the runtime.

  4. In the output callback, call postprocess(ofmaps, ori_h, ori_w) or postprocess(ofmaps, ori_frame).

  5. Optionally call draw(frame, result) to overlay the annotations.

import cv2
import mxprepost
from memryx import mxapi

accl = mxapi.MxAccl("model.dfp", use_model_shape=[False, False])
prepost = mxprepost.MxPrepost(accl=accl, task="yolov11-det")

cap = cv2.VideoCapture(0)  # any cv2.VideoCapture source
ori_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
ori_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

def input_callback(stream_id):
    ok, frame = cap.read()
    if not ok:
        return None  # no more frames
    return prepost.preprocess(frame)

def output_callback(ofmaps, stream_id):
    result = prepost.postprocess(ofmaps, ori_h, ori_w)
    # result.boxes / result.masks / result.keypoints
    return True

accl.connect_stream(input_callback, output_callback, stream_id=0)

In C++, the recommended flow is:

  1. Create an MxAccl runtime object.

  2. Create one MX::Prepost::MxPrepost object (per model/task) using create or create_safe.

  3. In the input callback, call preprocess(frame) and feed the returned tensor to the runtime input.

  4. In the output callback, gather the output feature maps as float* pointers and call postprocess(ofmaps, result, ori_w, ori_h) (or postprocess(ofmaps, result, original_frame)).

  5. Optionally call draw(frame, result) to overlay the annotations.

#include <memx/accl/MxAccl.h>
#include <memx/prepost/MxPrepost.h>
#include <opencv2/opencv.hpp>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using namespace MX::Runtime;
using namespace MX::Prepost;

// device_ids_to_use is positional here; keep default {0}.
// use_model_shape is the key setting to control input/output shape behavior.
MxAccl accl{"model.dfp", {0}, {false, false}};

YoloUserConfig cfg;  // COCO defaults
cfg.conf = 0.3f;
cfg.iou = 0.4f;

std::unique_ptr<MxPrepost> pp;
std::string err;
if (!MxPrepost::create_safe(&accl, "yolov11-det", cfg, pp, err)) {
    throw std::runtime_error("Failed to create MxPrepost: " + err);
}

cv::VideoCapture cap(0);  // any cv::VideoCapture source
int ori_w = static_cast<int>(cap.get(cv::CAP_PROP_FRAME_WIDTH));
int ori_h = static_cast<int>(cap.get(cv::CAP_PROP_FRAME_HEIGHT));

auto in_cb = [&](std::vector<const MX::Types::FeatureMap*>& input, int stream_id) -> bool {
    cv::Mat frame;
    if (!cap.read(frame)) {
        return false;
    }

    cv::Mat preprocessed = pp->preprocess(frame);
    input[0]->set_data(reinterpret_cast<float*>(preprocessed.data));
    return true;
};

auto out_cb = [&](std::vector<const MX::Types::FeatureMap*>& output, int stream_id) -> bool {
    // Copy runtime outputs into your own float buffers, then fill:
    std::vector<float*> ofmap_ptrs;
    // ... populate ofmap_ptrs from output[i]->get_data(...)

    Result result;
    pp->postprocess(ofmap_ptrs, result, ori_w, ori_h);
    // or: pp->postprocess(ofmap_ptrs, result, original_frame);

    // Optional visualization:
    // cv::Mat annotated = original_frame.clone();
    // pp->draw(annotated, result);
    return true;
};

accl.connect_stream(in_cb, out_cb, 0);
accl.start();
accl.wait();
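preprocess() handles resizing and formatting internally. For intuition, YOLO-style preprocessing typically letterboxes the frame: scale it to fit the model input while preserving aspect ratio, then pad the remainder. A minimal sketch of that computation (illustrative of typical YOLO pipelines only, not MxPrepost internals; the 640x640 input size is an assumption):

```python
def letterbox_params(ori_w: int, ori_h: int, in_w: int = 640, in_h: int = 640):
    """Compute the scale factor and per-side padding for an aspect-preserving resize."""
    scale = min(in_w / ori_w, in_h / ori_h)
    new_w, new_h = round(ori_w * scale), round(ori_h * scale)
    pad_x = (in_w - new_w) / 2  # left/right padding
    pad_y = (in_h - new_h) / 2  # top/bottom padding
    return scale, pad_x, pad_y

# Example: a 1920x1080 frame into a 640x640 model input
scale, pad_x, pad_y = letterbox_params(1920, 1080)
```

This is also why postprocess() takes the original frame dimensions: they let results be mapped back from model-input coordinates to frame coordinates.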

For accessing different results depending on the task:

# Access outputs (depending on task)
# ---------------------------
# Detection (Bounding Boxes)
# ---------------------------
if result.boxes is not None:
    for box in result.boxes:
        print("xyxy:", box.xyxy)        # [x1, y1, x2, y2]
        print("xywh:", box.xywh)        # [x_center, y_center, w, h]
        print("conf:", box.conf)        # confidence score
        print("cls_id:", box.cls_id)    # class index
        print("cls_name:", box.cls_name)# class label string
        print("----")

# ---------------------------
# Segmentation (Masks)
# ---------------------------
if result.masks is not None:
    for mask in result.masks:
        # mask is a polygon (list of Point2f), not a binary HxW array
        pts = [(p.x, p.y) for p in mask.xys]
        print("mask cls_id:", int(mask.cls_id))
        print("num polygon points:", len(pts))
        if pts:
            print("first 5 points:", pts[:5])
        print("----")

# ---------------------------
# Pose Estimation (Keypoints)
# ---------------------------
if result.keypoints is not None:
    for det_id, kps in enumerate(result.keypoints):
        print(f"detection {det_id} keypoints:", len(kps))
        # each kp has kp.xy (Point2f) and kp.conf
        for kp_id, kp in enumerate(kps):
            print(f"  kp[{kp_id}] = (x={kp.xy.x:.2f}, y={kp.xy.y:.2f}, conf={kp.conf:.3f})")
        print("----")
The equivalent access pattern in C++:

// Detection (Bounding Boxes)
if (!result.boxes.empty()) {
    for (const auto& box : result.boxes) {
        // box.xyxy, box.xywh, box.conf, box.cls_id, box.cls_name
    }
}

// Segmentation (Masks as polygons)
if (!result.masks.empty()) {
    for (const auto& mask : result.masks) {
        // mask.xys (vector<Point2f>), mask.cls_id, mask.cls_name
    }
}

// Pose Estimation (Keypoints)
if (!result.keypoints.empty()) {
    for (const auto& kps : result.keypoints) {
        for (const auto& kp : kps) {
            // kp.xy.x, kp.xy.y, kp.conf
        }
    }
}
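The two box formats shown above are related by a fixed conversion between corner coordinates and center/size coordinates; a standalone sketch (pure Python, independent of the library):

```python
def xyxy_to_xywh(x1, y1, x2, y2):
    """Corner format [x1, y1, x2, y2] -> center/size format [xc, yc, w, h]."""
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)

def xywh_to_xyxy(xc, yc, w, h):
    """Center/size format [xc, yc, w, h] -> corner format [x1, y1, x2, y2]."""
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

print(xyxy_to_xywh(10, 20, 110, 220))  # -> (60.0, 120.0, 100, 200)
```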

You can also check the YOLO11 End-to-End Tutorial, a full example showing MxAccl + MxPrepost in a real-time object detection pipeline.

Configuration Reference#

You can explore the MxPrepost API reference for full details. Common constructor arguments:

  • accl (required, mxapi.MxAccl): runtime object used by MxPrepost for shape/task context.

  • task (required, str): YOLO task string, e.g. yolov8-det, yolov11-seg, yolov11-pose.

  • conf (float): confidence threshold (default: 0.3).

  • iou (float): IoU threshold for NMS (default: 0.4).

  • classmap_path (str): path to a class labels text file (one label per line).

  • custom_class_labels (list[str]): override class labels directly in code.

  • valid_classes (list[int]): keep only selected class IDs.

  • class_agnostic (bool): enable class-agnostic NMS when True.

  • fast_sigmoid (bool): use an approximate sigmoid for slightly faster processing.

  • model_id (int): select a model when multiple models are compiled into one DFP.

  • override_layer_mapping (dict[int, list[str]]): advanced override for output layer-to-port mapping.
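conf and iou drive a standard filter-then-NMS pipeline: detections below the confidence threshold are dropped, then boxes overlapping an already-kept box by more than the IoU threshold are suppressed. The IoU computation itself looks like this (illustrative sketch, independent of MxPrepost internals):

```python
def iou(a, b):
    """Intersection-over-union of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

# Two equal boxes shifted by half their width overlap with IoU ~0.333,
# which is below the default iou=0.4, so NMS would keep both.
print(iou([0, 0, 100, 100], [50, 0, 150, 100]))
```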

For Custom YOLO Models#

Use these arguments to align MxPrepost with your custom model and dataset:

  • classmap_path: load class labels from a text file (one label per line).

  • custom_class_labels: provide labels directly in code.

  • valid_classes: return only selected class IDs.

  • model_id: choose the correct model when a DFP contains multiple models.

Note

If both classmap_path and custom_class_labels are provided, custom_class_labels takes precedence.

For custom models, ensure task (det / seg / pose) and class count are consistent with model outputs. If those are correct, auto-mapping usually works without override_layer_mapping.

prepost = mxprepost.MxPrepost(
    accl=accl,
    task="yolov11-det",
    custom_class_labels=["person", "helmet", "vest"],
    valid_classes=[0, 1],
    model_id=0,
)
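The classmap file format is one label per line. A minimal loader that produces the equivalent custom_class_labels list (a hypothetical helper for illustration, not part of the MxPrepost API):

```python
def load_classmap(path: str):
    """Read one class label per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Usage sketch: pass the result as custom_class_labels, e.g.
# mxprepost.MxPrepost(accl=accl, task="yolov11-det",
#                     custom_class_labels=load_classmap("labels.txt"))
```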

Troubleshooting#

Auto Layer Detection Fails#

Use this only when auto layer detection fails and your task, class count, and model_id are already correct.

  1. Inspect model output layer names and shapes:

dfp_inspect <path_to_dfp_file>

or using C++ code:

#include <iostream>
#include <memx/accl/MxAccl.h>

MX::Runtime::MxAccl accl{"model.dfp", {0}, {false, false}};
auto info = accl.get_model_info(0);  // use your model_id

for (size_t i = 0; i < info.output_layer_names.size(); ++i) {
    std::cout << i << " | "
              << info.output_layer_names[i] << " | "
              << info.out_featuremap_shapes[i].to_string() << "\n";
}
  2. Build override_layer_mapping with those exact layer names:

  • Detection: each stride maps to [coord_layer_name, conf_layer_name].

  • Segmentation: each stride maps to [coord_layer_name, conf_layer_name, mask_coef_layer_name] plus 0: [mask_proto_layer_name].

  • Pose: each stride maps to [coord_layer_name, conf_layer_name, keypoint_layer_name].

  3. Include all required stride entries (commonly 8, 16, 32):

# Example: explicit mapping for YOLO detection
prepost = mxprepost.MxPrepost(
    accl=accl,
    task="yolov11-det",
    override_layer_mapping={
        8:  ["/model.22/cv2.0/cv2.0.2/Conv_output_0", "/model.22/cv3.0/cv3.0.2/Conv_output_0"],
        16: ["/model.22/cv2.1/cv2.1.2/Conv_output_0", "/model.22/cv3.1/cv3.1.2/Conv_output_0"],
        32: ["/model.22/cv2.2/cv2.2.2/Conv_output_0", "/model.22/cv3.2/cv3.2.2/Conv_output_0"],
    },
)

# Segmentation additionally needs global mask proto:
# override_layer_mapping = {
#     8:  ["coord_s8",  "conf_s8",  "maskcoef_s8"],
#     16: ["coord_s16", "conf_s16", "maskcoef_s16"],
#     32: ["coord_s32", "conf_s32", "maskcoef_s32"],
#     0:  ["mask_proto_global"],
# }
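Before passing a mapping, it can help to sanity-check it against the rules above. A small validator sketch (a hypothetical helper, not part of the MxPrepost API; it hardcodes the common strides 8/16/32 and the per-stride entry counts from the bullet list, 2 for detection and 3 for segmentation/pose, plus segmentation's 0: mask-proto entry):

```python
def check_layer_mapping(mapping: dict, task: str):
    """Return a list of problems found in an override_layer_mapping dict."""
    problems = []
    per_stride = {"det": 2, "seg": 3, "pose": 3}[task]
    for stride in (8, 16, 32):
        entry = mapping.get(stride)
        if entry is None:
            problems.append(f"missing stride {stride}")
        elif len(entry) != per_stride:
            problems.append(f"stride {stride}: expected {per_stride} layers, got {len(entry)}")
    if task == "seg" and len(mapping.get(0, [])) != 1:
        problems.append("segmentation needs 0: [mask_proto_layer_name]")
    return problems

# Example: a detection mapping that forgot stride 32
print(check_layer_mapping({8: ["c8", "s8"], 16: ["c16", "s16"]}, "det"))
# -> ['missing stride 32']
```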

Supported Tasks#

  • YOLOv7: Detection

  • YOLOv8: Detection, Segmentation, Pose

  • YOLOv9: Detection

  • YOLOv10: Detection

  • YOLOv11: Detection, Segmentation, Pose

  • Custom support: custom class labels and custom resolutions

Compatibility Notes#

Warning

MxPrepost is designed for MxAccl runtime flows. Legacy Python bindings such as SyncAccl, AsyncAccl, and MultistreamAsyncAccl are not supported.
