CenterNet Object Detection#
Introduction#
In this tutorial, we will demonstrate how to use the Accelerator C++ API to perform object detection with CenterNet on the MX3. We will use the centernet_mobilenetv2_fpn_kpts model for our demo. The goal of this tutorial is to demonstrate the end-to-end inference capability of the API in C++, including how to connect any pre-processing and/or post-processing sections that may have been cropped from the model.
Background#
Some models, like the centernet model in this tutorial, have layers at the beginning and end that are not supported natively on MX3 hardware. The neural compiler’s model cropping functionality handles this. More details are available in the Model Cropping tutorial.
Both the Python and C++ APIs support connecting pre and post models into the accelerator runtime object so that you don’t have to create and manage additional runtimes.
Note that not all models have both pre and post models; for example, the YoloV7 model in the Object Detection tutorial only has a post model.
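As a rough sketch (the complete, working calls appear in the Connect the Accelerator section below), attaching the cropped sections to a single runtime object looks like the following; the file names follow the compilation step described later in this tutorial:
// Sketch only: attach the cropped sections to one runtime object
MX::Runtime::MxAccl accl;
accl.connect_dfp("centernet_onnx.dfp");                     // main section, runs on the MX3
accl.connect_pre_model("model_0_centernet_pre.onnx", 0);    // cropped pre-processing, model 0
accl.connect_post_model("model_0_centernet_post.onnx", 0);  // cropped post-processing, model 0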
Note
This tutorial assumes a four-chip solution is correctly connected.
Requirements#
memx-drivers
memx-accl
memx-accl-plugins
memx-utils-gui
C++ users will have to install the tflite library from source. Refer to the third-party libraries guide for installation steps.
Download the Model#
The CenterNet pre-trained models are available on the TensorFlow Centernet GitHub page. For convenience, we have provided the exported and compiled models in the following compressed folder attached
to this tutorial.
Compile the Model#
CenterNet needs to be compiled with the autocrop flag/argument, which generates a DFP file for the main section of the model (centernet_onnx.dfp), the pre-processing model (model_0_centernet_pre.onnx), and the post-processing model (model_0_centernet_post.onnx). The compilation step is typically needed only once and can be done using the Neural Compiler API or Tool.
Hint
You can use the pre-compiled DFP and the cropped pre/post-processing models attached to this tutorial and skip the compilation step.
from memryx import NeuralCompiler
nc = NeuralCompiler(num_chips=4, models="centernet.onnx", verbose=1, dfp_fname = "centernet_onnx", autocrop=True)
dfp = nc.run()
In your command line, you need to type,
mx_nc -v -m centernet.onnx --autocrop -c 4
This will produce a DFP file ready to be used by the accelerator. In your Python code, you need to point the dfp
variable to the generated file path,
dfp = "centernet_onnx.dfp"
In your C++ code, you need to point the DFP path variable to the generated file,
fs::path onnx_model_path = "centernet_onnx.dfp";
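The cropped pre- and post-processing sections are also referenced by path. The variable names below match those used later in this tutorial, and the file names follow the autocrop output described above:
// Paths to the cropped sections produced by --autocrop
fs::path onnx_preprocessing_model_path  = "model_0_centernet_pre.onnx";
fs::path onnx_postprocessing_model_path = "model_0_centernet_post.onnx";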
Note
The code above uses the ONNX version of the model for compilation, but you can also pass in the TFLite or TensorFlow versions.
Pipelines#
In this tutorial, OpenCV is used for image loading, image processing, and display. The following flowchart shows the different parts of the pipeline. Note that the input camera frame should be saved (queued) so it can later be overlaid and displayed.
CV Initializations#
First, we import the required libraries, initialize the CV pipeline, and define common variables.
#include <filesystem>            /* fs::path */
#include <opencv2/opencv.hpp>    /* imshow */
#include <opencv2/imgproc.hpp>   /* cvtColor */
#include <opencv2/imgcodecs.hpp> /* imwrite */

namespace fs = std::filesystem;

fs::path default_videoPath = "../Friends.mp4";
// If the input is a camera, try to use optimal settings
if(video_src.substr(0,3) == "cam"){
#ifdef __linux__
    if (!openCamera(vcap, video_src[4]-'0', cv::CAP_V4L2)) {
        throw(std::runtime_error("Failed to open: "+video_src));
    }
#elif defined(_WIN32)
    if (!openCamera(vcap, video_src[4]-'0', cv::CAP_ANY)) {
        throw(std::runtime_error("Failed to open: "+video_src));
    }
#endif
}
else if(video_src.substr(0,3) == "vid"){
    vcap.open(video_src.substr(4), cv::CAP_ANY);
}
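The openCamera helper used above is part of the full source. A minimal sketch, assuming it only opens the device and requests common capture settings (the real helper may configure more), could look like:
// Minimal sketch of the openCamera helper (see the full source for the actual settings)
bool openCamera(cv::VideoCapture &vcap, int device, int api) {
    vcap.open(device, api);
    if (!vcap.isOpened())
        return false;
    // Request typical capture settings; drivers may ignore unsupported values
    vcap.set(cv::CAP_PROP_FRAME_WIDTH, 640);
    vcap.set(cv::CAP_PROP_FRAME_HEIGHT, 480);
    vcap.set(cv::CAP_PROP_FPS, 30);
    return true;
}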
Define an Input Function#
We need to define an input function for the accelerator, which will get a new frame from the camera and pre-process it.
Note
This is not the same as the cropped pre-processing discussed before. This section refers to the pre-processing that needs to be done on the image. In this example, pre-processing refers to image loading, resizing, and normalization.
bool incallback_getframe(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){

    if(runflag.load()){

        cv::Mat inframe;
        cv::Mat rgbImage;
        bool got_frame = vcap.read(inframe);

        if (!got_frame) {
            std::cout << "No frame \n\n\n";
            return false; // return false if frame retrieval fails
        }

        cv::cvtColor(inframe, rgbImage, cv::COLOR_BGR2RGB);
        {
            std::lock_guard<std::mutex> ilock(frame_queue_mutex);
            frames_queue.push_back(rgbImage);
        }

        // Preprocess frame
        cv::Mat preProcframe = preprocess(rgbImage);

        // Set preprocessed input data to be sent to the accelerator
        dst[0]->set_data((float*)preProcframe.data, false);

        return true;
    }
    else{
        vcap.release();
        return false;
    }
}
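The callback above relies on a few shared variables (vcap, runflag, frames_queue, and frame_queue_mutex) that are declared elsewhere in the full source. A minimal sketch of those declarations, assuming a deque-based frame queue, is:
#include <atomic>
#include <deque>
#include <mutex>

// Shared pipeline state used by the input and output callbacks (sketch)
std::atomic<bool> runflag{true};     // cleared to stop the pipeline
cv::VideoCapture vcap;               // camera or video file source
std::deque<cv::Mat> frames_queue;    // original frames, queued for overlay and display
std::mutex frame_queue_mutex;        // guards frames_queue across the two callbacks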
Note
In the above code, the preprocess method is used as the pre-processing step. This method can be found as part of the full code file.
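For reference, a minimal version of such a preprocess step could look like the sketch below. The 512x512 input resolution and the scaling to [0, 1] are assumptions for illustration; check the full source for the exact size and normalization expected by the cropped pre-processing model.
// Illustrative sketch of the preprocess step (exact size and scaling: see the full source)
cv::Mat preprocess(const cv::Mat &rgbImage) {
    cv::Mat resized, floatImage;
    // Resize to the model's expected input resolution (512x512 assumed here)
    cv::resize(rgbImage, resized, cv::Size(512, 512), 0, 0, cv::INTER_LINEAR);
    // Convert to float and scale to [0, 1] (assumed normalization)
    resized.convertTo(floatImage, CV_32F, 1.0 / 255.0);
    return floatImage;
}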
Define Output Functions#
We also need to define an output function for the accelerator to use. Our output function will post-process the accelerator output and display it on the screen.
Note
This is not the same as the cropped post-processing discussed before. This section refers to the post-processing that needs to be done on the image. In this example, post-processing refers to decoding the output, drawing boxes, and displaying the image.
In addition to collecting and post-processing the MXA output, the output function will also overlay and display the output frame.
bool outcallback_getmxaoutput(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){

    for(int i = 0; i < src.size(); ++i){
        src[i]->get_data(output[i]);
    }

    {
        std::lock_guard<std::mutex> ilock(frame_queue_mutex);
        // pop from frame queue
        displayImage = frames_queue.front();
        frames_queue.pop_front();
    } // releases the frame queue lock

    // Get the detections from the model output
    num_boxes = output[outmap_.num_boxes_idx][0];
    std::vector<detectedObj> detected_objectVector = get_detections(output);

    // draw bounding boxes
    draw_bounding_box(displayImage, detected_objectVector);

    // using the MX Qt util to update the display frame
    gui_->screens[0]->SetDisplayFrame(streamLabel, &displayImage, fps_number);

    // Calculate FPS once every AVG_FPS_CALC_FRAME_COUNT frames
    frame_count++;
    if (frame_count == 1)
    {
        start_ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch());
    }
    else if (frame_count % AVG_FPS_CALC_FRAME_COUNT == 0)
    {
        std::chrono::milliseconds duration =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()) - start_ms;
        fps_number = (float)AVG_FPS_CALC_FRAME_COUNT * 1000 / (float)(duration.count());
        frame_count = 0;
    }
    return true;
}
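The detectedObj structure, get_detections, and draw_bounding_box are part of the full code file. As a rough sketch of what the drawing side can look like (the field names and drawing style here are assumptions; the real decoding of the CenterNet output lives in get_detections in the full source):
// Illustrative detection structure and drawing helper (see the full source for the real versions)
struct detectedObj {
    cv::Rect bbox;      // box in display-image coordinates
    int class_id;       // predicted class index
    float confidence;   // detection score
};

void draw_bounding_box(cv::Mat &image, const std::vector<detectedObj> &objects) {
    for (const auto &obj : objects) {
        cv::rectangle(image, obj.bbox, cv::Scalar(0, 255, 0), 2);
        cv::putText(image, cv::format("%d: %.2f", obj.class_id, obj.confidence),
                    obj.bbox.tl(), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
    }
}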
Connect the Accelerator#
The main() function creates the accelerator and the CenterNet object, starts the accelerator, and waits for it to finish.
accl = new MX::Runtime::MxAccl;
accl->connect_dfp(onnx_model_path.c_str());

accl->connect_pre_model(onnx_preprocessing_model_path, 0);
accl->connect_post_model(onnx_postprocessing_model_path, 0);

// Creating a CenterNet object for each stream, which also connects the corresponding stream to accl.
CenterNet* obj;
if(plugin_name == "onnx"){
    obj = new CenterNet(accl, video_src, &gui, App_Onnx);
}
else if (plugin_name == "tf"){
    obj = new CenterNet(accl, video_src, &gui, App_Tf);
}
else{
    obj = new CenterNet(accl, video_src, &gui, App_Tflite);
}

// Run the accelerator and wait
accl->start();
gui.Run(); // This command waits for exit to be pressed in the Qt window
accl->stop();
The CenterNet() constructor connects the input stream to the accelerator.
auto in_cb = std::bind(&CenterNet::incallback_getframe, this, std::placeholders::_1, std::placeholders::_2);
auto out_cb = std::bind(&CenterNet::outcallback_getmxaoutput, this, std::placeholders::_1, std::placeholders::_2);
accl->connect_stream(in_cb, out_cb, 0, 0);
How to Use#
Users can download the attached zip file and compile the application with CMake. This will result in an executable, CenterNet. The following commands should be run in a terminal in the same directory as the executable.
Default run: starts the application with the onnx models and uses a pre-stored video file,
./CenterNet
Users can specify their desired model library to start the application with that library on a pre-stored video file,
./CenterNet tflite
./CenterNet tf
Users can specify their desired model library and desired input to the application,
./CenterNet onnx vid:<path to video file>
./CenterNet tflite cam:<camera index>
Summary#
This tutorial showed how to use the Accelerator C++ API to run inference using a CenterNet model. The code and the resources used in the tutorial are available to download: