CenterNet Object Detection#
Introduction#
In this tutorial, we will demonstrate how to use the Accelerator C++ API to perform object detection with CenterNet on the MX3. We will use the centernet_mobilenetv2_fpn_kpts model for our demo. The goal of this tutorial is to demonstrate the end-to-end inference capability of the API in C++, including how to connect any pre-processing and/or post-processing sections that may have been cropped from the model.
Background#
Some models, like the centernet model in this tutorial, have layers at the beginning and end that are not supported natively on MX3 hardware. The neural compiler’s model cropping functionality handles this. More details are available in the Model Cropping tutorial.
Both the Python and C++ APIs support connecting pre and post models into the accelerator runtime object so that you don’t have to create and manage additional runtimes.
Note that not all models have both pre and post models; for example, the YoloV7 model in the Object Detection tutorial only has a post model.
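As a rough sketch (the complete, working calls appear in the Connect the Accelerator section below), attaching the cropped sections to a single runtime object looks like the following; the file names follow the compilation step described later in this tutorial:
// Sketch only: attach the cropped sections to one runtime object
MX::Runtime::MxAccl accl;
accl.connect_dfp("centernet_onnx.dfp");                     // main section, runs on the MX3
accl.connect_pre_model("model_0_centernet_pre.onnx", 0);    // cropped pre-processing, model 0
accl.connect_post_model("model_0_centernet_post.onnx", 0);  // cropped post-processing, model 0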
Note
This tutorial assumes a four-chip solution is correctly connected.
Requirements#
memx-drivers
memx-accl
memx-accl-plugins
memx-utils-gui
C++ users will have to install the tflite library from source. Refer to the third-party libraries guide for installation steps.
Download the Model#
The CenterNet pre-trained models are available on the TensorFlow Centernet GitHub page. For convenience, we have provided the exported and compiled models in the following compressed folder attached
to this tutorial.
Compile the Model#
CenterNet needs to be compiled with the autocrop flag/argument, which generates a DFP file for the main section of the model (centernet_onnx.dfp), the pre-processing model (model_0_centernet_pre.onnx), and the post-processing model (model_0_centernet_post.onnx). The compilation step is typically needed only once and can be done using the Neural Compiler API or Tool.
Hint
You can use the pre-compiled DFP and the cropped pre/post-processing models attached to this tutorial and skip the compilation step.
from memryx import NeuralCompiler
nc = NeuralCompiler(num_chips=4, models="centernet.onnx", verbose=1, dfp_fname = "centernet_onnx", autocrop=True)
dfp = nc.run()
In your command line, you need to type,
mx_nc -v -m centernet.onnx --autocrop -c 4
This will produce a DFP file ready to be used by the accelerator. In your Python code, you need to point the dfp
variable to the generated file path,
dfp = "centernet_onnx.dfp"
In your C++ code, you need to point the DFP path variable to the generated file,
fs::path onnx_model_path = "centernet_onnx.dfp";
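The cropped pre- and post-processing sections are also referenced by path. The variable names below match those used later in this tutorial, and the file names follow the autocrop output described above:
// Paths to the cropped sections produced by --autocrop
fs::path onnx_preprocessing_model_path  = "model_0_centernet_pre.onnx";
fs::path onnx_postprocessing_model_path = "model_0_centernet_post.onnx";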
Note
The code above uses the ONNX version of the model for compilation, but you can also pass in the TFLite or TensorFlow versions.
Pipelines#
In this tutorial, OpenCV is used for image loading, image processing, and display. The following flowchart shows the different parts of the pipeline. Note that the input camera frame should be saved (queued) so it can later be overlaid and displayed.
CV Initializations#
First, we import the required libraries, initialize the CV pipeline, and define common variables.
#include <filesystem>            /* fs::path */
#include <opencv2/opencv.hpp>    /* imshow */
#include <opencv2/imgproc.hpp>   /* cvtColor */
#include <opencv2/imgcodecs.hpp> /* imwrite */

namespace fs = std::filesystem;

fs::path default_videoPath = "../Friends.mp4";
// If the input is a camera, try to use optimal settings
if(video_src.substr(0,3) == "cam"){
#ifdef __linux__
    if (!openCamera(vcap, video_src[4]-'0', cv::CAP_V4L2)) {
        throw(std::runtime_error("Failed to open: "+video_src));
    }
#elif defined(_WIN32)
    if (!openCamera(vcap, video_src[4]-'0', cv::CAP_ANY)) {
        throw(std::runtime_error("Failed to open: "+video_src));
    }
#endif
}
else if(video_src.substr(0,3) == "vid"){
    vcap.open(video_src.substr(4), cv::CAP_ANY);
}
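The openCamera helper used above is part of the full source. A minimal sketch, assuming it only opens the device and requests common capture settings (the real helper may configure more), could look like:
// Minimal sketch of the openCamera helper (see the full source for the actual settings)
bool openCamera(cv::VideoCapture &vcap, int device, int api) {
    vcap.open(device, api);
    if (!vcap.isOpened())
        return false;
    // Request typical capture settings; drivers may ignore unsupported values
    vcap.set(cv::CAP_PROP_FRAME_WIDTH, 640);
    vcap.set(cv::CAP_PROP_FRAME_HEIGHT, 480);
    vcap.set(cv::CAP_PROP_FPS, 30);
    return true;
}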
Define an Input Function#
We need to define an input function for the accelerator, which will get a new frame from the camera and pre-process it.
Note
This is not the same as the cropped pre-processing discussed before. This section refers to the pre-processing that needs to be done on the image. In this example, pre-processing refers to image loading, resizing, and normalization.
bool incallback_getframe(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){

    if(runflag.load()){

        cv::Mat inframe;
        cv::Mat rgbImage;
        bool got_frame = vcap.read(inframe);

        if (!got_frame) {
            std::cout << "No frame \n\n\n";
            return false; // return false if frame retrieval fails
        }

        cv::cvtColor(inframe, rgbImage, cv::COLOR_BGR2RGB);
        {
            std::lock_guard<std::mutex> ilock(frame_queue_mutex);
            frames_queue.push_back(rgbImage);
        }

        // Preprocess frame
        cv::Mat preProcframe = preprocess(rgbImage);

        // Set preprocessed input data to be sent to the accelerator
        dst[0]->set_data((float*)preProcframe.data, false);

        return true;
    }
    else{
        vcap.release();
        return false;
    }
}
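The callback above relies on a few shared variables (vcap, runflag, frames_queue, and frame_queue_mutex) that are declared elsewhere in the full source. A minimal sketch of those declarations, assuming a deque-based frame queue, is:
#include <atomic>
#include <deque>
#include <mutex>

// Shared pipeline state used by the input and output callbacks (sketch)
std::atomic<bool> runflag{true};     // cleared to stop the pipeline
cv::VideoCapture vcap;               // camera or video file source
std::deque<cv::Mat> frames_queue;    // original frames, queued for overlay and display
std::mutex frame_queue_mutex;        // guards frames_queue across the two callbacks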
Note
In the above code, the preprocess method is used as the pre-processing step. This method can be found as part of the full code file.
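For reference, a minimal version of such a preprocess step could look like the sketch below. The 512x512 input resolution and the scaling to [0, 1] are assumptions for illustration; check the full source for the exact size and normalization expected by the cropped pre-processing model.
// Illustrative sketch of the preprocess step (exact size and scaling: see the full source)
cv::Mat preprocess(const cv::Mat &rgbImage) {
    cv::Mat resized, floatImage;
    // Resize to the model's expected input resolution (512x512 assumed here)
    cv::resize(rgbImage, resized, cv::Size(512, 512), 0, 0, cv::INTER_LINEAR);
    // Convert to float and scale to [0, 1] (assumed normalization)
    resized.convertTo(floatImage, CV_32F, 1.0 / 255.0);
    return floatImage;
}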
Define Output Functions#
We also need to define an output function for the accelerator to use. Our output function will post-process the accelerator output and display it on the screen.
Note
This is not the same as the cropped post-processing discussed before. This section refers to the post-processing that needs to be done on the image. In this example, post-processing refers to decoding the output, drawing boxes, and displaying the image.
In addition to collecting and post-processing the MXA output, the output function will also overlay and display the output frame.
bool outcallback_getmxaoutput(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){

    for(int i = 0; i < src.size(); ++i){
        src[i]->get_data(output[i]);
    }

    {
        std::lock_guard<std::mutex> ilock(frame_queue_mutex);
        // pop from frame queue
        displayImage = frames_queue.front();
        frames_queue.pop_front();
    } // releases the frame queue lock

    // Get the detections from the model output
    num_boxes = output[outmap_.num_boxes_idx][0];
    std::vector<detectedObj> detected_objectVector = get_detections(output);

    // draw bounding boxes
    draw_bounding_box(displayImage, detected_objectVector);

    // using the MX Qt util to update the display frame
    gui_->screens[0]->SetDisplayFrame(streamLabel, &displayImage, fps_number);

    // Calculate FPS once every AVG_FPS_CALC_FRAME_COUNT frames
    frame_count++;
    if (frame_count == 1)
    {
        start_ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch());
    }
    else if (frame_count % AVG_FPS_CALC_FRAME_COUNT == 0)
    {
        std::chrono::milliseconds duration =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()) - start_ms;
        fps_number = (float)AVG_FPS_CALC_FRAME_COUNT * 1000 / (float)(duration.count());
        frame_count = 0;
    }
    return true;
}
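The detectedObj structure, get_detections, and draw_bounding_box are part of the full code file. As a rough sketch of what the drawing side can look like (the field names and drawing style here are assumptions; the real decoding of the CenterNet output lives in get_detections in the full source):
// Illustrative detection structure and drawing helper (see the full source for the real versions)
struct detectedObj {
    cv::Rect bbox;      // box in display-image coordinates
    int class_id;       // predicted class index
    float confidence;   // detection score
};

void draw_bounding_box(cv::Mat &image, const std::vector<detectedObj> &objects) {
    for (const auto &obj : objects) {
        cv::rectangle(image, obj.bbox, cv::Scalar(0, 255, 0), 2);
        cv::putText(image, cv::format("%d: %.2f", obj.class_id, obj.confidence),
                    obj.bbox.tl(), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 255, 0), 1);
    }
}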
Connect the Accelerator#
The main() function creates the accelerator and the CenterNet object, starts the accelerator, and waits for it to finish.
accl = new MX::Runtime::MxAccl;
accl->connect_dfp(onnx_model_path.c_str());

accl->connect_pre_model(onnx_preprocessing_model_path, 0);
accl->connect_post_model(onnx_postprocessing_model_path, 0);

// Creating a CenterNet object for each stream, which also connects the corresponding stream to accl.
CenterNet* obj;
if(plugin_name == "onnx"){
    obj = new CenterNet(accl, video_src, &gui, App_Onnx);
}
else if (plugin_name == "tf"){
    obj = new CenterNet(accl, video_src, &gui, App_Tf);
}
else{
    obj = new CenterNet(accl, video_src, &gui, App_Tflite);
}

// Run the accelerator and wait
accl->start();
gui.Run(); // This command waits for exit to be pressed in the Qt window
accl->stop();
The CenterNet() constructor connects the input stream to the accelerator.
auto in_cb = std::bind(&CenterNet::incallback_getframe, this, std::placeholders::_1, std::placeholders::_2);
auto out_cb = std::bind(&CenterNet::outcallback_getmxaoutput, this, std::placeholders::_1, std::placeholders::_2);
accl->connect_stream(in_cb, out_cb, 0, 0);
How to Use#
Users can download the attached zip file and compile the application with CMake. This will result in an executable, CenterNet. The following commands should be run in a terminal in the same directory as the executable.
Default run: starts the application with the onnx models and uses a pre-stored video file,
./CenterNet
Users can specify their desired model library to start the application with that library on a pre-stored video file,
./CenterNet tflite
./CenterNet tf
Users can specify their desired model library and desired input to the application,
./CenterNet onnx vid:<path to video file>
./CenterNet tflite cam:<camera index>
Summary#
This tutorial showed how to use the Accelerator C++ API to run inference using a CenterNet model. The code and the resources used in the tutorial are available to download: