Depth Estimation#
Introduction#
In this tutorial, we show how to compile a depth estimation model using both the Neural Compiler CLI and its Python API. Real-time inference is then demonstrated with the AsyncAccl Python API and the MxAccl C++ API.
Note
Ensure you have a 4-chip solution properly set up before proceeding with this tutorial.
Requirements#
Before running the application, ensure that OpenCV and curl are installed. You can install them using the following commands:
pip install opencv-python
sudo apt install curl
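To confirm both prerequisites before going further, a small check can be run from Python. This is a minimal sketch, not part of the tutorial's code: the function name `check_requirements` is hypothetical, and it only tests that the `cv2` module is importable and that a `curl` executable is on `PATH`.

```python
import importlib.util
import shutil


def check_requirements() -> dict:
    """Report whether the tutorial's prerequisites appear to be installed."""
    return {
        # opencv-python installs the importable module "cv2"
        "opencv-python": importlib.util.find_spec("cv2") is not None,
        # curl must be on PATH for the download commands below
        "curl": shutil.which("curl") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```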
Download the Model#
The first step is to download the neural network we want to use and then compile it. Here we use a pre-trained MiDaS v2 small model, originally published on TensorFlow Hub and now hosted on Kaggle Models, which can be downloaded as follows:
curl -L -o ./midas_v2_small.tar.gz https://www.kaggle.com/api/v1/models/intel/midas/tfLite/v2-1-small-lite/1/download
tar -xzf ./midas_v2_small.tar.gz -C ./
mv 1.tflite midas_v2_small.tflite
Alternatively, run the same steps from Python:
from os import system
system("curl -L -o ./midas_v2_small.tar.gz https://www.kaggle.com/api/v1/models/intel/midas/tfLite/v2-1-small-lite/1/download")
system("tar -xzf ./midas_v2_small.tar.gz -C ./")
system("mv ./1.tflite midas_v2_small.tflite")
In the current working directory, you should now have midas_v2_small.tflite.
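The extract-and-rename steps above can also be done without shelling out to tar and mv. The sketch below is an illustration under stated assumptions: the helper names `extract_tflite` and `looks_like_tflite` are hypothetical, it assumes the archive contains a file named `1.tflite` (possibly stored as `./1.tflite`), and the sanity check relies on TFLite flatbuffers carrying the file identifier `TFL3` at byte offset 4.

```python
import os
import tarfile


def extract_tflite(archive: str, member: str = "1.tflite",
                   out_name: str = "midas_v2_small.tflite", dest: str = ".") -> str:
    """Copy one file out of a .tar.gz archive under a new name; return its path."""
    with tarfile.open(archive, "r:gz") as tar:
        # Match on the base name so "1.tflite" and "./1.tflite" both work
        matches = [m for m in tar.getmembers()
                   if m.isfile() and os.path.basename(m.name) == member]
        if not matches:
            raise FileNotFoundError(f"{member} not found in {archive}")
        # Read the bytes directly instead of extracting, so no paths from the
        # archive are ever written to disk
        data = tar.extractfile(matches[0]).read()
    out_path = os.path.join(dest, out_name)
    with open(out_path, "wb") as f:
        f.write(data)
    return out_path


def looks_like_tflite(path: str) -> bool:
    """TFLite flatbuffers store the identifier b"TFL3" at byte offset 4."""
    with open(path, "rb") as f:
        header = f.read(8)
    return len(header) == 8 and header[4:8] == b"TFL3"
```

For example, `looks_like_tflite(extract_tflite("midas_v2_small.tar.gz"))` should return True for a successfully downloaded model; False usually means the download returned an error page instead of the archive contents.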
Compile the Model#
Note
You can use the pre-compiled DFP attached to this tutorial and skip the compilation step. Make sure to place it in your working folder and to unzip the file before using it:
wget https://developer.memryx.com/example_files/1p1/depth_estimation_using_midas.zip
unzip depth_estimation_using_midas.zip
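If wget or unzip is not available, the same download can be sketched with Python's standard library. This is a minimal sketch: the function name `fetch_and_unzip` is hypothetical, it skips the download when the zip file already exists locally, and it assumes the archive comes from a trusted source since `extractall` writes whatever paths the archive contains.

```python
import os
import urllib.request
import zipfile

DFP_ZIP_URL = "https://developer.memryx.com/example_files/1p1/depth_estimation_using_midas.zip"


def fetch_and_unzip(url: str = DFP_ZIP_URL, dest: str = ".") -> list:
    """Download a zip archive (unless already present) and extract it into dest."""
    zip_path = os.path.join(dest, os.path.basename(url))
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # Return the archive's file list so the caller can locate the DFP
        return zf.namelist()
```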
The compilation step is typically needed only once and can be done using either the Neural Compiler Python API or the CLI tool.
Using the Python API:
from memryx import NeuralCompiler
nc = NeuralCompiler(num_chips=4, models="midas_v2_small.tflite", verbose=1)
dfp = nc.run()
Using the CLI, type the following in your command line:
mx_nc -v -c 4 -m midas_v2_small.tflite
This will produce a DFP file (midas_v2_small.dfp) ready to be used by the accelerator. In your Python code, point the dfp variable to the generated file path:
dfp = "midas_v2_small.dfp"
In your C++ code, define modelPath to point to the generated file path:
const fs::path modelPath = "midas_v2_small.dfp";
CV Initializations#
We will import the needed libraries, initialize the CV pipeline, and define common variables in this step.
# OpenCV, NumPy, and helper library imports
import sys
import cv2 as cv
import numpy as np
# Connect to the camera (or the video source given as the first argument) and get its properties
src = sys.argv[1] if len(sys.argv) > 1 else '/dev/video0'
cam = cv.VideoCapture(src)
if not cam.isOpened():
    raise RuntimeError(f"Failed to open video source: {src}")
input_height = int(cam.get(cv.CAP_PROP_FRAME_HEIGHT))
input_width = int(cam.get(cv.CAP_PROP_FRAME_WIDTH))
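#include <filesystem>             /* std::filesystem::path */
#include <opencv2/opencv.hpp>     /* imshow */
#include <opencv2/imgproc.hpp>    /* cvtColor */
#include <opencv2/imgcodecs.hpp>  /* imwrite */
namespace fs = std::filesystem;
// Default video file, used when the camera is not selected
fs::path videoPath = "../pexels_videos_2103099.mp4";
// vcap is a cv::VideoCapture; openCamera() is a helper defined in the full example source (not an OpenCV function)
if(use_cam){
#ifdef __linux__
    std::cout << "Running on Linux" << "\n";
    if (!openCamera(vcap, 0, cv::CAP_V4L2)) {
        throw(std::runtime_error("Failed to open: camera 0"));
    }
#elif defined(_WIN32)
    std::cout << "Running on Windows" << "\n";
    if (!openCamera(vcap, 0, cv::CAP_ANY)) {
        throw(std::runtime_error("Failed to open: camera 0"));
    }
#endif
}
else{
    // string() instead of c_str(): on Windows, c_str() yields wchar_t*, which VideoCapture::open does not accept
    vcap.open(videoPath.string(), cv::CAP_ANY);
}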
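OpenCV's `VideoCapture` accepts either an integer device index or a string such as a device path, file path, or URL, while the Python script above passes `sys.argv[1]` through as a string. A small helper can convert purely numeric arguments to an index so that `python app.py 0` opens camera 0. This is a sketch; `parse_video_source` is a hypothetical name, not part of the tutorial's code.

```python
import sys


def parse_video_source(argv: list, default: str = "/dev/video0"):
    """Return an int camera index for numeric arguments, otherwise the string unchanged."""
    src = argv[1] if len(argv) > 1 else default
    return int(src) if src.isdigit() else src


if __name__ == "__main__":
    # Intended use: cam = cv.VideoCapture(parse_video_source(sys.argv))
    print(repr(parse_video_source(sys.argv)))
```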
Along with the CV initialization, we also declare variables for storing DFP model information, image manipulation, and FPS calculation.
The model info variable is filled in after connecting to the accelerator.
MX::Types::MxModelInfo model_info;
Variables for image manipulations:
int origHeight;
int origWidth;
int model_input_height;
int model_input_width;
int model_output_height;
int model_output_width;
Variables used for FPS calculations:
// FPS calculation variables
int frame_count = 0;
float fps_number = 0.0f;
string fps_text;
chrono::milliseconds start_ms;
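These variables drive a windowed FPS average that the C++ output callback recomputes every AVG_FPS_CALC_FRAME_COUNT frames. The Python sketch below shows the same idea; the class name `FpsCounter` and its `window` parameter (playing the role of AVG_FPS_CALC_FRAME_COUNT) are hypothetical. One design difference: the frame that opens a window only records the start time, so exactly `window` frame intervals are divided by their elapsed time, whereas the C++ version counts the opening frame too.

```python
import time
from typing import Callable, Optional


class FpsCounter:
    """Windowed FPS average, updated once every `window` frames."""

    def __init__(self, window: int = 30, clock: Callable[[], float] = time.monotonic):
        self.window = window
        self.clock = clock            # injectable so the logic can be tested
        self.start: Optional[float] = None
        self.count = 0
        self.fps = 0.0

    def tick(self) -> float:
        """Call once per displayed frame; returns the latest FPS estimate."""
        now = self.clock()
        if self.start is None:
            # The very first frame only marks the start of the first window
            self.start = now
            return self.fps
        self.count += 1
        if self.count == self.window:
            elapsed = now - self.start
            if elapsed > 0:
                self.fps = self.count / elapsed
            # Begin the next window at the current frame
            self.start, self.count = now, 0
        return self.fps
```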
Define an Input Function#
We need to define an input function for the accelerator to use. In this case, our input function will get a new frame from the camera and pre-process it.
def get_frame_and_preprocess():
    # Get a frame from the camera; return None when no frame is available
    got_frame, frame = cam.read()
    if not got_frame:
        return None
    # Pre-processing: BGR -> RGB, scale to [0, 1], resize to the 256x256 model input
    frame = cv.cvtColor(frame, cv.COLOR_BGR2RGB) / 255.0
    frame = cv.resize(frame, (256, 256), interpolation=cv.INTER_CUBIC)
    # Apply the MiDaS normalization constants (per RGB channel)
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    frame = (frame - mean) / std
    return frame.astype(np.float32)
// Input callback function
bool incallback_getframe(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){
    if(runflag.load()){
        cv::Mat inframe;
        bool got_frame = vcap.read(inframe);
        if (!got_frame) {
            std::cout << "No frame \n\n\n";
            runflag.store(false);
            return false; // return false if frame retrieval fails
        }
        else{
            // Resize to the model input size
            cv::resize(inframe, img_resized, img_resized.size());
            // Convert to RGB
            cv::cvtColor(img_resized, img_resized, cv::COLOR_BGR2RGB);
            // Convert to FP32 in [0, 1]
            img_resized.convertTo(img_model_in, CV_32FC3, 1.0 / 255.0);
            // Apply the MiDaS normalization constants
            cv::add(img_model_in, cv::Scalar(-0.485, -0.456, -0.406), img_model_in);
            cv::multiply(img_model_in, cv::Scalar(1.0/0.229, 1.0/0.224, 1.0/0.225), img_model_in);
            // Send the preprocessed input data to the accelerator
            dst[0]->set_data((float*)img_model_in.data, false);
            return true;
        }
    }
    else{
        vcap.release();
        return false;
    }
}
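The Python and C++ callbacks apply the same normalization in different forms: Python computes `(x / 255 - mean) / std`, while C++ adds `-mean` and then multiplies by `1 / std`. The NumPy sketch below checks that the two forms agree on an RGB uint8 image. Resizing is left out so the sketch needs only NumPy; note also that the two original callbacks differ slightly in resizing (Python uses cubic interpolation on the scaled float image, C++ uses OpenCV's default linear interpolation on the uint8 image), so their outputs will not be bit-identical. The function names are hypothetical.

```python
import numpy as np

# MiDaS normalization constants used by both callbacks (RGB channel order)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_rgb(rgb_u8: np.ndarray) -> np.ndarray:
    """Python-callback form: (x / 255 - mean) / std, as float32."""
    x = rgb_u8.astype(np.float32) / 255.0
    return (x - MEAN) / STD


def normalize_rgb_add_mul(rgb_u8: np.ndarray) -> np.ndarray:
    """C++-callback form: scale by 1/255, add(-mean), then multiply(1/std)."""
    x = rgb_u8.astype(np.float32) * (1.0 / 255.0)
    x = x + (-MEAN)
    return x * (1.0 / STD)
```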
Hint
The pre-processing steps, which prepare the input stream to be consumed by the model, are typically provided by the model authors.
Define an Output Function#
We also need to define an output function for the accelerator to use. Our output function will post-process the accelerator output and display it on the screen.
def postprocess_and_show_frame(*accl_output):
    prediction = accl_output[0]
    # Post-processing: resize to the camera resolution, min-max scale to uint8, apply a colormap
    prediction = cv.resize(prediction, (input_width, input_height))
    depth_min = prediction.min()
    depth_max = prediction.max()
    postprocessed_output = (255 * (prediction - depth_min) / (depth_max - depth_min)).astype("uint8")
    postprocessed_output = cv.applyColorMap(postprocessed_output, cv.COLORMAP_INFERNO)
    # Show the output
    cv.imshow('Depth Estimation using MX3', postprocessed_output)
    # Exit when 'q' is pressed
    if cv.waitKey(1) == ord('q'):
        cv.destroyAllWindows()
        cam.release()
        exit(1)
// Output callback function
bool outcallback_getmxaoutput(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){
    // Get output data from the accelerator
    src[0]->get_data((float *)img_model_out.data, false);
    double depth_min_d, depth_max_d;
    float depth_min, depth_max;
    cv::minMaxIdx(img_model_out, &depth_min_d, &depth_max_d);
    depth_min = (float)depth_min_d;
    depth_max = (float)depth_max_d;
    float diff = depth_max - depth_min;
    // Min-max scale to [0, 255]
    cv::add(img_model_out, cv::Scalar(-1.0 * depth_min), img_model_out);
    cv::multiply(img_model_out, cv::Scalar(1.0 / diff), img_model_out);
    cv::multiply(img_model_out, cv::Scalar(255.0), img_model_out);
    // Convert to UINT8
    img_model_out.convertTo(img_model_out_uint, CV_8UC1);
    // Apply colormap
    cv::applyColorMap(img_model_out_uint, img_final_output, cv::COLORMAP_INFERNO);
    // Calculate FPS once every AVG_FPS_CALC_FRAME_COUNT frames
    frame_count++;
    if (frame_count == 1)
    {
        start_ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch());
    }
    else if (frame_count % AVG_FPS_CALC_FRAME_COUNT == 0)
    {
        std::chrono::milliseconds duration =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()) - start_ms;
        fps_number = (float)AVG_FPS_CALC_FRAME_COUNT * 1000 / (float)(duration.count());
        fps_text = "FPS =" + to_string(fps_number);
        frame_count = 0;
    }
    cv::resize(img_final_output, img_final_out_resized, displaySize);
    // Write the FPS value on the display image
    cv::putText(img_final_out_resized, fps_text,
        cv::Point2i(10, 30),       // origin of text (bottom left of textbox)
        cv::FONT_ITALIC,
        0.8,                       // font scale
        cv::Scalar(255, 255, 0),   // color (BGR order, i.e. cyan)
        2                          // thickness
    );
    // Create and position the display window on the first iteration
    if(!window_created){
        cv::namedWindow(window_name, cv::WINDOW_NORMAL | cv::WINDOW_KEEPRATIO);
        cv::resizeWindow(window_name, displaySize);
        int posx = streamLabel % 4;
        int posy = streamLabel / 4;
        cv::moveWindow(window_name, 50 + 640 * posx, posy * 500);
        window_created = true;
    }
    // Display the colorized depth map
    cv::imshow(window_name, img_final_out_resized);
    if (cv::waitKey(1) == 'q') {
        runflag.store(false);
    }
    return true;
}
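Both output callbacks divide by `depth_max - depth_min`, which is zero if the model ever returns a constant depth map (for example, on a blank frame). A guarded version of the min-max scaling can be sketched in NumPy as follows; `depth_to_u8` is a hypothetical helper, and returning an all-black image for a constant map is one reasonable choice among several.

```python
import numpy as np


def depth_to_u8(prediction: np.ndarray) -> np.ndarray:
    """Min-max scale a raw depth map to uint8 [0, 255] for display.

    Returns an all-zero image when the map is constant (max == min),
    instead of dividing by zero.
    """
    # Drop a trailing singleton channel, e.g. (H, W, 1) -> (H, W)
    pred = np.squeeze(prediction).astype(np.float32)
    d_min, d_max = float(pred.min()), float(pred.max())
    diff = d_max - d_min
    if diff <= 0.0:
        return np.zeros(pred.shape, dtype=np.uint8)
    scaled = 255.0 * (pred - d_min) / diff
    # Clip guards against rounding just outside [0, 255]; astype truncates
    return np.clip(scaled, 0, 255).astype(np.uint8)
```

In the Python output function, the result would then be passed to `cv.applyColorMap(depth_to_u8(prediction), cv.COLORMAP_INFERNO)`.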
Hint
The post-processing steps, which turn the raw model output into a usable result, are typically provided by the model authors.
Connect the Accelerator#
Now, connect your input and output functions to the AsyncAccl API, which takes care of the rest.
from memryx import AsyncAccl
accl = AsyncAccl(dfp)
accl.connect_input(get_frame_and_preprocess)
accl.connect_output(postprocess_and_show_frame)
accl.wait()
The main() function creates the accelerator and the DepthEstimation object, then starts the accelerator and waits for it to finish.
MX::Runtime::MxAccl accl;
int tag = accl.connect_dfp(modelPath);
DepthEstimation app(&accl,use_cam);
accl.start();
accl.wait();
The DepthEstimation() constructor opens the video capture based on the provided input and connects the input stream to the accelerator.
// Connect the stream to the accl object. Because the callback functions are members of
// the DepthEstimation class, we bind them to this object and forward the two callback parameters
auto in_cb = std::bind(&DepthEstimation::incallback_getframe, this, std::placeholders::_1, std::placeholders::_2);
auto out_cb = std::bind(&DepthEstimation::outcallback_getmxaoutput, this, std::placeholders::_1, std::placeholders::_2);
accl->connect_stream(in_cb, out_cb, 0/**unique stream idx*/, 0/**model idx*/);
The accelerator will automatically call the connected input and output functions in a fully pipelined fashion.
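To illustrate what "fully pipelined" buys, the sketch below models the callback pattern with two threads and a bounded queue, so the input for a later frame can be prepared while an earlier result is still being shown. It is a conceptual model only, not how AsyncAccl or MxAccl are implemented: `run_pipeline`, `CountingApp`, the `model_fn` stand-in for the accelerator, and the queue depth are all hypothetical. `CountingApp` also shows the Python counterpart of the `std::bind` calls above: a bound method such as `app.get_frame` already carries `self`.

```python
import queue
import threading
from typing import Callable, Optional


def run_pipeline(input_fn: Callable[[], Optional[object]],
                 model_fn: Callable[[object], object],
                 output_fn: Callable[[object], None],
                 depth: int = 4) -> None:
    """Run input/model work and output work concurrently through a bounded queue.

    input_fn returns None to signal the end of the stream.
    """
    q: "queue.Queue" = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def producer() -> None:
        while True:
            frame = input_fn()
            if frame is None:
                q.put(done)
                return
            q.put(model_fn(frame))  # stand-in for the accelerator

    def consumer() -> None:
        while True:
            item = q.get()
            if item is done:
                return
            output_fn(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


class CountingApp:
    """Hypothetical app whose bound methods are passed as callbacks."""

    def __init__(self, n: int):
        self.remaining = n
        self.results = []

    def get_frame(self):
        if self.remaining == 0:
            return None
        self.remaining -= 1
        return self.remaining

    def show(self, out) -> None:
        self.results.append(out)


if __name__ == "__main__":
    app = CountingApp(5)
    run_pipeline(app.get_frame, lambda x: x * 2, app.show)
    print(app.results)
```

With a single producer and a single consumer sharing one FIFO queue, results arrive in input order; running the script prints the doubled frame values `[8, 6, 4, 2, 0]`.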
Third-Party Licenses#
This tutorial uses third-party software, models, and libraries. Below are the details of the licenses for these dependencies:
Model: MiDaS v2 Small (TF Lite) from Kaggle
License: MIT
Code and Pre/Post-Processing: Some code components, including pre/post-processing, were sourced from the MiDaS v2 Small model provided on Kaggle
License: MIT
Summary#
This tutorial showed how to use the accelerator APIs (AsyncAccl in Python and MxAccl in C++) to run real-time inference with a depth estimation model. The full code and the compiled DFP are available for download.