Depth Estimation
Introduction
In this tutorial, we show how to compile the model with the Neural Compiler CLI and Python API, and how to run real-time inference with the AsyncAccl Python API and the MxAccl C++ API.
Note
Ensure you have a 4-chip MX3 card properly set up before proceeding with this tutorial.
Download & Run
Download
This tutorial provides a high-level overview of the application’s key components. To run the full application, download the complete code package and the compiled DFP. After downloading, refer to the Run section below for step-by-step instructions.
Run
Requirements
Ensure the following dependencies are installed:
pip install opencv-python==4.11.0.86
sudo apt install curl
Run Command
Run the Python example for real-time depth estimation using MX3:
python src/python/run_depth_estimate.py
Step 1: Build the Project
Navigate to the C++ source directory and build the project:
cd src/cpp/
mkdir build
cd build
cmake ..
make
Step 2: Run the Application
Using default DFP and camera:
./depthEstimation
Using a video file:
./depthEstimation --video <path_to_video_file>
Using a custom DFP file:
./depthEstimation -d <path_to_dfp_file>
1. Download the Model
The first step is to download the neural network we want to use. Here we use the pre-trained MiDaS v2 Small model in TFLite format, hosted on Kaggle, which can be downloaded as follows:
Steps are for explanation and learning
These step-by-step snippets are provided to explain the process and help you understand the concepts. For a complete, runnable version, please use the full scripts from the “Download & Run” section above.
curl -L -o ./midas_v2_small.tar.gz https://www.kaggle.com/api/v1/models/intel/midas/tfLite/v2-1-small-lite/1/download
tar -xzf ./midas_v2_small.tar.gz -C ./
mkdir -p models
mv 1.tflite models/MiDaS_256_256_3_tflite.tflite
from os import system, path
model_path = "models/MiDaS_256_256_3_tflite.tflite"
system("mkdir -p models")  # create the target folder before downloading into it
if not path.exists(model_path):
    system("curl -L -o models/MiDaS_256_256_3_tflite.tar.gz https://www.kaggle.com/api/v1/models/intel/midas/tfLite/v2-1-small-lite/1/download")
    system("tar -xzf models/MiDaS_256_256_3_tflite.tar.gz -C models/")
    system(f"mv models/1.tflite {model_path}")
Either way, the model should now be at models/MiDaS_256_256_3_tflite.tflite, relative to the current working directory.
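If you prefer not to shell out to curl, tar, and mv, you can express the same download-and-extract step with the Python standard library. This is a minimal sketch. It assumes the Kaggle URL above serves the .tar.gz archive without authentication, as the curl command implies, and that the archive contains a single .tflite file. The helper names download_model and extract_tflite are hypothetical.

```python
import os
import tarfile
import urllib.request

MODEL_URL = ("https://www.kaggle.com/api/v1/models/intel/midas/"
             "tfLite/v2-1-small-lite/1/download")
MODEL_PATH = "models/MiDaS_256_256_3_tflite.tflite"


def extract_tflite(archive_path, dest_path):
    """Copy the first .tflite file found in a .tar.gz archive to dest_path."""
    with tarfile.open(archive_path, "r:gz") as tar:
        member = next((m for m in tar.getmembers()
                       if m.isfile() and m.name.endswith(".tflite")), None)
        if member is None:
            raise FileNotFoundError(f"no .tflite file in {archive_path}")
        os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
        # extractfile() streams the member's bytes, so no path stored in the archive is used
        with tar.extractfile(member) as src, open(dest_path, "wb") as dst:
            dst.write(src.read())
    return dest_path


def download_model(url=MODEL_URL, dest_path=MODEL_PATH):
    """Download and extract the model only if it is not already present."""
    if os.path.exists(dest_path):
        return dest_path
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    archive_path = os.path.splitext(dest_path)[0] + ".tar.gz"
    urllib.request.urlretrieve(url, archive_path)  # urllib follows redirects, like curl -L
    return extract_tflite(archive_path, dest_path)
```

Calling download_model() once at startup has the same effect as the shell commands above and is a no-op on later runs.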
2. Compile the Model
Note
You can use the pre-compiled DFP attached to this tutorial and skip the compilation step. If you do, make sure it is in your working folder.
The compilation step is typically needed only once and can be done using either the Neural Compiler Python API or the mx_nc command-line tool.
In your Python code, point the dfp variable to the generated file path:
dfp = "midas_v2_small.dfp"
In C++, point modelPath to the generated file path:
const fs::path modelPath = "midas_v2_small.dfp";
From the command line, run:
mx_nc -m models/MiDaS_256_256_3_tflite.tflite
This produces a DFP file ready to be used by the accelerator. In your Python code, point the dfp variable to the generated file path, adjusting the file name below if the compiler wrote a different one:
dfp = "midas_v2_small.dfp"
For C++, point modelPath to the same file:
const fs::path modelPath = "midas_v2_small.dfp";
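Because compilation is typically needed only once, a script can reuse an existing DFP and compile only when none is found. The sketch below assumes a caller-supplied compile_fn(model_path, dfp_path) that runs the compiler and returns the DFP path. Both resolve_dfp and compile_fn are hypothetical names. compile_fn could wrap the mx_nc command above or the SDK's NeuralCompiler; check the SDK documentation for the exact arguments.

```python
import os


def resolve_dfp(dfp_path, model_path, compile_fn):
    """Return a usable DFP path, compiling model_path only if dfp_path is missing.

    compile_fn is a hypothetical callable: compile_fn(model_path, dfp_path) -> dfp_path.
    """
    if os.path.exists(dfp_path):
        return dfp_path  # pre-compiled or previously generated DFP: skip compilation
    if not os.path.exists(model_path):
        raise FileNotFoundError(f"neither {dfp_path} nor {model_path} exists")
    return compile_fn(model_path, dfp_path)
```

Injecting compile_fn keeps the "compile once" check testable without the compiler installed.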
3. CV Initializations
We will import the needed libraries, initialize the CV pipeline, and define common variables in this step.
Steps are for explanation and learning
These step-by-step snippets are provided to explain the process and help you understand the concepts. For a complete, runnable version, please use the full scripts from the “Download & Run” section above.
Danger
The following code is an excerpt. If you copy-paste it on its own, it will not run successfully.
from os import system, path
import argparse
import cv2 as cv
import numpy as np
import sys
from memryx import AsyncAccl, NeuralCompiler
###############################################################################
# Initializations #############################################################
###############################################################################
# Connect to the camera (or a video file passed as the first argument) and get its properties
src = sys.argv[1] if len(sys.argv) > 1 else '/dev/video0'
cam = cv.VideoCapture(src)
if not cam.isOpened():
    sys.exit(f"Could not open video source: {src}")
input_height = int(cam.get(cv.CAP_PROP_FRAME_HEIGHT))
input_width = int(cam.get(cv.CAP_PROP_FRAME_WIDTH))
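#include <atomic>                /* std::atomic (runflag) */
#include <chrono>                /* FPS timing */
#include <cmath>                 /* std::round */
#include <filesystem>
#include <iostream>
#include <signal.h>
#include <string>
#include <opencv2/opencv.hpp>    /* imshow */
#include <opencv2/imgproc.hpp>   /* cvtColor */
#include <opencv2/imgcodecs.hpp> /* imwrite */
#include "memx/accl/MxAccl.h"
namespace fs = std::filesystem;
std::string modelPath;
fs::path videoPath;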
// For cameras, try the preferred input configuration: 640x480 resolution at 30 FPS
bool configureCamera(cv::VideoCapture& vcap) {
bool settings_success = true;
try {
if (!vcap.set(cv::CAP_PROP_FRAME_HEIGHT, 480) ||
!vcap.set(cv::CAP_PROP_FRAME_WIDTH, 640) ||
!vcap.set(cv::CAP_PROP_FPS, 30)) {
std::cout << "Setting vcap Failed\n";
cv::Mat simpleframe;
if (!vcap.read(simpleframe)) {
settings_success = false;
}
}
} catch (...) {
std::cout << "Exception occurred while setting properties\n";
settings_success = false;
}
return settings_success;
}
// Tries to open the camera with custom settings set in configureCamera
// If not possible, open it with default settings
bool openCamera(cv::VideoCapture& vcap, int device, int api) {
vcap.open(device, api);
if (!vcap.isOpened()) {
std::cerr << "Failed to open vcap\n";
return false;
}
if (!configureCamera(vcap)) {
vcap.release();
vcap.open(device, api);
if (vcap.isOpened()) {
std::cout << "Reopened vcap with original resolution\n";
} else {
std::cerr << "Failed to reopen vcap\n";
return false;
}
}
return true;
}
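The Python snippet reads its video source from sys.argv[1], while the C++ application accepts --video and -d flags. Since argparse is already imported, the Python side could mirror those flags. The sketch below is an assumption about how that might look, not the actual options of run_depth_estimate.py, and parse_args is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical CLI options mirroring the C++ app's --video and -d flags."""
    parser = argparse.ArgumentParser(description="Real-time depth estimation on MX3 (sketch)")
    parser.add_argument("--video", default="/dev/video0",
                        help="video file or camera device to read from")
    parser.add_argument("-d", "--dfp", default="midas_v2_small.dfp",
                        help="path to the compiled DFP")
    return parser.parse_args(argv)
```

With this in place, the initialization would use cv.VideoCapture(args.video) and pass args.dfp to AsyncAccl instead of hard-coding both.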
Along with the CV initialization, we also declare variables for the DFP model information, image manipulation, and FPS calculation.
The model info variable is populated after connecting to the accelerator:
MX::Types::MxModelInfo model_info;
model_info = accl->get_model_info(0);
Variables for image manipulations:
int origHeight;
int origWidth;
int model_input_height;
int model_input_width;
int model_output_height;
int model_output_width;
model_input_height = model_info.in_featuremap_shapes[0][0];
model_input_width = model_info.in_featuremap_shapes[0][1];
model_output_height = model_info.out_featuremap_shapes[0][0];
model_output_width = model_info.out_featuremap_shapes[0][1];
Variables used for FPS calculations:
int frame_count = 0;
float fps_number = 0.0f;
std::string fps_text;
std::chrono::milliseconds start_ms{0};
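The C++ output callback averages FPS over a fixed window of frames. The same logic can be sketched in Python for use inside postprocess_and_show_frame. FpsCounter is a hypothetical helper. Its default window of 30 frames is an assumed stand-in for AVG_FPS_CALC_FRAME_COUNT, whose value is not shown here. The clock is injectable so the logic can be checked without a camera.

```python
import time


class FpsCounter:
    """Average FPS over fixed windows of frames, following the C++ frame_count logic."""

    def __init__(self, window=30, clock=time.monotonic):
        self.window = window  # assumed stand-in for AVG_FPS_CALC_FRAME_COUNT (must be >= 2)
        self.clock = clock
        self.frame_count = 0
        self.start = 0.0
        self.fps = 0.0

    def tick(self):
        """Call once per displayed frame; returns the text to overlay."""
        self.frame_count += 1
        if self.frame_count == 1:
            self.start = self.clock()  # the window starts at its first frame
        elif self.frame_count % self.window == 0:
            elapsed = self.clock() - self.start
            if elapsed > 0:
                self.fps = round(self.window / elapsed, 1)
            self.frame_count = 0  # begin a new window
        return f"FPS = {self.fps:.1f}"
```

A monotonic clock is used instead of the wall clock so that system time adjustments cannot produce negative durations. Like the C++ version, it times window frames across window - 1 frame intervals, so small windows slightly overstate throughput.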
4. Define an Input Function
We need to define an input function for the accelerator to use. In this case, our input function will get a new frame from the camera and pre-process it.
def get_frame_and_preprocess():
    """
    An input function for the accelerator to use. This input function will get
    a new frame from the cam and pre-process it.
    """
    got_frame, frame = cam.read()
    if not got_frame:
        return None
    # Pre-processing steps
    frame = cv.cvtColor(frame, cv.COLOR_BGR2RGB) / 255.0
    frame = cv.resize(frame, (256, 256), interpolation=cv.INTER_CUBIC)
    frame = np.array(frame)
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    frame = (frame - mean) / std
    frame = np.expand_dims(frame, 0)
    return frame.astype("float32")
bool incallback_getframe(std::vector<const MX::Types::FeatureMap*> dst, int streamLabel){
if(runflag.load()){
cv::Mat inframe;
bool got_frame = vcap.read(inframe);
if (!got_frame) {
std::cout << "No frame \n\n\n";
runflag.store(false);
return false;
}
else{
cv::resize(inframe, img_resized, img_resized.size());
cv::cvtColor(img_resized, img_resized, cv::COLOR_BGR2RGB);
img_resized.convertTo(img_model_in, CV_32FC3, 1.0 / 255.0);
cv::add(img_model_in, cv::Scalar(-0.485, -0.456, -0.406), img_model_in);
cv::multiply(img_model_in, cv::Scalar(1.0 / 0.229, 1.0 / 0.224, 1.0 / 0.225), img_model_in);
dst[0]->set_data((float*)img_model_in.data);
return true;
}
}
else{
vcap.release();
return false;
}
}
Hint
Pre-processing steps, which prepare the input stream to be consumed by the model, are typically specified by the model authors.
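The normalization in get_frame_and_preprocess has four steps: scale to [0, 1], subtract the per-channel mean, divide by the per-channel standard deviation, and add a batch axis. This arithmetic can be isolated into a small NumPy function and checked without a camera. This is a sketch: normalize_rgb is a hypothetical helper. It assumes the frame has already been converted to RGB and resized to 256x256, and it stays in float32 rather than using the float64 intermediates of the snippet above.

```python
import numpy as np

# Per-channel statistics used by the pre-processing above
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_rgb(frame_rgb):
    """Scale an HxWx3 uint8 RGB image to [0, 1], standardize per channel, add a batch axis."""
    x = frame_rgb.astype(np.float32) / 255.0
    x = (x - MEAN) / STD       # broadcasts over the last (channel) axis
    return x[np.newaxis, ...]  # shape (1, H, W, 3), same as np.expand_dims(x, 0)
```

Because MEAN and STD have shape (3,), NumPy broadcasting applies each value to the matching channel of every pixel. This is the same effect as the cv::add and cv::multiply calls with a three-element cv::Scalar in the C++ callback.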
5. Define an Output Function
We also need to define an output function for the accelerator to use. Our output function will post-process the accelerator output and display it on the screen.
def postprocess_and_show_frame(*accl_output):
    """
    An output function for the accelerator to use. This output function will
    post-process the accelerator output and display it on the screen.
    """
    prediction = accl_output[0][0]
    # Post-processing steps
    prediction = cv.resize(prediction, (input_width, input_height))
    depth_min = prediction.min()
    depth_max = prediction.max()
    depth_range = max(depth_max - depth_min, 1e-6)  # avoid division by zero on a flat depth map
    postprocessed_output = (255 * (prediction - depth_min) / depth_range).astype("uint8")
    postprocessed_output = cv.applyColorMap(postprocessed_output, cv.COLORMAP_INFERNO)
    # Show the output
    cv.imshow('Depth Estimation using MX3', postprocessed_output)
    # Check if the window was closed
    if cv.getWindowProperty('Depth Estimation using MX3', cv.WND_PROP_VISIBLE) < 1:
        print("\033[93mWindow closed. Exiting.\033[0m")
        cv.destroyAllWindows()
        cam.release()
        exit(1)
    # Exit when 'q' is pressed
    if cv.waitKey(1) == ord('q'):
        cv.destroyAllWindows()
        cam.release()
        exit(1)
bool outcallback_getmxaoutput(std::vector<const MX::Types::FeatureMap*> src, int streamLabel){
src[0]->get_data((float*)img_model_out.data);
double depth_min_d, depth_max_d;
float depth_min, depth_max;
cv::minMaxIdx(img_model_out, &depth_min_d, &depth_max_d);
depth_min = (float)depth_min_d;
depth_max = (float)depth_max_d;
float diff = (depth_max > depth_min) ? (depth_max - depth_min) : 1.0f; // avoid division by zero on a flat depth map
cv::add(img_model_out, cv::Scalar(-depth_min), img_model_out);
cv::multiply(img_model_out, cv::Scalar(1.0 / diff), img_model_out);
cv::multiply(img_model_out, cv::Scalar(255.0), img_model_out);
img_model_out.convertTo(img_model_out_uint, CV_8UC1);
cv::applyColorMap(img_model_out_uint, img_final_output, cv::COLORMAP_INFERNO);
frame_count++;
if (frame_count == 1)
{
start_ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch());
}
else if (frame_count % AVG_FPS_CALC_FRAME_COUNT == 0)
{
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(
std::chrono::system_clock::now().time_since_epoch()) - start_ms;
fps_number = (float)AVG_FPS_CALC_FRAME_COUNT * 1000 / (float)(duration.count());
// Round to 1 decimal and manually truncate after rounding
fps_number = std::round(fps_number * 10.0f) / 10.0f;
fps_text = "FPS = " + std::to_string(fps_number).substr(0, std::to_string(fps_number).find('.') + 2);
frame_count = 0;
}
cv::resize(img_final_output, img_final_out_resized, displaySize);
cv::putText(img_final_out_resized, fps_text,
cv::Point2i(10, 30), cv::FONT_ITALIC, 0.8,
cv::Scalar(255, 255, 0), 2);
if (!window_created){
cv::namedWindow(window_name, cv::WINDOW_NORMAL | cv::WINDOW_KEEPRATIO);
cv::resizeWindow(window_name, displaySize);
window_created = true;
}
cv::imshow(window_name, img_final_out_resized);
if (cv::waitKey(1) == 'q') {
runflag.store(false);
}
return true;
}
Hint
Post-processing steps, which turn the raw model output into a usable result (here, a color-mapped depth image), are typically specified by the model authors.
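Min-max scaling must handle a perfectly flat prediction, for example from a blank frame, where depth_max equals depth_min. The same scaling can be factored into a small NumPy helper that is testable on its own. depth_to_uint8 is a hypothetical helper, and returning an all-zero image for a flat map is an assumed policy.

```python
import numpy as np


def depth_to_uint8(prediction):
    """Min-max scale a relative depth map to 0..255 (truncating, like astype in the snippet above)."""
    depth_min = float(prediction.min())
    depth_max = float(prediction.max())
    span = depth_max - depth_min
    if span <= 0:
        # Flat map: there is no relative depth to show, so return black
        return np.zeros(prediction.shape, dtype=np.uint8)
    return (255 * (prediction - depth_min) / span).astype(np.uint8)
```

The result can be passed to cv.applyColorMap(..., cv.COLORMAP_INFERNO) exactly as in postprocess_and_show_frame. MiDaS predicts relative inverse depth, so larger values (brighter INFERNO colors) correspond to nearer surfaces.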
6. Connect the Accelerator
Now all you need to do is connect your input and output functions to the AsyncAccl API; it takes care of the rest.
accl = AsyncAccl(dfp, local_mode=False)
accl.connect_input(get_frame_and_preprocess)
accl.connect_output(postprocess_and_show_frame)
accl.wait()
In C++, the main() function creates the accelerator and the DepthEstimation object, then starts the accelerator and waits for it to finish.
MX::Runtime::MxAccl accl(
fs::path(dfpPath), // DFP path
std::vector<int>{0}, // device_ids_to_use
std::array<bool, 2>{true, true}, // use_model_shape
false, // local_mode
MX::RPC::SchedulerOptions{600, 0, 16, 12, false, 11500, false, 50, 6}, // sched_options
MX::RPC::ClientOptions{false, 0}, // client_options
server_addr, // server_addr
10000, // server_port_base
false // ignore_server_
);
DepthEstimation app(&accl, use_cam);
accl.start();
accl.wait();
accl.stop();
The accelerator will automatically call the connected input and output functions in a fully pipelined fashion.
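The toy model below illustrates what "fully pipelined" means. It runs input, inference, and output in separate threads connected by bounded queues, so the next frame can be captured while the current one is processed and a previous result is displayed. It is only an illustration under that assumption. run_pipeline is hypothetical, infer_fn stands in for the accelerator, and AsyncAccl's actual scheduling is handled internally by the MemryX runtime. As in get_frame_and_preprocess, the input function returns None to end the stream.

```python
import queue
import threading


def run_pipeline(input_fn, infer_fn, output_fn, depth=4):
    """Toy three-stage pipeline: input_fn produces items until it returns None."""
    q_in, q_out = queue.Queue(maxsize=depth), queue.Queue(maxsize=depth)
    stop = object()  # sentinel marking the end of the stream

    def producer():
        while (frame := input_fn()) is not None:
            q_in.put(frame)  # blocks when the pipeline is full (back-pressure)
        q_in.put(stop)

    def worker():
        while (frame := q_in.get()) is not stop:
            q_out.put(infer_fn(frame))
        q_out.put(stop)

    threads = [threading.Thread(target=producer), threading.Thread(target=worker)]
    for t in threads:
        t.start()
    while (result := q_out.get()) is not stop:
        output_fn(result)  # results arrive in input order with a single worker
    for t in threads:
        t.join()
```

The bounded queues keep a fast camera from running arbitrarily far ahead of inference, which is why pipelining raises throughput without unbounded memory growth.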
Third-Party Licenses
This tutorial uses third-party software, models, and libraries. Below are the details of the licenses for these dependencies:
Model: MiDaS v2 Small (TF Lite) from Kaggle
License: MIT
Code and Pre/Post-Processing: Some code components, including pre/post-processing, were sourced from the MiDaS v2 Small model provided on Kaggle
License: MIT
Summary
This tutorial showed how to use the MemryX accelerator APIs (AsyncAccl in Python and MxAccl in C++) to run real-time inference with a depth estimation model.