YOLOv7t Object Detection#
Introduction#
In this tutorial, we will show how to use the AsyncAccl Python API to perform real-time object detection on MX3. We will use the YOLOv7-tiny model for our demo.
Note
This tutorial assumes a four-chip solution is correctly connected.
Download & Run
Download
This tutorial provides a high-level overview of the application’s key components. To run the full application, download the complete code package and the compiled DFP. After downloading, refer to the Run section below for step-by-step instructions.
Run
Requirements
Ensure the following dependencies are installed:
pip install opencv-python==4.11.0.86
Run Command
Run the Python example for real-time object detection using MX3:
# ensure a camera device is connected, as the default video input is a camera
cd src/python/
python run_yolov7_singlestream_objectdetection.py
The following sections walk through the steps needed to run the application successfully.
1. Download the Model#
The YOLOv7 pre-trained models are available on the Official YOLOv7 GitHub page. For the sake of this tutorial, we exported and compiled the model for the user to download; it can be found in the compressed folder attached to this tutorial.
2. Compile the Model#
The YOLOv7-tiny model was exported with the option to include a post-processing section in the model graph. Hence, it needed to be compiled with the Neural Compiler --autocrop option. After the compilation, the compiler will generate the DFP file for the main section of the model (YOLO_v7_tiny_416_416_3_onnx.dfp) and the cropped post-processing section of the model (YOLO_v7_tiny_416_416_3_onnx_post.onnx). The compilation step is typically needed once and can be done using the Neural Compiler API or Tool.
Hint
You can use the pre-compiled DFP and post-processing section attached to this tutorial and skip the compilation step.
from memryx import NeuralCompiler
nc = NeuralCompiler(num_chips=4, models="YOLO_v7_tiny_416_416_3_onnx.onnx", verbose=1, autocrop=True)
dfp = nc.run()
In your command line, you need to type,
# note that we've renamed the model to YOLO_v7_tiny_416_416_3_onnx for consistency with the tutorial code
mx_nc -m YOLO_v7_tiny_416_416_3_onnx.onnx -v --autocrop
This will produce a DFP file ready to be used by the accelerator. In your Python code, you need to point the dfp variable to the generated file path,
dfp = "YOLO_v7_tiny_416_416_3_onnx.dfp"
3. CV Pipelines#
In this tutorial, we will show two different end-to-end implementations of the CV graph. The first one is the Sequential Display option. In this case, the overlay and the output display are part of the output function connected to the AsyncAccl API. This implementation is simple, but performing the overlay inside the output function might limit the inference performance on some systems. The following flowchart shows the different parts of the pipeline. Note that the input camera frame should be saved (queued) so that it can later be overlaid and displayed.
The second option is the Threaded Display, where the overlay/display runs in its own parallel thread. In this case, the display thread collects the frames and the detections from the queues and displays them independently, as shown in the following flowchart.
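The full display thread is part of the downloadable code package. As a rough sketch only, assuming the cap_queue/dets_queue defined in the next step and a simple box overlay (the exact detection format returned by the YOLOv7 helper is an assumption here), it could look like this:
def display(self):
    """
    Parallel display thread: collects frames and detections from the
    queues, overlays the boxes, and shows the result independently of
    the AsyncAccl output function.
    """
    while True:
        frame = self.cap_queue.get()
        dets = self.dets_queue.get()

        for d in dets:
            # the 'bbox' key and (x1, y1, x2, y2) format are assumptions
            x1, y1, x2, y2 = map(int, d['bbox'])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)

        cv2.imshow('YOLOv7-Tiny', frame)
        if cv2.waitKey(1) == ord('q'):
            break
The thread itself can be started with Thread(target=self.display, daemon=True).start() before the accelerator is connected in step 8.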
4. CV Initializations#
Import the needed libraries, initialize the CV pipeline, and define common variables in this step.
import time
import argparse
import numpy as np
import cv2
from queue import Queue, Full
from threading import Thread
from matplotlib import pyplot as plt
from memryx import AsyncAccl
from yolov7 import YoloV7Tiny as YoloModel
# CV and Queues
self.num_frames = 0
self.cap_queue = Queue(maxsize=4)
self.dets_queue = Queue(maxsize=5)

if "/dev/video" in str(video_path):
    self.src_is_cam = True
else:
    self.src_is_cam = False

self.vidcap = cv2.VideoCapture(video_path)
self.dims = (int(self.vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)),
             int(self.vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
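The FPS counters used later by the output function in step 7 also need to be initialized here. A minimal sketch, assuming a 30-sample averaging window to match the indexing used in that step:
# FPS tracking state used by the output function in step 7
self.dt_index = 0
self.frame_end_time = time.time()
self.fps = 0
self.dt_array = np.zeros(30)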
5. Model Pre-/Post-Processing#
The pre-/post-processing steps are typically provided by the model authors and are outside the scope of this tutorial. A helper class that implements the YOLOv7 pre- and post-processing is included in the tutorial's compressed folder for reference. You can use the helper class as follows,
# Model
self.model = YoloModel(stream_img_size=(self.dims[1],self.dims[0],3))
The accl.set_postprocessing_model call will automatically retrieve the output from the chip, apply the cropped graph post-processing section using the ONNX runtime, and generate the final output.
accl.set_postprocessing_model('YOLO_v7_tiny_416_416_3_onnx_post.onnx', model_idx=0)
After that, the output can be sent to the post-processing code in the YOLOv7 helper class to get the detections on the output image.
6. Define an Input Function#
We need to define an input function for the accelerator to use. In this case, our input function will capture a new frame from the camera and pre-process it.
def capture_and_preprocess(self):
    """
    Captures a frame from the video device and pre-processes it.
    """
    while True:
        got_frame, frame = self.vidcap.read()

        if not got_frame:
            return None

        if self.src_is_cam and self.cap_queue.full():
            # Drop the frame and try again
            continue
        else:
            self.num_frames += 1

            # Put the frame in the cap_queue to be overlaid later
            self.cap_queue.put(frame)

            # Pre-process the frame
            frame = self.model.preprocess(frame)
            return frame
7. Define Output Functions#
We also need to define an output function for the accelerator to use. Our output function will post-process the accelerator output. In the Sequential Display option, the output function also overlays and displays the output frame in addition to the MXA data collection and post-processing; in the Threaded Display option shown below, it simply pushes the results to a queue for the display thread.
def postprocess(self, *mxa_output):
    """
    Post-processes the MXA output.
    """
    # Post-process the MXA output
    dets = self.model.postprocess(mxa_output)

    # Push the results to the queue to be used by the display thread
    self.dets_queue.put(dets)

    # Calculate the current FPS
    self.dt_array[self.dt_index] = time.time() - self.frame_end_time
    self.dt_index += 1

    if self.dt_index % 15 == 0:
        self.fps = 1 / np.average(self.dt_array)

    if self.dt_index >= 30:
        self.dt_index = 0

    self.frame_end_time = time.time()
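For the Sequential Display option described in step 3, there is no separate display thread; instead of pushing the detections to dets_queue, the overlay and display are appended at the end of this same output function. A minimal sketch, where draw_detections is a hypothetical helper that draws the boxes on the frame:
    # Sequential Display variant: overlay and show directly here instead
    # of pushing the detections to dets_queue
    frame = self.cap_queue.get()
    frame = self.draw_detections(frame, dets)  # hypothetical drawing helper
    cv2.imshow('YOLOv7-Tiny', frame)
    cv2.waitKey(1)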
8. Connect the Accelerator#
Now we need to connect the input and output functions to the AsyncAccl API. The API will take care of the rest.
from memryx import AsyncAccl
accl = AsyncAccl(dfp=self.dfp_path)
accl.set_postprocessing_model(self.postmodel_path, model_idx=0)
accl.connect_input(self.capture_and_preprocess)
accl.connect_output(self.postprocess)
accl.wait()
The accelerator will automatically call the connected input and output functions in a fully pipelined fashion.
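To tie the pieces together, here is a minimal sketch of an application entry point; the App wrapper class name, its constructor arguments, and the display method are assumptions matching the snippets above rather than the exact code in the downloadable package:
if __name__ == "__main__":
    # 'App' is a hypothetical wrapper class holding the methods shown above
    app = App(video_path='/dev/video0',
              dfp_path='YOLO_v7_tiny_416_416_3_onnx.dfp',
              postmodel_path='YOLO_v7_tiny_416_416_3_onnx_post.onnx')

    # Threaded Display option: run the display loop in its own thread
    Thread(target=app.display, daemon=True).start()

    # Wire the accelerator as shown above and block until the input is exhausted
    accl = AsyncAccl(dfp=app.dfp_path)
    accl.set_postprocessing_model(app.postmodel_path, model_idx=0)
    accl.connect_input(app.capture_and_preprocess)
    accl.connect_output(app.postprocess)
    accl.wait()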
Third-Party Licenses#
This tutorial uses third-party software, models, and libraries. Below are the details of the licenses for these dependencies:
Model: YOLOv7-Tiny from GitHub
License: GPL-3.0
Code and Pre-/Post-Processing: Some code components, including the pre-/post-processing, were sourced from the YOLOv7 GitHub repository
License: GPL-3.0
Summary#
This tutorial showed how to use the AsyncAccl Python API to run real-time inference with an object-detection model.