YOLOv7-Tiny Object Detection#
Introduction#
In this tutorial, we will show how to use the AsyncAccl Python API to perform real-time object detection on the MX3. We will use the YOLOv7-tiny model for our demo.
Note
This tutorial assumes a four-chip solution is correctly connected.
Download the Model#
The YOLOv7 pre-trained models are available on the official YOLOv7 GitHub page. For the sake of the tutorial, we have already exported the model for you to download. The model can be found in the compressed folder attached to this tutorial.
Compile the Model#
The YOLOv7-tiny model was exported with the option to include a post-processing section in the model graph. Hence, it needs to be compiled with the Neural Compiler's autocrop option. During compilation, the compiler will generate a DFP file for the main section of the model (yolov7-tiny_416.dfp) and a cropped post-processing section of the model (yolov7-tiny_416.post.onnx). The compilation step is typically needed only once and can be done using the Neural Compiler API or tool.
Hint
You can use the pre-compiled DFP and post-processing section attached to this tutorial and skip the compilation step.
from memryx import NeuralCompiler

nc = NeuralCompiler(num_chips=4, models="yolov7-tiny_416.onnx", verbose=1,
                    dfp_fname="yolov7-tiny_416", autocrop=True)
dfp = nc.run()
Alternatively, you can use the Neural Compiler tool. In your command line, you need to type,
mx_nc -v -m yolov7-tiny_416.onnx --autocrop -c 4
This will produce a DFP file ready to be used by the accelerator. In your Python code, you then need to point the dfp variable to the generated file path,
dfp = "yolov7-tiny_416.dfp"
CV Pipelines#
In this tutorial, we will show two different end-to-end implementations of the CV graph. The first is the Sequential Display option, in which the overlay and the output display are part of the output function connected to the AsyncAccl API. This implementation is simple, but performing the overlay inside the output function might limit inference performance on some systems. The following flowchart shows the different parts of the pipeline. Note that the input camera frame should be saved (queued) so it can later be overlaid and displayed.
The second option is the Threaded Display, where the overlay/display runs on its own parallel thread. In that case, the display thread collects the frames and the detections from their queues and displays them independently, as shown in the following flowchart.
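The following is a minimal sketch of such a display thread. It assumes the cv2 import and the cap_queue/dets_queue queues defined in the CV Initializations step below; in this variant, the output function would put its detections into dets_queue instead of drawing and displaying the frames itself,

# Minimal sketch of a display thread for the Threaded Display option. The
# queue names and drawing logic follow the sequential implementation shown
# later in this tutorial; the thread wiring itself is illustrative.
from queue import Empty
from threading import Thread, Event

stop_display = Event()

def display():
    while not stop_display.is_set():
        try:
            # Collect a frame and its detections from the queues
            frame = cap_queue.get(timeout=1)
            dets = dets_queue.get(timeout=1)
        except Empty:
            continue
        for d in dets:
            l, t, r, b = d['bbox']
            frame = cv2.rectangle(frame, (l, t), (r, b), (255, 0, 0), 2)
        cv2.imshow('YOLOv7-Tiny on MemryX MXA', frame)
        if cv2.waitKey(1) == ord('q'):
            stop_display.set()

display_thread = Thread(target=display, daemon=True)
display_thread.start()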
CV Initializations#
Import the needed libraries, initialize the CV pipeline, and define common variables in this step.
# OpenCV and helper libraries imports
import numpy as np
import cv2
from queue import Queue, Empty
from threading import Thread, Event
import sys
# CV and Queues
num_frames = 0
cap_queue = Queue(maxsize=10)
dets_queue = Queue(maxsize=10)
src = sys.argv[1] if len(sys.argv) > 1 else '/dev/video0'
vidcap = cv2.VideoCapture(src)
dims = (int(vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        int(vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
Model Pre-/Post-Processing#
The pre-/post-processing steps are typically provided by the model authors and are outside the scope of this tutorial. We provide a helper class in the tutorial's compressed folder that implements the YOLOv7 pre- and post-processing, which you can check for reference. You can use the helper class as follows,
from yolov7 import YoloV7Tiny
model = YoloV7Tiny(stream_img_size=(dims[1], dims[0], 3))
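For reference, the helper class exposes roughly the following interface (a sketch only; the actual implementation lives in yolov7.py inside the attached folder),

# Sketch of the YoloV7Tiny interface as used in this tutorial; see
# yolov7.py in the attached folder for the actual implementation.
class YoloV7Tiny:
    def __init__(self, stream_img_size):
        # (height, width, channels) of the incoming camera frames
        self.stream_img_size = stream_img_size

    def preprocess(self, frame):
        # Resize/normalize the frame to the 416x416 model input
        ...

    def postprocess(self, mxa_output):
        # Decode the raw model output into a list of detections, each a
        # dict with 'bbox' (l, t, r, b) and 'class' keys
        ...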
The accl.set_postprocessing_model call will automatically retrieve the output from the chip, apply the cropped post-processing section of the graph using the ONNX Runtime, and generate the final output.
accl.set_postprocessing_model('yolov7-tiny_416.post.onnx', model_idx=0)
After that, the output can be sent to the post-processing code in the YOLOv7 helper class to get the detections for the output image.
Define an Input Function#
We need to define an input function for the accelerator to use. In this case, our input function gets a new frame from the camera and pre-processes it. Returning None signals to the accelerator that the input stream has ended.
def capture_and_preprocess():
    # Get a frame from the camera
    got_frame, frame = vidcap.read()
    if not got_frame:
        return None

    # Put the frame in cap_queue to be overlaid later
    cap_queue.put(frame)

    # Pre-process the frame
    frame = model.preprocess(frame)
    return frame
Define Output Functions#
We also need to define an output function for the accelerator to use. In addition to collecting and post-processing the MXA output, our output function will overlay the detections on the saved frame and display it on the screen.
def postprocess_and_display(*mxa_output):
    # Post-process the MXA output
    dets = model.postprocess(mxa_output)

    # Get the corresponding frame from the frames queue
    frame = cap_queue.get()

    # Draw the detection boxes
    for d in dets:
        l, t, r, b = d['bbox']
        frame = cv2.rectangle(frame, (l, t), (r, b), (255, 0, 0), 2)
        frame = cv2.rectangle(frame, (l, t - 18), (r, t), (255, 0, 0), -1)
        frame = cv2.putText(frame, d['class'], (l + 2, t - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

    # Show the frame
    cv2.imshow('YOLOv7-Tiny on MemryX MXA', frame)

    # Exit on a 'q' key press
    if cv2.waitKey(1) == ord('q'):
        cv2.destroyAllWindows()
        vidcap.release()
        exit(1)
Connect the Accelerator#
Now we need to connect the input and output functions to the AsyncAccl API. The API will take care of the rest.
# AsyncAccl
from memryx import AsyncAccl
accl = AsyncAccl(dfp=dfp)
# Gets the output from the chip and performs the cropped graph post-processing
accl.set_postprocessing_model('yolov7-tiny_416.post.onnx', model_idx=0)
# Connect the input and output functions and let the accl run
accl.connect_input(capture_and_preprocess)
accl.connect_output(postprocess_and_display)
accl.wait()
The accelerator will automatically call the connected input and output functions in a fully pipelined fashion.
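If you would like to gauge the pipelined throughput, a simple frame counter can be added to the output function. The sketch below reuses the num_frames variable defined in the CV initializations; the timing logic itself is an illustrative addition and not part of the tutorial code,

import time

start_time = time.time()

def count_frame():
    # Report the average end-to-end FPS every 100 frames; call this at
    # the end of postprocess_and_display (illustrative sketch)
    global num_frames
    num_frames += 1
    if num_frames % 100 == 0:
        fps = num_frames / (time.time() - start_time)
        print(f"Average FPS: {fps:.1f}")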
Third-Party Licenses#
This tutorial uses third-party software, models, and libraries. Below are the details of the licenses for these dependencies:
Model: YOLOv7-tiny from the official YOLOv7 GitHub
License: GPL-3.0
Code and Pre/Post-Processing: Some code components, including the pre/post-processing, were sourced from the same GitHub repository
License: GPL-3.0
Summary#
This tutorial showed how to use the AsyncAccl Python API to run real-time inference with an object-detection model. The code and resources used in this tutorial are available for download in the compressed folder attached to this tutorial.