YOLOv7t Object Detection#

Introduction#

In this tutorial, we will show how to use the AsyncAccl Python API to perform real-time object detection on MX3. We will use the YOLOv7-tiny model for our demo.

Note

This tutorial assumes a four-chip solution is correctly connected.

Download the Model#

The YOLOv7 pre-trained models are available on the official YOLOv7 GitHub page. For convenience, we have already exported the model; it can be found in the compressed folder attached to this tutorial.

Compile the Model#

The YOLOv7-tiny model was exported with the option to include a post-processing section in the model graph. Hence, it needs to be compiled with the Neural Compiler autocrop option. The compiler will generate the DFP file for the main section of the model (yolov7-tiny_416.dfp) and the cropped post-processing section of the model (yolov7-tiny_416.post.onnx). The compilation step is typically needed only once and can be done using the Neural Compiler API or Tool.

Hint

You can use the pre-compiled DFP and post-processing section attached to this tutorial and skip the compilation step.

Using the Neural Compiler Python API, the compilation can be done in your Python code,

from memryx import NeuralCompiler
nc = NeuralCompiler(num_chips=4, models="yolov7-tiny_416.onnx", verbose=1, dfp_fname="yolov7-tiny_416", autocrop=True)
dfp = nc.run()

Alternatively, using the Neural Compiler Tool, you need to type the following in your command line,

mx_nc -v -m yolov7-tiny_416.onnx --autocrop -c 4

This will produce a DFP file ready to be used by the accelerator. In your Python code, you need to point the dfp variable to the generated file path,

dfp = "yolov7-tiny_416.dfp"

In your C++ code, you need to point to the generated DFP file path, for example,

//YoloV7 application specific parameters
const std::string dfp_path = "yolov7-tiny_416.dfp"; // hypothetical variable name; use it wherever your application loads the DFP

CV Pipelines#

In this tutorial, we will show two different end-to-end implementations of the CV graph. The first one is the Sequential Display option, in which the overlay and the output display are part of the output function connected to the AsyncAccl API. This implementation is simple, but having the overlay as part of the output function might limit the inference performance on some systems. The following flowchart shows the different parts of the pipeline. Note that the input camera frame needs to be saved (queued) so it can later be overlaid and displayed.

graph LR
    input([Input Function]) --> accl[Accelerator]
    accl --> output([Output Function])
    input -.-> q[[Frames Queue]]
    q -.-> output
    style input fill:#CFE8FD, stroke:#595959
    style accl fill:#FFE699, stroke:#595959
    style output fill:#A9D18E, stroke:#595959
    style q fill:#dbd9d3, stroke:#595959

The second option is the Threaded Display, where the overlay/display runs in its own parallel thread. In this case, the display thread collects the frames and the detections from their queues and displays them independently, as shown in the following flowchart; a minimal code sketch of this option follows the flowchart.

graph LR
    subgraph Inference
        input([Input Function]) --> accl[Accelerator]
        accl --> output([Output Function])
    end
    input -.-> fq[[Frames Queue]]
    output -.-> dq[[Dets Queue]]
    subgraph Display
        display([Display Thread])
    end
    fq -.-> display
    dq -.-> display
    style input fill:#CFE8FD, stroke:#595959
    style accl fill:#FFE699, stroke:#595959
    style output fill:#A9D18E, stroke:#595959
    style fq fill:#dbd9d3, stroke:#595959
    style dq fill:#dbd9d3, stroke:#595959
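
The attached code in the rest of this tutorial implements the Sequential Display option. As an illustration only, the following is a minimal sketch of how the Threaded Display option could be wired up. It assumes the queues, helper model, and drawing logic defined later in this tutorial, and the function and variable names used here are placeholders.

# Minimal sketch of the Threaded Display option (illustrative only; names are placeholders).
# It assumes cap_queue, dets_queue, and model are defined as in the sections below.

def output_to_queue(*mxa_output):
    # Output function for this option: only queue the detections, no drawing here
    dets_queue.put(model.postprocess(mxa_output))

def display_loop(stop_event):
    # Display thread: pair each queued frame with its detections and show the overlay
    while not stop_event.is_set():
        try:
            frame = cap_queue.get(timeout=1)
        except Empty:
            continue
        dets = dets_queue.get()  # detections arrive in the same order as the frames
        # ... draw the detection boxes on 'frame' exactly as in the sequential version ...
        cv2.imshow('YOLOv7-Tiny on MemryX MXA', frame)
        if cv2.waitKey(1) == ord('q'):
            stop_event.set()

stop_event = Event()
Thread(target=display_loop, args=(stop_event,), daemon=True).start()
# accl.connect_output(output_to_queue) would then be used instead of the
# sequential postprocess_and_display function shown later in this tutorial.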

CV Initializations#

Import the needed libraries, initialize the CV pipeline, and define common variables in this step.

# OpenCV and helper libraries imports
import numpy as np
import cv2
from queue import Queue, Empty
from threading import Thread, Event
import sys

# CV and Queues
num_frames = 0
cap_queue = Queue(maxsize=10)
dets_queue = Queue(maxsize=10)
src = sys.argv[1] if len(sys.argv) > 1 else '/dev/video0'
vidcap = cv2.VideoCapture(src) 
dims = ( int(vidcap.get(cv2.CAP_PROP_FRAME_WIDTH)), 
        int(vidcap.get(cv2.CAP_PROP_FRAME_HEIGHT)) )
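
Optionally (this check is not part of the attached code), you can fail early if the camera or video file could not be opened, instead of letting later vidcap.read() calls silently return empty frames:

# Optional sanity check: stop early if the video source could not be opened
if not vidcap.isOpened():
    sys.exit(f"Could not open video source: {src}")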

Model Pre-/Post-Processing#

The pre-/post-processing steps are typically provided by the model authors and are outside the scope of this tutorial. We provide a helper class in the tutorial's compressed folder that implements the pre- and post-processing for YOLOv7, which you can check for reference. You can use the helper class as follows,

from yolov7 import YoloV7Tiny
model = YoloV7Tiny(stream_img_size=(dims[1], dims[0], 3))
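
For reference, the helper exposes the preprocess and postprocess methods used by the input and output functions later in this tutorial. As a quick illustration (a hypothetical standalone example, not part of the pipeline code):

# Illustrative, standalone use of the helper on a single frame (hypothetical example)
ok, frame = vidcap.read()
if ok:
    inp = model.preprocess(frame)  # frame converted into the model's expected input format
    print('Pre-processed input shape:', inp.shape)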

The accl.set_postprocessing_model call will automatically retrieve the output from the chip, apply the cropped post-processing section of the graph using the ONNX Runtime, and generate the final output.

accl.set_postprocessing_model('yolov7-tiny_416.post.onnx', model_idx=0)

This output can then be sent to the post-processing code in the YOLOv7 helper class to get the detections on the output image.

Define an Input Function#

We need to define an input function for the accelerator to use. In this case, our input function will get a new frame from the camera and pre-process it.

def capture_and_preprocess():

    # Get a frame from the cam
    got_frame, frame = vidcap.read()

    if not got_frame:
        return None

    # Put the frame in the cap_queue to be overlayed later
    cap_queue.put(frame)

    # Pre-process the frame
    frame = model.preprocess(frame)
    return frame

Define Output Functions#

We also need to define an output function for the accelerator to use. Our output function will post-process the accelerator output and display it on the screen.

In addition to collecting and post-processing the MXA output, the output function will also overlay and display the output frame.

def postprocess_and_display(*mxa_output):

    # Post-process the MXA output
    dets = model.postprocess(mxa_output)

    # Get the frame from the frames queue
    frame = cap_queue.get()

    # Draw the OD boxes
    for d in dets:
        l,t,r,b = d['bbox']
        frame = cv2.rectangle(frame, (l,t), (r,b), (255,0,0), 2) 
        frame = cv2.rectangle(frame, (l,t-18), (r,t), (255,0,0), -1) 
        frame = cv2.putText(frame, d['class'], (l+2,t-5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255,255,255), 2)

    # Show the frame
    cv2.imshow('YOLOv7-Tiny on MemryX MXA', frame)

    # Exit on a key press
    if cv2.waitKey(1) == ord('q'):
        cv2.destroyAllWindows()
        vidcap.release()
        exit(1)

Connect the Accelerator#

Now we need to connect the input and output functions to the AsyncAccl API. The API will take care of the rest.

# AsyncAccl
from memryx import AsyncAccl
accl = AsyncAccl(dfp=dfp)

# Gets the output from the chip and performs the cropped graph post-processing
accl.set_postprocessing_model('yolov7-tiny_416.post.onnx', model_idx=0)

# Connect the input and output functions and let the accl run
accl.connect_input(capture_and_preprocess)
accl.connect_output(postprocess_and_display)
accl.wait()

The accelerator will automatically call the connected input and output functions in a fully pipelined fashion.
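
Once the input function returns None (for example, when the video source ends), the pipeline drains and accl.wait() returns, at which point the program can release its resources. A small optional addition (not part of the attached code):

# Optional cleanup once accl.wait() returns (e.g., after the video source is exhausted)
vidcap.release()
cv2.destroyAllWindows()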

Third-Party Licenses#

This tutorial uses third-party software, models, and libraries. Below are the details of the licenses for these dependencies:

Summary#

This tutorial showed how to use the AsyncAccl Python API to run real-time inference with an object-detection model. The code and the resources used in the tutorial are available to download: