Face Detect & Emotion Classification#

Introduction#

In this multi-model tutorial, we will demonstrate how to use the Accelerator API for real-time face emotion detection on the MX3, in both Python and C++. We will use the face_detection_short_range.tflite model for face detection and the mobilenet_7.h5 model for emotion recognition in our demo.

Note

This tutorial assumes a four-chip solution is correctly connected.

Download the Model#

For the sake of this tutorial, we exported both models for you to download. The models can be found in the compressed folder attached to this tutorial.

Compile the Model#

Note

Here, we have two different models combined into a single DFP file for our use case. You can use the pre-compiled DFP attached to this tutorial and skip the compilation step. Please make sure to include it in your working folder.

The compilation step is typically needed once and can be done using the Neural Compiler API or Tool.

In your command line you need to type,

mx_nc -v -m face_detection_short_range.tflite mobilenet_7.h5 --autocrop

This will produce a single DFP file combining both models, indexed as model 0 and model 1, ready to be used by the accelerator. In your Python code, you need to point the dfp variable to the generated file path.

dfp = "models.dfp"

For C++, you need to point modelPath to the generated DFP file path,

// model file 
const fs::path modelPath = "models.dfp";
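
If you prefer to compile from Python rather than the command line, the Neural Compiler API can produce the same DFP. The following is a minimal sketch; it assumes the memryx package exposes a NeuralCompiler class accepting a models list, an autocrop flag, and a verbose level, so check the Neural Compiler documentation of your SDK version for the exact signature.

# Hedged sketch: compile both models into one DFP via the Neural Compiler API.
# The parameter names below are assumptions; confirm them against your SDK.
from memryx import NeuralCompiler

nc = NeuralCompiler(
    models=["face_detection_short_range.tflite", "mobilenet_7.h5"],
    autocrop=True,  # crop host-side pre/post-processing layers, as with --autocrop
    verbose=1,
)
dfp = nc.run()  # writes the combined DFP file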

CV Pipelines#

In this tutorial, we will showcase how to use two models compiled into a single DFP and how they perform together in a single application.

In this example, first, the face is detected by the face detection model, and a cropped frame of the face is then passed to the emotion detection model. Then, both outputs are combined to produce the final output.

The following flowchart shows the flow of the application.

graph LR
    input1([Face Input Function]) --> accl1[Accelerator]
    accl1 --> output1([Face Output Function])
    output1 --> input2([Emotion Input Function])
    input2 --> accl2[Accelerator]
    accl2 --> output2([Emotion Output Function])
    style input1 fill:#CFE8FD, stroke:#595959
    style accl1 fill:#FFE699, stroke:#595959
    style output1 fill:#A9D18E, stroke:#595959
    style input2 fill:#CFE8FD, stroke:#595959
    style output2 fill:#A9D18E, stroke:#595959
    style accl2 fill:#FFE699, stroke:#595959

CV Initializations#

We will import the needed libraries, initialize the CV pipeline, and define common variables in this step.

// video capture to read camera or file
cv::VideoCapture vcap;

// video file
const fs::path videoPath = "../Friends.mp4"; 
const fs::path imagePath = "../face.jpg";

std::string emojiDir =  "../emojis";

cv::Mat image = cv::imread(imagePath);

std::map<int, cv::Mat> idxToEmoji;

std::deque<int> emotion_queue;
std::map<int, int> emotion_ctr;
const int emotion_duration = 7;

Along with the necessary CV initialization, we also initialize the variables needed for storing DFP model information, image manipulation, and FPS calculation.

Initialize the model info variables. We get this information after connecting to the accelerator.

// model file 
const fs::path modelPath = "models.dfp";

//model info
MX::Types::MxModelInfo model_info;
MX::Types::MxModelInfo model_info_emotion;

Queues handle the input and output hand-off between the face detection and emotion recognition models in the input callback and output callback functions.

//Queues to add input frames
std::deque<cv::Mat> frames_queue;
std::mutex frameQueue_Lock;

std::deque<cv::Mat> frames_queue_face;
std::mutex frameQueue_Lock_face;
std::condition_variable f_cond;

std::deque<cv::Mat> frames_queue_oface;
std::mutex frameQueue_Lock_oface;

std::deque<cv::Mat> frames_queue_emotion;
std::mutex frameQueue_Lock_emotion;

//Queues to add output from the MXA
std::deque<std::vector<float*>> ofmap_queue;
std::mutex ofmap_queue_lock;

std::deque<std::vector<float*>> ofmap_queue_emotion;
std::mutex ofmap_queue_lock_emotion;

Variables for image manipulations

double origHeight = 0.0;
double origWidth = 0.0;

int model0_input_width = 128;
int model0_input_height = 128;

int model1_input_width = 224;
int model1_input_height = 224;

Variables used for fps calculations

int frame_count = 0;
float fps_number =.0;
char fps_text[64] = "FPS = ";
std::chrono::milliseconds start_ms;
const int AVG_FPS_CALC_FRAME_COUNT = 50;
import cv2 as cv
import numpy as np

from memryx import AsyncAccl
from face_detection.app import App as FaceApp
from emotion_recognition.app import App as EmotionApp

Note

The face_detection.app and emotion_recognition.app files are in the multimodel_python.tar.xz folder provided below, which contains standalone scripts that perform face detection and emotion recognition independently.

Define an Input Function#

We need to define two input functions for the accelerator to use. In this case, our input functions will be the face incallback and emotion incallback functions.

Face model input function: Here, we get a new frame from the camera and preprocess it.

bool incallback_getframe_face(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){

    if(runflag.load()){
        bool got_frame = false;
        cv::Mat inframe;
        
        if(use_cam){
            got_frame = vcap.read(inframe);
        }

        else if(use_img){
            inframe = image; 
            if(!inframe.empty()){
                got_frame = true;
            }
            image.release();
        }

        else{
            got_frame = vcap.read(inframe);
        }

        if (!got_frame) {
            std::cout << "\n\n No frame - End of video/cam/img \n\n\n";
            runflag.store(false);
            return false;  // return false if frame retrieval fails
        }

        else{

            // Put the frame in the cap_queue to be overlayed later
            {
                std::unique_lock<std::mutex> flock(frameQueue_Lock);
                frames_queue.push_back(inframe);
            }

            // Preprocess frame
            cv::Mat preProcframe = preprocess_face(inframe);

            // Set preprocessed input data to be sent to the accelerator
            dst[0]->set_data((float*)preProcframe.data, false);

            return true;
        }           
    }
    else{
        vcap.release();
        runflag.store(false);
        return false;
    }    
}
    accl.connect_input(app.generate_frame_face)
    def generate_frame_face(self):
        frame = self.face_app.generate_frame()
        if frame is None:
            return None
        orig_frame = self.face_app.capture_queue.get()
        self.capture_queue.put(orig_frame)
        self.face_app.capture_queue.put(orig_frame)
        return frame

Emotion model input function: Here, we get a cropped face frame from the face output callback function and then preprocess it.

bool incallback_getframe_emotion(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){

   if(runflag.load()){

       cv::Mat inframe;
       {
           std::unique_lock<std::mutex> lock(frameQueue_Lock_face);
           auto now = std::chrono::steady_clock::now();
           if(!f_cond.wait_until(lock,now+1000ms, [](){ return !frames_queue_face.empty(); }))
           {
            runflag.store(false);
            return false;
           }
           // At this point, the lock is re-acquired after the wait
           inframe = frames_queue_face.front();
           frames_queue_face.pop_front();
       }
       // Preprocess frame
       cv::Mat preProcframe = preprocess_emotion(inframe, true);

       // Set preprocessed input data to be sent to the accelerator
       dst[0]->set_data((float*)preProcframe.data, false);

       return true;
   }
   else{
       runflag.store(false);
       return false;
   }   
}
    accl.connect_input(app.generate_frame_emotion, 1)
    def generate_frame_emotion(self):
        # self.face_done.wait()
        # self.face_done.clear()
        try:
            face = cv.resize(self.face, (224, 224), interpolation=cv.INTER_CUBIC)
        except Exception:
            self.emotion_app.background = True
            face = np.zeros((224,224,3))
        self.emotion_app.cap_queue.put(self.capture_queue.get())
        return face.astype(np.float32)
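
The C++ callback above bounds its wait on the face queue to one second and stops the pipeline on a timeout. A similar guard can be written in Python with the standard queue module; this is a hypothetical sketch (the provided face_detection.app and emotion_recognition.app manage their own queues), and face_queue and get_face_with_timeout are illustrative names.

import queue

face_queue = queue.Queue()

def get_face_with_timeout():
    # Wait up to one second for a cropped face from the face output callback,
    # mirroring the condition-variable wait in the C++ emotion input callback.
    try:
        return face_queue.get(timeout=1.0)
    except queue.Empty:
        # No face arrived in time; returning None ends the input stream,
        # just as the C++ callback returns false.
        return None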

Note

The pre-processing steps are defined in the multimodel.cpp code provided below, which prepares the input stream to be consumed by the model.
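
As a rough Python reference for those steps, the sketch below resizes each frame to the model input sizes used in this tutorial (128x128 for the face detector, 224x224 for the emotion model). The RGB conversion and [-1, 1] scaling for the face detector are assumptions, so verify them against preprocess_face in multimodel.cpp.

import cv2 as cv
import numpy as np

def preprocess_face(frame):
    # Resize to the face detector's 128x128 input; channel order and the
    # [-1, 1] scaling are assumptions -- check multimodel.cpp for the real steps.
    img = cv.resize(frame, (128, 128), interpolation=cv.INTER_LINEAR)
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB).astype(np.float32)
    return img / 127.5 - 1.0

def preprocess_emotion(face):
    # Resize the cropped face to the emotion model's 224x224 input,
    # matching the resize done in generate_frame_emotion above.
    face = cv.resize(face, (224, 224), interpolation=cv.INTER_CUBIC)
    return face.astype(np.float32)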

Define an Output Function#

We also need to define two output functions for the face and emotion models separately for the accelerator to use. These output functions will post-process the accelerator’s output and display it on the screen.

Face model output function: Here, we get the detection output from the model, post-process it, and perform two functions:

  1. Draw the bounding boxes on the frame.

  2. Crop the bounding box, i.e., crop just the face, and add it to a new queue so it can be passed as input to the emotion incallback function (a Python sketch of this crop is shown below).
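
The following is a rough Python equivalent of the cropFirstDetectedFace helper used in the C++ callback below. It is a hypothetical sketch: the real helper in multimodel.cpp works on the detector's tensor output, and the box layout assumed here (pixel-space x1, y1, x2, y2 per row) may differ.

import numpy as np

def crop_first_detected_face(frame, dets):
    # dets: detections with a pixel-space box (x1, y1, x2, y2, ...) per row.
    if len(dets) == 0:
        # No face found: hand the emotion model a blank (background) image.
        return np.zeros((224, 224, 3), dtype=np.uint8)
    x1, y1, x2, y2 = [int(v) for v in dets[0][:4]]
    h, w = frame.shape[:2]
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    return frame[y1:y2, x1:x2]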

// Output callback function
bool outcallback_getmxaoutput_face(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){

    std::vector<float*> ofmap;
    ofmap.reserve(src.size());
    for(int i = 0; i<model_info.num_out_featuremaps ; ++i){
        float * fmap = new float[model_info.out_featuremap_sizes[i]];
        src[i]->get_data(fmap);
        ofmap.push_back(fmap);
    }

    cv::Mat frame;
    cv::Mat inframe;

    {
        std::unique_lock<std::mutex> ilock(frameQueue_Lock);
        // pop from frame queue
        frame = frames_queue.front();
        frames_queue.pop_front();
    } // releases in frame queue lock

    torch::Tensor dets = model.postprocess(ofmap);

    draw(frame, dets);

    inframe = cropFirstDetectedFace(frame, dets);

    {
        std::unique_lock<std::mutex> flock(frameQueue_Lock_face);
        frames_queue_face.push_back(inframe);
        f_cond.notify_one();
    }

    {
        std::unique_lock<std::mutex> flock(frameQueue_Lock_oface);
        frames_queue_oface.push_back(frame);
    }

    for (auto& fmap : ofmap) {
        delete[] fmap;
        fmap = NULL;
    }
    
    return true;
}
    accl.connect_output(app.process_face)
    def process_face(self, *ofmaps):
        if(len(ofmaps)==1):
            ofmaps = ofmaps[0]
        self.face, face_count = self.face_app.process_face(*ofmaps)
        if face_count == 0:
            self.emotion_app.background = True
        #self.face_done.set()
        return self.face

Emotion model output function: Here, we get the emotion index output from the model, post-process it by drawing the emoji on the frame, and display the face detection and emotion on the screen.

// Output callback function
bool outcallback_getmxaoutput_emotion(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){

    std::vector<float*> ofmap_emotion;
    cv::Mat inframe;

    ofmap_emotion.reserve(src.size());

    for(int i=0; i<model_info_emotion.num_out_featuremaps ; ++i){
        float * fmap_emotion = new float[model_info_emotion.out_featuremap_sizes[i]];
        src[i]->get_data(fmap_emotion, true);
        ofmap_emotion.push_back(fmap_emotion);
    }

    {
        std::unique_lock<std::mutex> lock(frameQueue_Lock_oface);
        inframe = frames_queue_oface.front();
        frames_queue_oface.pop_front();
    }

    int emotionIdx = getEmotionIndex(ofmap_emotion); // Use the function to find the emotion index

    emotionIdx = smoothEmotion(emotionIdx);

    drawEmoji(inframe, emotionIdx);

    for (auto& fmap_emotion : ofmap_emotion) {
        delete[] fmap_emotion;
        fmap_emotion = NULL;
    }

    if(!window_created){

        cv::namedWindow("Face Detection", cv::WINDOW_NORMAL | cv::WINDOW_KEEPRATIO);
        cv::resizeWindow("Face Detection", cv::Size(640,480));
        cv::moveWindow("Face Detection", 0, 0);
        window_created=true;
    }

    frame_count++;
            
    if (frame_count == 1)
    {
        start_ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch());
    }

    else if (frame_count % AVG_FPS_CALC_FRAME_COUNT == 0)
    {
        std::chrono::milliseconds duration =
            std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now().time_since_epoch()) - start_ms;

        fps_number = (float)AVG_FPS_CALC_FRAME_COUNT * 1000 / (float)(duration.count());
        frame_count = 0;
    }

    sprintf(fps_text, "FPS = %.1f", fps_number);
    std::cout << "\r" << fps_text << "\t" << std::flush;

    cv::putText(inframe,fps_text,
        cv::Point2i(10, 30), // origin of text (bottom left of textbox)
        cv::FONT_ITALIC,
        0.8, // font scale
        cv::Scalar(255, 255, 0), // color (cyan, BGR)
        2 // thickness
    );

    // Display the image with detections
    cv::imshow("Face Detection", inframe);
    
    if(use_img){
        cv::waitKey(1000);
        runflag.store(false);
    }

    else{
        if (cv::waitKey(1) == 'q') {
            runflag.store(false);
        }
    }

    return true;
}
    accl.connect_output(app.process_emotion, 1)
    def process_emotion(self, *ofmaps):
        out = self.emotion_app.process_model_output(*ofmaps)
        self.emotion_app.show(out)
        return out
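
The getEmotionIndex and smoothEmotion helpers used in the C++ callback pick the highest-scoring class and stabilize it over the last emotion_duration (7) frames. The Python sketch below illustrates the same idea; the majority vote is an assumption, and the exact smoothing in multimodel.cpp may differ.

from collections import Counter, deque

import numpy as np

EMOTION_DURATION = 7
emotion_queue = deque(maxlen=EMOTION_DURATION)

def get_emotion_index(ofmap):
    # The emotion model outputs one score per class; pick the argmax.
    return int(np.argmax(ofmap))

def smooth_emotion(idx):
    # Majority vote over the last EMOTION_DURATION frames to reduce flicker.
    emotion_queue.append(idx)
    return Counter(emotion_queue).most_common(1)[0][0]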

Note

The post-processing steps are defined in the multimodel.cpp code provided below; they convert the model outputs into the detections and emotion overlay shown on screen.

Connect the Accelerator#

Now, all you need to do is to connect your input and output functions to the AsyncAccl API. The API will take care of the rest.

MX::Runtime::MxAccl accl;
accl.connect_dfp(modelPath.c_str());

model_info = accl.get_model_info(0);
print_model_info_face();
model0_input_height = model_info.in_featuremap_shapes[0][0];
model0_input_width = model_info.in_featuremap_shapes[0][1];

accl.connect_stream(&incallback_getframe_face, &outcallback_getmxaoutput_face, 0 /*unique stream ID*/, 0 /*Model ID */);   

model_info_emotion = accl.get_model_info(1);
print_model_info_emotion();
model1_input_height = model_info_emotion.in_featuremap_shapes[0][0];
model1_input_width = model_info_emotion.in_featuremap_shapes[0][1];

accl.connect_stream(&incallback_getframe_emotion, &outcallback_getmxaoutput_emotion, 0 /*unique stream ID*/, 1 /*Model ID */);     

std::cout << "Connected stream \n\n\n";

accl.start();
//accl.wait();
while(runflag.load()){
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
}  
cv::destroyAllWindows();
accl.stop();

The runInference() function opens the video capture based on the input provided and connects the input streams to the accelerator.

The main() function takes an argument that determines the input to the CV video capture and calls the runInference() function.
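
As an illustration of that argument handling, the hypothetical Python sketch below selects between a camera, a single image, and a video file before the pipeline starts; open_source and its argument convention are invented for this example and may not match multimodel.cpp.

import sys

import cv2 as cv

def open_source(arg):
    # "cam" -> default camera, image extensions -> single image, otherwise a video file.
    if arg == "cam":
        return cv.VideoCapture(0), None
    if arg.lower().endswith((".jpg", ".jpeg", ".png")):
        return None, cv.imread(arg)
    return cv.VideoCapture(arg), None

if __name__ == "__main__":
    vcap, image = open_source(sys.argv[1] if len(sys.argv) > 1 else "cam")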

Note

The runInference() function and main() function are defined in the multimodel.cpp code provided below.

accl = AsyncAccl(dfp)
accl.connect_input(app.generate_frame_face)
accl.connect_input(app.generate_frame_emotion, 1)
accl.connect_output(app.process_face)
accl.connect_output(app.process_emotion, 1)
accl.wait()

The accelerator will automatically call the connected input and output functions in a fully pipelined fashion.

Third-Party Licenses#

This tutorial uses third-party software, models, and libraries. Below are the details of the licenses for these dependencies:

Summary#

This tutorial showed how to use the Accelerator API to run real-time inference in a multi-model use case. The full code and the compiled DFP are available for download.