Classification using Accelerator#

Introduction#

We will use the Neural Compiler and the MXA to classify real images with open-source pre-trained models. In this tutorial, we use a ResNet50 model to classify images from the ImageNet dataset.

Note

This tutorial assumes a four-chip solution is correctly connected.

Download and Compile the Model#

Use the Keras API to download a pre-trained ResNet50 model, then compile it to a .dfp file with the Neural Compiler. From the command line, run the following:

python3 -c "import tensorflow as tf; tf.keras.applications.ResNet50().save('resnet.h5');"
mx_nc -v -m resnet.h5 -c 4

In the current working directory, you should now have the resnet.h5 (Keras) and resnet.dfp (MemryX) files. The .dfp file is the statically compiled model that will be loaded onto the accelerator for inferencing.

Note

You can also download the resnet.h5 file directly from this link: resnet.h5.

Download 100 images of ImageNet#

The following code needs some test images to work with. We provide 100 images from the ImageNet 2012 validation dataset with ground truth labels. Download imagenet100 and extract the archive:

tar -xvzf imagenet100.tar.gz
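
Before moving on, it can help to confirm that the archive extracted as expected. The sketch below is a hypothetical helper (check_dataset is not part of any MemryX tooling). It assumes the layout used later in this tutorial: .JPEG files plus a ground_truth text file with one label per line.

```python
from pathlib import Path

def check_dataset(root="imagenet100"):
    """Return (number of JPEG images, number of ground-truth labels)."""
    root = Path(root)
    # Image files in this dataset use the upper-case .JPEG extension
    images = sorted(root.glob("*.JPEG"))
    # One label per line; skip blank lines such as a trailing newline
    text = (root / "ground_truth").read_text()
    labels = [line for line in text.splitlines() if line.strip()]
    return len(images), len(labels)
```

Calling check_dataset() from the directory that holds imagenet100 should report matching counts (100 and 100 for the provided archive) if extraction succeeded.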

Image loading + Preprocessing#

Now, let’s write some Python code to load the images into NumPy arrays and preprocess them (resize + rescale). This prepares them for inferencing. In a Python file, add the following lines:

import os, glob

# Set the TensorFlow log level before TensorFlow is first imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow import keras

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

# Make sure to configure the path correctly to point to the downloaded dataset
imagenet_path = 'imagenet100'

# Load ground_truth
with open(imagenet_path+'/ground_truth', 'r') as f:
    ground_truth = f.read().split('\n')[:-1]

# Load images
image_paths = glob.glob(imagenet_path+'/*.JPEG')
image_paths.sort()

images = []
for image_path in image_paths:
    image = np.array(Image.open(image_path).resize((224,224)))
    # Handle grey-scale images
    if image.shape == (224,224):
        image = np.repeat(image[:,:,np.newaxis], 3, axis=2)
    # Preprocessing (shift+scale)
    image = keras.applications.resnet.preprocess_input(image)
    images.append(image.astype(np.float32))
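
For reference, keras.applications.resnet.preprocess_input follows the "caffe" convention: it reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means, without scaling the values to [0, 1]. The sketch below reproduces this for a single pixel in plain Python. The helper name is hypothetical, and the function is for illustration only, not a replacement for the Keras call.

```python
# ImageNet channel means in BGR order, as used by the Keras "caffe" mode
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)

def caffe_preprocess_pixel(rgb):
    """Convert one (R, G, B) pixel to zero-centered (B, G, R) floats."""
    r, g, b = rgb
    bgr = (float(b), float(g), float(r))  # RGB -> BGR
    # Subtract the per-channel mean; no division by a standard deviation
    return tuple(v - m for v, m in zip(bgr, IMAGENET_MEAN_BGR))
```

A pure red pixel (255, 0, 0) therefore ends up with negative blue and green components and a positive red component in the last position.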

Note

We use the Pillow library to load the JPEG images. You can install this library by running pip install pillow on the command line.

Note

In C++, we can run inference using two approaches:

  1. User threading: Here, users can decide whether to run the application synchronously or create multiple threads to make it work asynchronously.

  2. Auto threading: Here, the API handles the threading and runs asynchronously, so users do not need to manage threads themselves.

The image loading, preprocessing, and post-processing of predictions code are the same for both approaches.

Here, we iterate over the images in the ImageNet folder using the filesystem library. The images are then preprocessed (resized and rescaled) to prepare them for inferencing.

In the current working directory, you should now have the imagenet100 folder, with image names starting with ILSVRC2012_val_.

// Imagenet folder
const fs::path imagenetPath = "imagenet100/";
std::string baseFilename = "ILSVRC2012_val_";
int counter = 100;

    for(int i=1; i <= counter; i++){

        // Build the file name from the zero-padded image index and load the image
        cv::Mat inframe;
        std::stringstream ss;
        ss << baseFilename << std::setw(8) << std::setfill('0') << i << ".JPEG";
        fs::path fullPath = imagenetPath / ss.str();
        inframe = cv::imread(fullPath.string());
    }
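
The C++ loops in this tutorial build each file name with std::setw(8) and std::setfill('0'), that is, the image index padded with zeros to eight digits. As a quick illustration of the resulting names, here is the same formatting sketched in Python; val_image_name is a hypothetical helper introduced only for this example.

```python
def val_image_name(index, base="ILSVRC2012_val_", width=8):
    """Build a validation image file name with a zero-padded index."""
    return f"{base}{index:0{width}d}.JPEG"

first = val_image_name(1)     # "ILSVRC2012_val_00000001.JPEG"
last = val_image_name(100)    # "ILSVRC2012_val_00000100.JPEG"
```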

Preprocessing code:

cv::Mat preprocessImage(cv::Mat& img) {
    
    cv::resize(img, img, cv::Size(224, 224));

    if (img.channels() == 1) {
        cv::cvtColor(img, img, cv::COLOR_GRAY2BGR);
    }

    cv::Mat img_float;
    img.convertTo(img_float, CV_32F);

    return img_float; 
}

Run inference#

Next, let's run inference on both the CPU and the MXA so we can compare the results!

The Async API is the most straightforward way to make full use of the MX3. You only need to connect an input function and an output function; the API handles the threading and the data streaming under the hood.

import time
from memryx import AsyncAccl

img_iter = iter(images)
mxa_outputs = []

def get_frame():
    return next(img_iter, None)

def process_output(*outputs):
    mxa_outputs.append(np.squeeze(outputs[0], 0))

# CPU run
model = keras.models.load_model('resnet.h5')
start = time.time()
cpu_outputs = model.predict(np.array(images))
cpu_inference_time = time.time() - start

# MXA run
accl = AsyncAccl(dfp='resnet.dfp')
start = time.time()
accl.connect_input(get_frame)
accl.connect_output(process_output)
accl.wait()

mxa_outputs = np.stack([np.squeeze(arr) for arr in mxa_outputs])
mxa_inference_time = time.time() - start

cpu_preds = keras.applications.resnet.decode_predictions(cpu_outputs, top=5)
mxa_preds = keras.applications.resnet.decode_predictions(mxa_outputs, top=5)

print("CPU Inference time (100 images): {:.1f} msec".format(cpu_inference_time*1000))
print("MXA Inference time (100 images): {:.1f} msec".format(mxa_inference_time*1000))
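
The get_frame callback above relies on next(img_iter, None) returning None once the images run out, which appears to be how the end of the input stream is signalled. The sketch below illustrates that sentinel pattern with a stand-in driver loop. run_stream is a hypothetical function written for this example and is not part of the memryx package; the lambda passed as infer merely stands in for the accelerator.

```python
def run_stream(get_frame, process_output, infer):
    """Pull frames until get_frame returns None, passing each result on."""
    count = 0
    while True:
        frame = get_frame()
        if frame is None:          # sentinel: the input source is exhausted
            break
        process_output(infer(frame))
        count += 1
    return count

frames = iter([1, 2, 3])           # stand-in for the preprocessed images
results = []

n = run_stream(lambda: next(frames, None),   # same idiom as get_frame
               results.append,               # collects outputs in order
               lambda x: x * 10)             # stand-in for inference
```

The design keeps the input source and the output consumer independent of each other, so either one can be swapped (for example, a camera instead of a list) without touching the driver.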

The Sync API is a simpler interface: the user supplies a group of frames, and the API streams them to the accelerator internally. This option is intended only for offline (not live) processing of data.

import time
from memryx import SyncAccl

# CPU run
model = keras.models.load_model('resnet.h5')
start = time.time()
cpu_outputs = model.predict(np.array(images))
cpu_inference_time = time.time() - start

# MXA run
accl = SyncAccl(dfp='resnet.dfp')
start = time.time()
mxa_outputs = accl.run(images)
mxa_outputs = np.stack([np.squeeze(arr) for arr in mxa_outputs])
mxa_inference_time = time.time() - start

cpu_preds = keras.applications.resnet.decode_predictions(cpu_outputs, top=5)
mxa_preds = keras.applications.resnet.decode_predictions(mxa_outputs, top=5)

print("CPU Inference time (100 images): {:.1f} msec".format(cpu_inference_time*1000))
print("MXA Inference time (100 images): {:.1f} msec".format(mxa_inference_time*1000))
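
The totals printed above can be converted into a per-image latency and a throughput figure, which are often easier to compare. The helper below is a minimal sketch with a hypothetical name. Keep in mind that a single timed run may include one-time setup costs, so the numbers are indicative only.

```python
def summarize_timing(elapsed_seconds, num_frames):
    """Return (milliseconds per frame, frames per second)."""
    if num_frames <= 0 or elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds and num_frames must be positive")
    ms_per_frame = elapsed_seconds * 1000.0 / num_frames
    fps = num_frames / elapsed_seconds
    return ms_per_frame, fps
```

For example, summarize_timing(mxa_inference_time, len(images)) would report the average time per image and the frames per second for the MXA run.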

Let’s run inference from C++ by passing each image to the MXA and reading back the output.

In the MX API user threading mode, the code below sends each input and receives its output sequentially, i.e., synchronously. Inputs are passed to the MXA with send_input, and outputs are read back with receive_output.

    // Connect to the accelerator and load the compiled model (DFP)
    MX::Runtime::MxAcclMT accl;
    int dfp_tag = accl.connect_dfp(modelPath.c_str());

    int stream_label=0;

    model_info = accl.get_model_info(0);

    std::cout << "Connected stream \n\n\n";

    //  a vector to hold input data pointers and reserve space based on the number of input feature maps
    std::vector<float*> input_data;
    input_data.reserve(model_info.num_in_featuremaps);

    // a vector to hold output feature map data pointers
    std::vector<float*> ofmap;

    // Reserve space in the output vector based on the number of output feature maps
    ofmap.reserve(model_info.num_out_featuremaps);

    // Allocate memory for each output feature map and store pointers in the output vector
    for(int j=0; j<model_info.num_out_featuremaps; ++j){
        float * fmap = new float[model_info.out_featuremap_sizes[j]];
        ofmap.push_back(fmap);
    }

    for(int i=1; i <= counter; i++){
        
        cv::Mat inframe;
        std::stringstream ss;
        ss << baseFilename << std::setw(8) << std::setfill('0') << i << ".JPEG";
        fs::path fullPath = imagenetPath / ss.str();
        inframe = cv::imread(fullPath.string());

        int recvinf_label;

        if(!inframe.empty()){

            cv::Mat preProcframe = preprocessImage(inframe);

            // Add the preprocessed frame data as an input feature map
            input_data.push_back((float*)preProcframe.data);
            
            // Send the input data to the accelerator
            accl.send_input(input_data, model_info.model_index, stream_label,dfp_tag , false,0);

            // Receive the processed output from the accelerator
            accl.receive_output(ofmap, model_info.model_index, recvinf_label, dfp_tag ,true,0);

            // Evaluate the output and print the scores
            printScore(ofmap);

            // Clear the input data vector for reuse
            input_data.clear();
        }
    }

Note

In user threading mode, users can decide whether they want to make it synchronous or asynchronous.
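
To illustrate the asynchronous option, the sketch below shows the general producer/consumer idea in Python: the caller keeps queueing frames while a worker thread processes them. This is a generic threading pattern, not MX API code. run_async is a hypothetical name, and infer is a stand-in for the send and receive calls.

```python
import queue
import threading

def run_async(frames, infer):
    """Process frames on a worker thread while the caller keeps feeding them."""
    inputs, outputs = queue.Queue(), []

    def worker():
        while True:
            frame = inputs.get()
            if frame is None:      # sentinel: no more frames
                break
            outputs.append(infer(frame))

    t = threading.Thread(target=worker)
    t.start()
    for frame in frames:           # sender side does not wait for each result
        inputs.put(frame)
    inputs.put(None)               # tell the worker to finish
    t.join()
    return outputs
```

With a single worker and a FIFO queue, results come back in the same order as the inputs, which keeps them aligned with the ground truth labels.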

In the above code, the printScore function processes the output to obtain the top-1 and top-5 predictions. The function is defined below in the Predictions section.

The MX_API auto threading mode enables the user to connect input and output callback functions to send and receive data from the accelerator. The API will handle the threading and data streaming internally.

  1. Input-callback function: Each frame is passed to the MXA.

bool incallback_getframe(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){

    if(runflag.load()){

        bool got_frame = false;
        cv::Mat inframe;

        std::stringstream ss;
        ss << baseFilename << std::setw(8) << std::setfill('0') << counter << ".JPEG";
        fs::path fullPath = imagenetPath / ss.str();
        inframe = cv::imread(fullPath.string());
        counter++;
        
        if(!inframe.empty()){
            got_frame = true;
        }

        if (!got_frame) {
            std::cout << "\n\n No frame - End of video/cam/img \n\n\n";
            runflag.store(false);
            return false;  // return false if frame retrieval fails
        }

        else{
            // Preprocess frame
            cv::Mat preProcframe = preprocessImage(inframe);

            // Set preprocessed input data to be sent to accelerator
            dst[0]->set_data((float*)preProcframe.data, false);

            return true;
        }           
    }
    else
    {
        runflag.store(false);
        return false;
    }    
}

  2. Output-callback function: The output received from the accelerator.

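bool outcallback_getmxaoutput(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){

    std::vector<float*> ofmap;
    ofmap.reserve(src.size());

    // Copy each output feature map from the accelerator into a host buffer
    for(int i = 0; i < model_info.num_out_featuremaps; ++i){
        float * fmap = new float[model_info.out_featuremap_sizes[i]];
        src[i]->get_data(fmap, true);
        ofmap.push_back(fmap);
    }

    printScore(ofmap);

    // Release the temporary buffers so they do not leak on every frame
    for(float* fmap : ofmap){
        delete[] fmap;
    }

    return true;
}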

Note

In the above code, the printScore function processes the output to obtain the top-1 and top-5 predictions. The function is defined below in the Predictions section.

  3. Connect the Accelerator: connect our input and output callback functions to the MxAccl API. The API will take care of the rest.

  MX::Runtime::MxAccl accl;
  accl.connect_dfp(modelPath.c_str()); // Connect the model to the accelerator
  model_info = accl.get_model_info(0);

  accl.connect_stream(&incallback_getframe, &outcallback_getmxaoutput, 10 /*unique stream ID*/, 0 /*Model ID */);
  std::cout << "Connected stream \n\n\n";

  accl.start();

  while(runflag.load()){
      std::this_thread::sleep_for(std::chrono::milliseconds(2));
  }  
  accl.stop();

Predictions#

Finally, let’s compare the prediction results to the ground truth.

def compare_with_ground_truth(predictions):
    top1, top5, total = 0, 0, len(predictions)
    for i,pred in enumerate(predictions):
        gt = ground_truth[i]

        classes = [guess[0] for guess in pred]
        if gt in classes:
            top5 += 1
        if gt == classes[0]:
            top1 += 1

    print("Top 1: ({}/{})  {:.2f} % ".format(top1, total, top1/total*100))
    print("Top 5: ({}/{})  {:.2f} % ".format(top5, total, top5/total*100))

print("CPU Results: ")
compare_with_ground_truth(cpu_preds)

print("MXA Results: ")
compare_with_ground_truth(mxa_preds)

In C++, the ground truth labels for the 100 images are defined directly in a vector for simplicity.

std::vector<int> ground_truth  = {  65,970,230,809,516,57,334,415,674,332,109,286,370,757,595,147,473,23,478,517,334,173,
                                    948,727,23,846,270,167,55,858,324,573,150,981,586,887,32,398,777,74,516,756,129,198,
                                    256,725,565,167,717,394,92,29,844,591,358,468,259,994,872,588,474,183,107,46,842,390,
                                    101,887,870,841,467,149,21,476,80,424,159,275,175,461,970,160,788,58,479,498,369,28,487,
                                    50,270,383,366,780,373,705,330,142,949,349 };

Now, let’s process the output from the MXA and compare the prediction results with the ground truth.

void printScore(const std::vector<float*>& ofmaps){
    
    totalSamples += 1; 
    std::vector<int> indices(1000);  
    
    std::iota(indices.begin(), indices.end(), 0);  // Fill the indices vector with consecutive integers starting at 0

    std::sort(indices.begin(), indices.end(),  // Sort the indices based on the comparison of values in ofmaps
        [&ofmaps](int i1, int i2) {
            return ofmaps[0][i1] > ofmaps[0][i2];
        }
    );

    int trueIndex = ground_truth[totalSamples - 1]; // Retrieve the ground truth index for the current sample

    // Check if the top prediction (highest probability) matches the true index (top-1 accuracy)
    if (indices[0] == trueIndex) {  
        correctTop1++;
    }

    // Check top-5 accuracy: see if the true index is among the top 5 predictions
    if (std::find(indices.begin(), indices.begin() + 5, trueIndex) != indices.begin() + 5) {   
        correctTop5++;
    }
}
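
The same top-1 and top-5 bookkeeping can be sketched in Python on a raw score vector and an integer class index, which may be handy for cross-checking the C++ results. The function names and the counts dictionary are hypothetical and introduced only for this example. heapq.nlargest avoids sorting all 1000 scores when only the best five are needed.

```python
import heapq

def top_k_indices(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)

def update_accuracy(scores, true_index, counts):
    """Update the top-1, top-5 and total counters in place for one sample."""
    top5 = top_k_indices(scores, 5)
    counts["total"] += 1
    counts["top1"] += int(top5[0] == true_index)   # best guess matches
    counts["top5"] += int(true_index in top5)      # label among best five
    return counts
```

After all samples are processed, the accuracy in percent is counts["top1"] / counts["total"] * 100, and likewise for top-5.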

In the main function, we can calculate the Top-1 and Top-5 score.

    // Release the output buffers allocated before the inference loop
    for (auto& fmap : ofmap) {
        delete[] fmap;
        fmap = NULL;
    }

Third-Party License#

This tutorial uses third-party models available through the Keras Applications API. Below are the details of the licenses for these dependencies:

Summary#

This tutorial outlined how to use a pre-trained model to run inference using the Accelerator API. The full script is available for download: