Classification using Accelerator#
Introduction#
We will use the Neural Compiler and the MXA to classify real images with open-source pre-trained models. In this tutorial, we will use a ResNet50 model to classify images from the ImageNet dataset.
Note
This tutorial assumes a four-chip solution is correctly connected.
Download and Compile the Model#
Use the Keras API to download a pre-trained ResNet50 model, then compile it to a .dfp with the Neural Compiler, targeting the four-chip solution (-c 4). From the command line, run the following:
python3 -c "import tensorflow as tf; tf.keras.applications.ResNet50().save('resnet.h5');"
mx_nc -v -m resnet.h5 -c 4
In the current working directory, you should now have the resnet.h5 (Keras) and resnet.dfp (MemryX) files. The .dfp file is the statically compiled model that will be loaded onto the accelerator for inferencing.
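If you prefer to stay in Python, the memryx package also exposes the Neural Compiler programmatically. Below is a minimal sketch, assuming your SDK version provides the NeuralCompiler class with these parameters; check your SDK documentation for the exact signature.

# Hypothetical sketch: compile resnet.h5 to a 4-chip DFP from Python.
# Assumes the memryx package exposes NeuralCompiler with these parameters.
from memryx import NeuralCompiler

nc = NeuralCompiler(models='resnet.h5', num_chips=4, verbose=1)
dfp = nc.run()  # produces the compiled .dfp for the accelerator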
Note
You can also download the resnet.h5 file directly from this link.
Download 100 images of ImageNet#
The following code needs some test images to work with. We provide 100 images from the ImageNet 2012 validation dataset with ground-truth labels. Download imagenet100 and untar it:
tar -xvzf imagenet100.tar.gz
Image loading + Preprocessing#
Now, let’s write some Python code to load the images into NumPy arrays and preprocess them (resize + rescale). This will prepare them for inferencing. In a Python file, add the following lines:
import os, glob

# Set before importing TensorFlow so the log-level filter takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

from PIL import Image
import numpy as np
import tensorflow as tf
from tensorflow import keras

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

# Make sure to configure the path correctly to point to the downloaded dataset
imagenet_path = 'imagenet100'

# Load ground truth
with open(imagenet_path + '/ground_truth', 'r') as f:
    ground_truth = f.read().split('\n')[:-1]

# Load images
image_paths = glob.glob(imagenet_path + '/*.JPEG')
image_paths.sort()

images = []
for image_path in image_paths:
    image = np.array(Image.open(image_path).resize((224, 224)))

    # Handle grey-scale images by repeating the single channel three times
    if image.shape == (224, 224):
        image = np.repeat(image[:, :, np.newaxis], 3, axis=2)

    # Preprocessing (shift + scale)
    image = keras.applications.resnet.preprocess_input(image)
    images.append(image.astype(np.float32))
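Before moving on, a quick sanity check confirms the arrays have the shape the model expects. This sketch simply re-states what the loop above produced:

# Expect 100 images, each a 224x224x3 float32 array
print(len(images))       # 100
print(images[0].shape)   # (224, 224, 3)
print(images[0].dtype)   # float32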
Note
We use the Pillow library to load the JPEG images. You can install it by running pip install pillow on the command line.
Note
In C++, we can run inference using two approaches:
User threading: the user decides whether to run the application synchronously, or creates multiple threads to run it asynchronously.
Auto threading: the API handles the threading itself, running asynchronously so the user does not have to manage threads.
The image loading, preprocessing, and prediction post-processing code is the same for both approaches.
Here, we iterate over the images in the ImageNet folder using the filesystem library. The images are then preprocessed (resized and rescaled) to prepare them for inferencing.
In the current working directory, you should now have the imagenet100 folder, with image names starting with ILSVRC2012_val_.
// Imagenet folder
const fs::path imagenetPath = "imagenet100/";
std::string baseFilename = "ILSVRC2012_val_";
int counter = 100;

for(int i = 1; i <= counter; i++){
    cv::Mat inframe;
    std::stringstream ss;

    // Build zero-padded file names such as ILSVRC2012_val_00000001.JPEG
    ss << baseFilename << std::setw(8) << std::setfill('0') << i << ".JPEG";
    fs::path fullPath = imagenetPath / ss.str();
    inframe = cv::imread(fullPath.string());
    cv::Mat preProcframe = preprocessImage(inframe);
    // ... inference on preProcframe (shown below)
}
Preprocessing code:
cv::Mat preprocessImage(cv::Mat& img) {
    // Resize to the model's 224x224 input resolution
    cv::resize(img, img, cv::Size(224, 224));

    // Handle grey-scale images
    if (img.channels() == 1) {
        cv::cvtColor(img, img, cv::COLOR_GRAY2BGR);
    }

    // Convert to 32-bit float for the accelerator
    cv::Mat img_float;
    img.convertTo(img_float, CV_32F);
    return img_float;
}
Run inference#
Next, let’s run inference on both the CPU and the MXA so we can compare the results!
The Async API is the straightforward way to best utilize the MX3. You need only to connect an input and an output function, and the API will handle the threading and the data streaming under the hood.
import time
from memryx import AsyncAccl
img_iter = iter(images)
mxa_outputs = []

def get_frame():
    return next(img_iter, None)

def process_output(*outputs):
    mxa_outputs.append(np.squeeze(outputs[0], 0))

# CPU run
model = keras.models.load_model('resnet.h5')
start = time.time()
cpu_outputs = model.predict(np.array(images))
cpu_inference_time = time.time() - start

# MXA run
accl = AsyncAccl(dfp='resnet.dfp')
start = time.time()
accl.connect_input(get_frame)
accl.connect_output(process_output)
accl.wait()
mxa_outputs = np.stack([np.squeeze(arr) for arr in mxa_outputs])
mxa_inference_time = time.time() - start

cpu_preds = keras.applications.resnet.decode_predictions(cpu_outputs, top=5)
mxa_preds = keras.applications.resnet.decode_predictions(mxa_outputs, top=5)

print("CPU Inference time (100 images): {:.1f} msec".format(cpu_inference_time * 1000))
print("MXA Inference time (100 images): {:.1f} msec".format(mxa_inference_time * 1000))
The Sync API is a simpler interface: you supply a group of frames, and it internally streams them to the accelerator. This option is meant only for offline (not live) processing of data.
import time
from memryx import SyncAccl
# CPU run
model = keras.models.load_model('resnet.h5')
start = time.time()
cpu_outputs = model.predict(np.array(images))
cpu_inference_time = time.time() - start
# MXA run
accl = SyncAccl(dfp='resnet.dfp')
start = time.time()
mxa_outputs = accl.run(images)
mxa_outputs = np.stack([np.squeeze(arr) for arr in mxa_outputs])
mxa_inference_time = time.time() - start
cpu_preds = keras.applications.resnet.decode_predictions(cpu_outputs, top=5)
mxa_preds = keras.applications.resnet.decode_predictions(mxa_outputs, top=5)
print("CPU Inference time (100 images): {:.1f} msec".format(cpu_inference_time*1000))
print("MXA Inference time (100 images): {:.1f} msec".format(mxa_inference_time*1000))
Now let’s run inference in the C++ code by passing each image to the MXA and collecting the output result.
In the MX API user threading mode, the code runs input and output sequentially, i.e., synchronously. We pass input to and receive output from the MXA using the send_input and receive_output functions.
MX::Runtime::MxAccl accl(modelPath.c_str());
model_info = accl.get_model_info(0);
std::cout << "Connected stream \n\n\n";
accl.start(true);

// A vector to hold input data pointers, reserving space based on the number of input feature maps
std::vector<float*> input_data;
input_data.reserve(model_info.num_in_featuremaps);

// A vector to hold output feature map data pointers
std::vector<float*> ofmap;

// Reserve space in the output vector based on the number of output feature maps
ofmap.reserve(model_info.num_out_featuremaps);

// Allocate memory for each output feature map and store pointers in the output vector
for(int j = 0; j < model_info.num_out_featuremaps; ++j){
    float* fmap = new float[model_info.out_featuremap_sizes[j]];
    ofmap.push_back(fmap);
}

for(int i = 1; i <= counter; i++){
    cv::Mat inframe;
    std::stringstream ss;
    ss << baseFilename << std::setw(8) << std::setfill('0') << i << ".JPEG";
    fs::path fullPath = imagenetPath / ss.str();
    inframe = cv::imread(fullPath.string());
    int recvinf_label;

    if(!inframe.empty()){
        cv::Mat preProcframe = preprocessImage(inframe);

        // Add the preprocessed frame data as an input feature map
        input_data.push_back((float*)preProcframe.data);

        // Send the input data to the accelerator
        accl.send_input(input_data, model_info.model_index, 0, false);

        // Receive the processed output from the accelerator
        accl.receive_output(ofmap, model_info.model_index, recvinf_label, true);

        // Evaluate the output and print the scores
        printScore(ofmap);

        // Clear the input data vector for reuse
        input_data.clear();
    }
}
Note
In user threading mode, users can decide whether to run synchronously or asynchronously.
In the above code, the printScore function is used to process the output and get the top-1 and top-5 predictions. The function is defined below in the Predictions section.
The MX API auto threading mode lets the user connect input and output callback functions to send and receive data from the accelerator. The API handles the threading and data streaming internally.
Input-callback function: Each frame is passed to the MXA.
bool incallback_getframe(vector<const MX::Types::FeatureMap<float>*> dst, int streamLabel){
    if(runflag.load()){
        bool got_frame = false;
        cv::Mat inframe;
        std::stringstream ss;
        ss << baseFilename << std::setw(8) << std::setfill('0') << counter << ".JPEG";
        fs::path fullPath = imagenetPath / ss.str();
        inframe = cv::imread(fullPath.string());
        counter++;

        if(!inframe.empty()){
            got_frame = true;
        }

        if(!got_frame){
            std::cout << "\n\n No frame - End of video/cam/img \n\n\n";
            runflag.store(false);
            return false; // return false if frame retrieval fails
        }
        else{
            // Preprocess frame
            cv::Mat preProcframe = preprocessImage(inframe);

            // Set preprocessed input data to be sent to the accelerator
            dst[0]->set_data((float*)preProcframe.data, false);
            return true;
        }
    }
    else{
        runflag.store(false);
        return false;
    }
}
Output-callback function: The output received from the accelerator.
bool outcallback_getmxaoutput(vector<const MX::Types::FeatureMap<float>*> src, int streamLabel){
    std::vector<float*> ofmap;
    ofmap.reserve(src.size());

    for(int i = 0; i < model_info.num_out_featuremaps; ++i){
        float* fmap = new float[model_info.out_featuremap_sizes[i]];
        src[i]->get_data(fmap, true);
        ofmap.push_back(fmap);
    }

    printScore(ofmap);

    // Free the per-call output buffers
    for(float* fmap : ofmap){
        delete[] fmap;
    }
    return true;
}
Note
In the above code, the printScore function is used to process the output and get the top-1 and top-5 predictions. The function is defined below in the Predictions section.
Connect the Accelerator: connect our input and output callback functions to the accelerator via connect_stream. The API will take care of the rest.
model_info = accl.get_model_info(0);
accl.connect_stream(&incallback_getframe, &outcallback_getmxaoutput, 10 /*unique stream ID*/, 0 /*Model ID*/);
std::cout << "Connected stream \n\n\n";
accl.start();

// Keep the main thread alive until the input callback signals completion
while(runflag.load()){
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
}
accl.stop();
Predictions#
Finally, let’s compare the prediction results to the ground truth.
def compare_with_ground_truth(predictions):
    top1, top5, total = 0, 0, len(predictions)

    for i, pred in enumerate(predictions):
        gt = ground_truth[i]
        classes = [guess[0] for guess in pred]

        if gt in classes:
            top5 += 1
        if gt == classes[0]:
            top1 += 1

    print("Top 1: ({}/{}) {:.2f} %".format(top1, total, top1 / total * 100))
    print("Top 5: ({}/{}) {:.2f} %".format(top5, total, top5 / total * 100))
print("CPU Results: ")
compare_with_ground_truth(cpu_preds)
print("MXA Results: ")
compare_with_ground_truth(mxa_preds)
In C++, for simplicity, we define the ground-truth labels for the 100 images in a vector.
std::vector<int> ground_truth = { 65,970,230,809,516,57,334,415,674,332,109,286,370,757,595,147,473,23,478,517,334,173,
948,727,23,846,270,167,55,858,324,573,150,981,586,887,32,398,777,74,516,756,129,198,
256,725,565,167,717,394,92,29,844,591,358,468,259,994,872,588,474,183,107,46,842,390,
101,887,870,841,467,149,21,476,80,424,159,275,175,461,970,160,788,58,479,498,369,28,487,
50,270,383,366,780,373,705,330,142,949,349 };
Now, let’s process the output from the MXA and compare the prediction results with the ground truth.
void printScore(const std::vector<float*>& ofmaps){
    totalSamples += 1;

    std::vector<int> indices(1000);
    std::iota(indices.begin(), indices.end(), 0); // Fill the indices vector with consecutive integers starting at 0

    // Sort the indices based on the comparison of values in ofmaps
    std::sort(indices.begin(), indices.end(),
        [&ofmaps](int i1, int i2) {
            return ofmaps[0][i1] > ofmaps[0][i2];
        }
    );

    int trueIndex = ground_truth[totalSamples - 1]; // Retrieve the ground truth index for the current sample

    // Check if the top prediction (highest probability) matches the true index (top-1 accuracy)
    if (indices[0] == trueIndex) {
        correctTop1++;
    }

    // Check top-5 accuracy: see if the true index is among the top 5 predictions
    if (std::find(indices.begin(), indices.begin() + 5, trueIndex) != indices.begin() + 5) {
        correctTop5++;
    }
}
In the main function, we can calculate the Top-1 and Top-5 scores.
// Print accuracy
std::cout << "Top-1 Accuracy: " << static_cast<double>(correctTop1) / totalSamples * 100.0 << "%" << std::endl;
std::cout << "Top-5 Accuracy: " << static_cast<double>(correctTop5) / totalSamples * 100.0 << "%" << std::endl;
Third-Party License#
This tutorial uses third-party models available through the Keras Applications API. Below are the details of the licenses for these dependencies:
Models: Models sourced from the Keras Applications API
License: Apache License 2.0
Summary#
This tutorial outlined how to use a pre-trained model to run inference using the Accelerator API. The full script is available for download.