Classification using Simulator#

Introduction#

We will use the Neural Compiler and Simulator APIs to build a simple classification demo that classifies real images with open-source pre-trained models, running on our bit-accurate hardware simulator. In this demo we will use a MobileNet model because it is lightweight and (relatively) fast to simulate.

Download & Run

Download

The instructions and code snippets in this tutorial are intended to provide an overview of the topic. Download the full code implementation and then refer to the Run section for detailed instructions on how to execute the code.

Download this sample image and save it as image.png.

classify_keras.py

classify_tf.py

classify_pt.py

Run

python classify_keras.py
python classify_tf.py
python classify_pt.py

See the sections below for a step-by-step detailed explanation.

1. Download Model#

In your Python code, use the framework's built-in API to download a pretrained model, or fetch one from the web.

Keras:

from tensorflow import keras
model = keras.applications.MobileNet()

TensorFlow:

import os
import tensorflow as tf
URL = "http://download.tensorflow.org/models/mobilenet_v1_2018_02_22"
TARGET = "mobilenet_v1_1.0_224"
os.system("wget {}/{}.tgz".format(URL, TARGET))
os.system("tar -xvf {}.tgz".format(TARGET))

PyTorch:

import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
model.eval()
# Convert to ONNX
sample_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,                     # The model to be exported
    sample_input,              # A sample input tensor
    "mobilenet_v2.onnx",       # The output file name
    export_params=True,        # Store the trained parameter weights inside the model file
    opset_version=17,          # The ONNX opset version to target
    do_constant_folding=True,  # Whether to execute constant folding for optimization
    input_names=['input'],     # The model's input names
    output_names=['output'],   # The model's output names
)
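
If you took the PyTorch route, it can be worth sanity-checking the exported file before compiling it. Below is a minimal sketch using the onnx package (an extra dependency, not used elsewhere in this tutorial):

import onnx

# Load the exported model and run ONNX's structural validity checks;
# check_model raises an exception if the graph is malformed.
onnx_model = onnx.load("mobilenet_v2.onnx")
onnx.checker.check_model(onnx_model)
print("mobilenet_v2.onnx passed the ONNX checker")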

2. Compile#

Use the MemryX NeuralCompiler to compile the downloaded model into a DataFlowProgram (DFP).

Keras:

from memryx import NeuralCompiler, Simulator
dfp = NeuralCompiler(models=model, verbose=1).run()

TensorFlow:

from memryx import NeuralCompiler, Simulator
dfp = NeuralCompiler(models=TARGET+"_frozen.pb", verbose=1).run()

PyTorch:

from memryx import NeuralCompiler, Simulator
dfp = NeuralCompiler(models='mobilenet_v2.onnx', verbose=1).run()

Output:

Loading model: (Done)
Graph processing: (Done)
Initial flow optimization: (1) (Done)
Cores optimization: (52) (Done)
Flow optimization: (6) (Done)
Assembling DFP for Cascade: (Done)

3. Prepare the Image#

Preprocess the image to prepare it for inference. This generally means cropping, resizing, and/or normalizing the image.

Note

The code needs a test image to work on; you can download this sample image and save it as image.png.

Keras:

from PIL import Image
import numpy as np
image = Image.open('image.png').resize((224,224))
image = keras.applications.mobilenet.preprocess_input(np.array(image))
image = np.expand_dims(image, axis=0)

TensorFlow:

from PIL import Image
import numpy as np
# Prepare the image: normalize pixel values to roughly [-1, 1]
image = Image.open('image.png').resize((224,224))
image = np.array(image)
image = image / np.max(image) * 2 - 1
image = np.expand_dims(image, axis=0)

PyTorch:

from PIL import Image
from torchvision import transforms
import numpy as np
input_image = Image.open('image.png')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
image = input_tensor.unsqueeze(0).numpy()
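
For reference, Keras' mobilenet preprocess_input maps 8-bit RGB pixel values into [-1, 1]; the sketch below is a NumPy-only equivalent of the Keras variant above (a hand-rolled illustration, not the library call itself):

from PIL import Image
import numpy as np

# Same effect as keras.applications.mobilenet.preprocess_input on 8-bit RGB:
# scale [0, 255] to [-1, 1]
image = np.array(Image.open('image.png').resize((224, 224)), dtype=np.float32)
image = image / 127.5 - 1.0
image = np.expand_dims(image, axis=0)  # add a batch dimension -> (1, 224, 224, 3)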

4. Run Inference#

Use the bit-accurate hardware simulator to run inference on the image using the compiled DFP.

The following code is the same for all three frameworks:

# Run the Simulator on the preprocessed image
s = Simulator(dfp=dfp, verbose=1)
outputs = s.infer(inputs=image)
latency, fps = s.benchmark(frames=4)

Output:

print("Simulated MXA FPS: ", fps)
print("Simulated MXA Latency: ", latency)
Simulated MXA FPS:  2458.7846227609693
Simulated MXA Latency:  1.40384
print("Simulated MXA FPS: ", fps)
print("Simulated MXA Latency: ", latency)
Simulated MXA FPS:  2458.7846227609693
Simulated MXA Latency:  1.40384
print("Simulated MXA FPS: ", fps)
print("Simulated MXA Latency: ", latency)
Simulated MXA FPS:  2474.114576246026
Simulated MXA Latency:  1.2423783333333334
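
To classify several images in one session, you can reuse the Simulator instance. The sketch below loops over a list of files using the Keras preprocessing from step 3 (the file list is a placeholder you would supply yourself):

from PIL import Image
from tensorflow import keras
import numpy as np

def load_frame(fname):
    # Keras-style preprocessing, as in step 3
    img = Image.open(fname).resize((224, 224))
    img = keras.applications.mobilenet.preprocess_input(np.array(img))
    return np.expand_dims(img, axis=0)

# 's' is the Simulator instance created above
for fname in ['image.png']:  # extend with your own image files
    outputs = s.infer(inputs=load_frame(fname))
    # decode 'outputs' as shown in the Predictions section below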

5. Predictions#

Finally, convert the neural network output to a prediction result.

Keras:

outputs = np.expand_dims(outputs[0], 0)
predictions = keras.applications.mobilenet.decode_predictions(outputs, top=1)[0][0]
print("I see a '{}' with {:.1f} % certainty".format(predictions[1], predictions[2]*100))

TensorFlow:

outputs = np.expand_dims(outputs[0], 0)
# Drop the background class: this TF model predicts 1001 classes
outputs = outputs[:, 1:]
predictions = tf.keras.applications.mobilenet.decode_predictions(outputs, top=1)[0][0]
print("I see a '{}' with {:.1f} % certainty".format(predictions[1], predictions[2]*100))

PyTorch:

def softmax(x):
    return np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)))

# Convert the logits to probabilities and pick the most likely class
outputs = softmax(outputs)
outputs = np.squeeze(outputs)
idx = np.argmax(outputs)
print("I see a '{}' with {:.1f} % certainty".format(classes[idx], outputs[idx]*100))

Output:

I see a 'bell_pepper' with 98.1 % certainty
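
If you want more than the single best guess, decode_predictions can also return several candidates; a small sketch building on the Keras variant above (top=5 is an illustrative choice):

# Print the five most likely ImageNet classes for the Keras model's output
for _, name, score in keras.applications.mobilenet.decode_predictions(outputs, top=5)[0]:
    print("{:>20s}: {:5.1f} %".format(name, score * 100))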

Third-Party License#

This tutorial uses third-party pre-trained models obtained through the Keras Applications API, the TensorFlow model repository, and PyTorch Hub. Refer to those sources for the license details of these dependencies.

Summary#

This tutorial outlined how to use pre-trained models to run inference using the APIs provided by the NeuralCompiler and Simulator. The full scripts are available for download:

classify_keras.py

classify_tf.py

classify_pt.py