Classification using Driver APIs#
Introduction#
This tutorial walks through a detailed example of running inference with a DFP file compiled by the Neural Compiler.
Note
OpenCV is used in the following example but is not included in the package. Please refer to the official OpenCV website for more information.
The following example demonstrates, step by step, how to run inference with the MobileNet model on one MemryX MX3:Cascade accelerator. See Driver Usage for more advanced usage.
Basic Inference#
1. Open Device#
  // 1. Bind MPU device group 0 as MX3:Cascade to model 0.
  memx_status status = memx_open(model_id, group_id, MEMX_DEVICE_CASCADE);
  printf(" 1. memx_open = %d\n", status);
  # 1. Bind MPU device group 0 as MX3:Cascade to model 0.
  err = mxa.open(model_id, group_id, 3) # 3 = MX3:Cascade
  print(" 1. mxa.open =", err)
Before we can configure a model on the device, we must first call open() to set up the driver. In this case, we bind model model_id = 0 to device group_id = 0 as a MemryX MX3:Cascade device. After open() is called, the driver internally creates a new, initially blank model context and sets up the interface to communicate with device 0 using the MX3:Cascade library.
Binding model 0 to device 0 means that all actions using model 0, both configuration and inference, are applied to device 0. This binding can still be changed at runtime through reconfigure(), for example to bind model 0 to device 1 instead.
2. Download Model#
  // 2. Download the model in the DFP file to the MPU device group. Input and
  // output feature map shapes are configured automatically after the download completes.
  if (memx_status_no_error(status)) {
    status = memx_download_model(model_id,
      "models/mobilenet_v1.dfp", 0, // model_idx = 0
      MEMX_DOWNLOAD_TYPE_WTMEM_AND_MODEL);
  }
  printf(" 2. memx_download_model = %d\n", status);
  # 2. Download the model in the DFP file to the MPU device group. Input and
  # output feature map shapes are configured automatically after the download completes.
  if not err:
    err = mxa.download(model_id,
      r"models/mobilenet_v1.dfp", 0, # model_idx = 0
      mxa.download_type_wtmem_and_model)
  print(" 2. mxa.download =", err)
After the interface to the device is set up, it is time to configure the model on the driver and the device. Pass the DFP file generated by the MemryX compiler (which has the extension .dfp by default) to download() to configure the model, which allocates both software and hardware resources.
Note
Weight memory and the model can be configured separately in case multiple models share the same weight memory setting, and model_idx indicates which model in the DFP file should be downloaded. This tutorial does not cover that case.
3. Enable Data Streaming#
  // 3. Enable data transfer of this model to device. Set to no wait here
  // since driver will go to data transfer state eventually.
  if (memx_status_no_error(status)) {
    status = memx_set_stream_enable(model_id, 0);
  }
  printf(" 3. memx_set_stream_enable = %d\n", status);
  # 3. Enable data transfer of this model to device. Set to no wait here
  # since driver will go to data transfer state eventually.
  if not err:
    err = mxa.set_stream_enable(model_id, 0)
  print(" 3. mxa.set_stream_enable =", err)
The last step before starting inference is to enable the driver to transfer streaming data (input and output feature maps) for the specified model through the interface. By default, models are not allowed to write or read streaming data through the interface, so that one model cannot unexpectedly take data belonging to another. Enable streaming for a model only when it is configured on the device and is about to run inference. Once streaming is enabled, we are ready to run inference.
4. Pre-process Input Feature Map#
Note
The code needs a test image to work on. You can download this sample image and save it as image.png.
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // 4. maybe put some input feature map pre-processing here
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
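  cv::Mat img; // declared outside the block so its buffer outlives ifmap
  if (memx_status_no_error(status)) {
    img = cv::imread("image.png", cv::IMREAD_COLOR);
    cv::resize(img, img, cv::Size(224,224), 0, 0, cv::INTER_LINEAR);
    img.convertTo(img, CV_32F, 1.0/127.5, -1);
    ifmap = (float*)img.data; // points into img, valid only while img is alive
  }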
  printf(" 4. pre-processing\n");
  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # 4. maybe put some input feature map pre-processing here
  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  if not err:
    img = cv2.imread(r"image.png")
    img = cv2.resize(img, (224,224), interpolation=cv2.INTER_LINEAR)
    ifmap = img.astype(np.float32) / 127.5 - 1 # type(ifmap) = <class 'numpy.ndarray'>
  print(" 4. pre-processing")
In this example, we use OpenCV to load a single image into a buffer, resize it to 224 x 224, and convert each color value from an integer in the range 0 to 255 to a floating-point number in the range -1 to 1. Note that OpenCV loads color images in BGR channel order. Pre-processing can be added wherever it is required; modify it based on your model's needs.
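The scaling in step 4 can be checked by hand. The following is a minimal pure-Python sketch of the same arithmetic (pixel / 127.5 - 1), without OpenCV or NumPy; the helper name normalize and the constant names are illustrative and not part of the MemryX API.

```python
def normalize(pixel):
    """Map an 8-bit channel value (0..255) to a float in [-1.0, 1.0]."""
    return pixel / 127.5 - 1.0


# The two ends of the 8-bit range map to the two ends of the target range.
lowest = normalize(0)      # 0 / 127.5 - 1   = -1.0
highest = normalize(255)   # 255 / 127.5 - 1 =  1.0

# Size of one 224 x 224 input feature map with 3 color channels.
IFMAP_ELEMENTS = 224 * 224 * 3      # number of float values
IFMAP_BYTES = IFMAP_ELEMENTS * 4    # 4 bytes per 32-bit float

if __name__ == "__main__":
    print(lowest, highest)
    print(IFMAP_ELEMENTS, IFMAP_BYTES)
```

Dividing by 127.5 (half of 255) stretches the range to 0 to 2, and subtracting 1 centers it on zero.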
5. Stream Input Feature Map#
  // 5. Stream input feature map to device flow 0 and run inference.
  if (memx_status_no_error(status)) {
    status = memx_stream_ifmap(model_id, 0, ifmap, timeout);
  }
  printf(" 5. memx_stream_ifmap = %d\n", status);
  # 5. Stream input feature map to device flow 0 and run inference.
  if not err:
    err = mxa.stream_ifmap(model_id, 0, ifmap, timeout=200) # 200 ms
  print(" 5. mxa.stream_ifmap =", err)
This example assumes that the model requires only one input feature map, on flow 0, to run inference. The driver first enqueues the given feature map by copying it into the input queue. Later, after some interface-related formatting, the data is sent to the target device through the interface. As a result, stream_ifmap() returns when the input feature map has been enqueued, not when it has been sent to the device.
Note
Because of the hardware back-pressure flow control mechanism, the input feature map may remain queued within the driver, not yet sent to the device, while no space is available on the device.
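The enqueue-then-send behavior described above can be modeled with a bounded queue. This is only a conceptual sketch built on Python's standard queue module: the function enqueue_ifmap, the error codes, and the queue depth of two are assumptions made for illustration, and the real driver's queue depth and behavior when its queue is full may differ.

```python
import queue

ERR_NONE = 0     # hypothetical "no error" code for this sketch
ERR_TIMEOUT = 1  # hypothetical "timed out" code for this sketch


def enqueue_ifmap(input_queue, ifmap, timeout_ms):
    """Copy ifmap into the queue and return once it is queued (not sent)."""
    try:
        input_queue.put(list(ifmap), timeout=timeout_ms / 1000.0)
    except queue.Full:
        return ERR_TIMEOUT
    return ERR_NONE


# A queue with room for two frames stands in for the driver's input queue.
input_queue = queue.Queue(maxsize=2)
frame = [0.0, 0.5, -0.5]

first = enqueue_ifmap(input_queue, frame, timeout_ms=50)
second = enqueue_ifmap(input_queue, frame, timeout_ms=50)
# Nothing drains the queue here (as if the device had no space left),
# so the third call cannot enqueue its frame and times out.
third = enqueue_ifmap(input_queue, frame, timeout_ms=50)

if __name__ == "__main__":
    print(first, second, third, input_queue.qsize())
```

The point of the model is that a successful return only says the frame was accepted into the queue; it says nothing about when the device consumes it.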
6. Stream Output Feature Map#
  // 6. Stream output feature map from device flow 0 after inference.
  if (memx_status_no_error(status)) {
    status = memx_stream_ofmap(model_id, 0, ofmap, timeout);
  }
  printf(" 6. memx_stream_ofmap = %d\n", status);
  # 6. Stream output feature map from device flow 0 after inference.
  if not err:
    err = mxa.stream_ofmap(model_id, 0, ofmap, timeout=200) # 200 ms
  print(" 6. mxa.stream_ofmap =", err)
After the input feature map is written to the device, we can wait for the output feature map to be read back. The driver keeps polling the device for output feature maps in the background and, after some interface-related formatting, stores them in the output queue. As a result, stream_ofmap() returns when an output feature map is dequeued, not when it is received from the device.
Note
Most inference errors surface here, either as an inference timeout or, if timeout is set to 0 (infinite wait), as a hung process. If the output feature map cannot be read from the device, check the model's input and output feature map configuration and data, or ask the MemryX team for help.
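The difference between a finite timeout and an infinite wait can be sketched with a plain Python queue standing in for the driver's output queue. The function dequeue_ofmap and the error codes are hypothetical names for this illustration, and the background thread is only a stand-in for the driver's polling loop.

```python
import queue
import threading

ERR_NONE, ERR_TIMEOUT = 0, 1  # hypothetical error codes for this sketch


def dequeue_ofmap(output_queue, timeout_ms):
    """Return (err, data); timeout_ms == 0 means wait forever."""
    wait = None if timeout_ms == 0 else timeout_ms / 1000.0
    try:
        return ERR_NONE, output_queue.get(timeout=wait)
    except queue.Empty:
        return ERR_TIMEOUT, None


output_queue = queue.Queue()

# Nothing has been produced yet: a finite timeout reports an error
# instead of blocking the process indefinitely.
err_empty, data_empty = dequeue_ofmap(output_queue, timeout_ms=50)

# A background thread stands in for the driver's polling loop and
# places one result in the output queue.
worker = threading.Thread(target=output_queue.put, args=([0.1, 0.9],))
worker.start()
err_ready, data_ready = dequeue_ofmap(output_queue, timeout_ms=200)
worker.join()

if __name__ == "__main__":
    print(err_empty, data_empty)
    print(err_ready, data_ready)
```

With timeout_ms=0 the first call in this sketch would never return, which mirrors the hang described in the note; a finite timeout turns the same situation into a reportable error.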
7. Post-process Output Feature Map#
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // 7. maybe put some output feature map post-processing here
  // ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  if (memx_status_no_error(status)) {
    for (int i = 1; i < 1000; ++i) {
      if (ofmap[argmax] < ofmap[i]) {
        argmax = i;
      }
    }
  }
  printf(" 7. post-processing, argmax = %d\n", argmax);
  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  # 7. maybe put some output feature map post-processing here
  # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  if not err:
    ofmap = ofmap.reshape(1000) # reshape to single dimension
    argmax = np.argmax(ofmap)
  print(" 7. post-processing, argmax =", argmax)
Now we can handle the inference output feature map. This example does not implement full MobileNet post-processing; it simply iterates over the 1000 scores to find the class index with the maximum score.
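If more than the single best class is of interest, the argmax step extends naturally to a top-k ranking. The sketch below is pure Python; the helper top_k and the toy scores are made up for illustration, and the label table contains only the one label named in this tutorial.

```python
def top_k(scores, k=5):
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy scores standing in for the 1000-element output feature map.
scores = [0.0] * 1000
scores[92] = 0.80
scores[94] = 0.15
scores[10] = 0.05

# Only the label mentioned in this tutorial; a real table has 1000 entries.
LABELS = {92: "bee_eater"}

best = top_k(scores, k=3)
best_label = LABELS.get(best[0], "unknown")

if __name__ == "__main__":
    print(best, best_label)
```

The first entry of the ranking is the same index that the argmax loop in step 7 produces.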
8. Close Device#
  // 8. Always remember to clean up resources before leaving.
  memx_close(model_id);
  printf(" 8. memx_close = %d\n", status);
  # 8. Always remember to clean up resources before leaving.
  mxa.close(model_id)
  print(" 8. mxa.close =", err)
Finally, once there is no more data for the model to process, always call close() to release the allocated resources.
Note
Calling set_stream_disable() before close() is optional, because it is already part of the clean-up process. In the model-swapping case, however, it is important to ensure that streaming is disabled before switching to another model and re-configuring the hardware.
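One way to make sure close() always runs, even when an intermediate step fails or raises an exception, is a try/finally block. The sketch below uses a hypothetical FakeDriver class that only records calls, so it runs without a device; it is not the real mxa module, and its method signatures are simplified assumptions.

```python
class FakeDriver:
    """Hypothetical stand-in that only records calls; NOT the real mxa module."""

    def __init__(self, download_result=0):
        self.download_result = download_result
        self.calls = []

    def open(self, model_id, group_id, chip_gen):
        self.calls.append("open")
        return 0

    def download(self, model_id, dfp_path):
        self.calls.append("download")
        return self.download_result

    def close(self, model_id):
        self.calls.append("close")
        return 0


def run_inference(driver, preprocess, model_id=0, group_id=0):
    """Run the setup steps and always close the model, even on failure."""
    try:
        err = driver.open(model_id, group_id, 3)  # 3 = MX3:Cascade, as in step 1
        if not err:
            err = driver.download(model_id, "models/mobilenet_v1.dfp")
        if not err:
            preprocess()  # may raise, e.g. if the image file is missing
        # Streaming and post-processing (steps 3, 5, 6, 7) are omitted here.
        return err
    finally:
        driver.close(model_id)  # runs on success, error code, or exception


def failing_preprocess():
    raise FileNotFoundError("image.png")


# Case 1: a step returns a non-zero error code.
driver_a = FakeDriver(download_result=1)
result_a = run_inference(driver_a, preprocess=lambda: None)

# Case 2: a step raises an exception.
driver_b = FakeDriver()
try:
    run_inference(driver_b, preprocess=failing_preprocess)
    raised = False
except FileNotFoundError:
    raised = True
```

In both cases the recorded call list ends with close, which is the property the tutorial's final step asks for.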
9. Execution Result#
$ mkdir build && cd build && cmake .. && make && cd ..
$ sudo ./build/memx_c_example1
 1. memx_open = 0
 2. memx_download_model = 0
 3. memx_set_stream_enable = 0
 4. pre-processing
 5. memx_stream_ifmap = 0
 6. memx_stream_ofmap = 0
 7. post-processing, argmax = 92
 8. memx_close = 0
success.
$ sudo python3 memx_py_example1.py
 1. mxa.open = 0
 2. mxa.download = 0
 3. mxa.set_stream_enable = 0
 4. pre-processing
 5. mxa.stream_ifmap = 0
 6. mxa.stream_ofmap = 0
 7. post-processing, argmax = 92
 8. mxa.close = 0
success.
The terminal output shows that the highest-scoring class index is 92, which corresponds to the label bee_eater in the dataset used to train this model and matches the expected result.
Note
In this example, we use CMake to build the source code. If the example fails at runtime, check the hard-coded paths in the source code (such as the DFP file and image paths) and adjust them to match your environment.
Resources#
You can download the complete code examples above from the following links.