Driver Usage#

Introduction#

In this tutorial, we provide more examples of running inference with different model and device combinations. Be sure to read Classification using Driver APIs first for basic driver knowledge.

Note

OpenCV is used in the following examples but is not included in the package. Please refer to the official OpenCV website for more information.

Before we look into the following examples, it is important to first understand the different types of DFP files. In some cases, runtime model swapping is required to avoid hardware resource access conflicts.

Each example is constructed in three parts: the run-inference sub-routine, run inference by model, and the application main. Descriptions are provided for each example, and the source code can be downloaded from Resources.

  • A single model is compiled into one DFP file. This is the most common and basic scenario: we run a single model's inference on one accelerator (cascaded or not). No runtime model swapping is required. See Basic Inference.

  • Multiple models are compiled into the same DFP file and coexist. Two or more models are compiled with no hardware resources overlapping. In other words, these models can run inference simultaneously on the same accelerator as long as different input and output ports (flows) are given. No runtime model swapping is required. See Example 2: Multiple Models Coexist.

  • Multiple models are compiled into different DFP files. Two or more models are compiled separately into multiple DFP files. Both weight memory re-configuration and model swapping are required at runtime. See Example 3: Runtime Model Swap.

Another common strategy is to use multiple accelerators (cascaded or not) to run inference in parallel and reduce overall inference time and latency.

  • Multiple models with runtime device selection. Dynamically select whichever device is available at runtime. This example can be viewed as an extension of Example 3: Runtime Model Swap. Here we demonstrate how to run two models on two accelerators separately with runtime device selection. See Example 4: Runtime Device Selection.

Example 2: Multiple Models Coexist#

2.1 Run Inference Sub-routine#

// Since the two models share the same procedure to run inference, we make it a
// common sub-routine with a parameter structure here.
typedef struct  {
  uint8_t model_id; // model ID
  uint8_t group_id; // MPU device group ID
  uint8_t iport; // input port ID
  void* ifmap; // input feature map
  uint8_t oport; // output port ID
  void* ofmap; // output feature map
} RunInferenceConfig;

// Simple frame-in-frame-out inference. The two models can run inference
// simultaneously because they use different ports.
memx_status run_inference(RunInferenceConfig* config)
{
  memx_status status = MEMX_STATUS_OK;
  const int timeout = 200; // 200 ms

  // 1. Enable data transfer of both models to device.
  if (memx_status_no_error(status)) {
    status = memx_set_stream_enable(config->model_id, 0);
  }
  // 2. Write input feature map to device to run inference
  if (memx_status_no_error(status)) {
    status = memx_stream_ifmap(config->model_id,
      config->iport, config->ifmap, timeout);
  }
  // 3. Read output feature map from device after inference
  if (memx_status_no_error(status)) {
    status = memx_stream_ofmap(config->model_id,
      config->oport, config->ofmap, timeout);
  }
  // 4. Disable data transfer of this model to device.
  if (memx_status_no_error(status)) {
    status = memx_set_stream_disable(config->model_id, 0);
  }

  return status;
}
class RunInferenceConfig:
  '''Since the two models share the same procedure to run inference, we make it a
    common sub-routine with a parameter structure here.'''
  def __init__(self):
    self.model_id = 0 # model ID
    self.group_id = 0 # MPU device group ID
    self.iport = 0 # input port ID
    self.ifmap = None # input feature map
    self.oport = 0 # output port ID
    self.ofmap = None # output feature map

def run_inference(config: RunInferenceConfig) -> int:
  '''Simple frame-in-frame-out inference. The two models can run inference
    simultaneously because they use different ports.'''
  err = 0

  # 1. Enable data transfer of both models to device.
  if not err:
    err = mxa.set_stream_enable(config.model_id, 0)  
  # 2. Write input feature map to device to run inference
  if not err:
    err = mxa.stream_ifmap(config.model_id,
      config.iport, config.ifmap, timeout=200)
  # 3. Read output feature map from device after inference
  if not err:
    err = mxa.stream_ofmap(config.model_id,
      config.oport, config.ofmap, timeout=200)
  # 4. Disable data transfer of this model to device.
  if not err:
    err = mxa.set_stream_disable(config.model_id, 0)

  return err

Since both models share the same inference procedure, we make it a common sub-routine; the different parameters are set up later to run each model's inference.

2.2 Run Inference By Model#

// Model 0 runs inference; this sub-routine runs in a background thread
void* run_inference_model_0(void* arg)
{
  // Assumes input feature map uses only flow 0 as format float32(224,224,3)
  float* ifmap;
  // Assumes output feature map uses only flow 0 as format float32(1,1,1000)
  float ofmap[1*1*1000]; // allocate memory space

  // 1. Pre-process input feature map
  cv::Mat img = cv::imread("image.png", cv::IMREAD_COLOR);
  cv::resize(img, img, cv::Size(224,224), 0, 0, cv::INTER_LINEAR);
  img.convertTo(img, CV_32F, 1.0/127.5, -1);
  ifmap = (float*)img.data;

  // 2. Run inference setup
  RunInferenceConfig config;
  config.model_id = MODEL_ID; // model 0
  config.group_id = GROUP_ID; // device 0
  config.iport = 0; // input port 0 (flow 0)
  config.ifmap = ifmap; // input feature map
  config.oport = 0; // output port 0 (flow 0)
  config.ofmap = ofmap; // output feature map

  // 3. Run inference common sub-routine
  memx_status status = run_inference(&config);

  // 4. Post-process output feature map
  if (memx_status_no_error(status)) {
    int argmax = 0;
    for (int i = 1; i < 1000; ++i) {
      argmax = (ofmap[i] > ofmap[argmax]) ? i : argmax;
    }
    printf(" - Model 0 argmax = %d\n", argmax);
  } else {
    printf(" - Model 0 failed to run inference = %d\n", status);
  }

  return NULL;
}
def run_inference_model_0() -> None:
  '''Model 0 runs inference; this sub-routine runs in a background thread'''
  # Assumes input feature map uses only flow 0 as format float32(224,224,3)
  ifmap = None
  # Assumes output feature map uses only flow 0 as format float32(1,1,1000)
  ofmap = np.zeros((1,1,1000), dtype=np.float32) # allocate memory space

  # 1. Pre-process input feature map
  img = cv2.imread(r"image.png")
  img = cv2.resize(img, (224,224), interpolation=cv2.INTER_LINEAR)
  ifmap = img.astype(np.float32) / 127.5 - 1 # type(ifmap) = <class 'numpy.ndarray'>

  # 2. Run inference setup
  config = RunInferenceConfig()
  config.model_id = MODEL_ID # model 0
  config.group_id = GROUP_ID # device 0
  config.iport = 0 # input port 0 (flow 0)
  config.ifmap = ifmap # input feature map
  config.oport = 0 # output port 0 (flow 0)
  config.ofmap = ofmap # output feature map

  # 3. Run inference common sub-routine
  err = run_inference(config)

  # 4. Post-process output feature map
  if not err:
    ofmap = ofmap.reshape(1000) # reshape to single dimension
    argmax = np.argmax(ofmap)
    print(" - Model 0 argmax =", argmax)
  else:
    print(" - Model 0 failed to run inference =", err)

Now that we have our common inference sub-routine, we implement the full inference sub-routine, including pre-processing and post-processing, for each model separately. Because the pre-processing and post-processing of the two models are similar, we show only one sub-routine (model 0) here.

In this example, the two models are compiled together into the same DFP file with no hardware resources overlapping. That means we can view these two models as one big model with two flows.

  RunInferenceConfig config;
  config.model_id = MODEL_ID; // same model context as model 0
  config.group_id = GROUP_ID; // device 0
  config.iport = 1; // input port 1 (flow 1)
  config.ifmap = ifmap; // input feature map
  config.oport = 1; // output port 1 (flow 1)
  config.ofmap = ofmap; // output feature map
  config = RunInferenceConfig()
  config.model_id = MODEL_ID # same model context as model 0
  config.group_id = GROUP_ID # device 0
  config.iport = 1 # input port 1 (flow 1)
  config.ifmap = ifmap # input feature map
  config.oport = 1 # output port 1 (flow 1)
  config.ofmap = ofmap # output feature map

The only thing to notice is that we still use a single model context instead of two, with different input and output port setups representing the different models.

Note

In fact, we could simply run the big model's inference as we did in Basic Inference, with only the input and output port setup changed. Here we just provide an example of running the two models' inference in parallel; a single-threaded variant that treats them as one big model is sketched below.
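
As an illustration of this note, the following sketch (our assumption, not part of the downloadable example) treats the two flows as one big model and runs both inferences sequentially in a single thread. It assumes ifmap_0 and ifmap_1 have already been pre-processed as in 2.2 Run Inference By Model and reuses the same MODEL_ID and output shapes.

def run_inference_big_model(ifmap_0, ifmap_1) -> int:
  '''Minimal sketch: run both flows of the "big model" sequentially in one thread.'''
  ofmap_0 = np.zeros((1,1,1000), dtype=np.float32) # flow 0 output
  ofmap_1 = np.zeros((1,1,1000), dtype=np.float32) # flow 1 output

  # Enable streaming once for the whole model context
  err = mxa.set_stream_enable(MODEL_ID, 0)
  # Flow 0 (model 0): write input feature map, then read output feature map
  if not err:
    err = mxa.stream_ifmap(MODEL_ID, 0, ifmap_0, timeout=200)
  if not err:
    err = mxa.stream_ofmap(MODEL_ID, 0, ofmap_0, timeout=200)
  # Flow 1 (model 1): same calls, different ports
  if not err:
    err = mxa.stream_ifmap(MODEL_ID, 1, ifmap_1, timeout=200)
  if not err:
    err = mxa.stream_ofmap(MODEL_ID, 1, ofmap_1, timeout=200)
  # Disable streaming before returning
  if not err:
    err = mxa.set_stream_disable(MODEL_ID, 0)
  return err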

2.3 Application Main#

// Main process, create two threads to run inferences in parallel.
int main(void) {
  memx_status status = MEMX_STATUS_OK;
  pthread_t t0, t1;

  // 1. Bind MPU device group 0 as MX3:Cascade to model.
  if (memx_status_no_error(status)) {
    status = memx_open(MODEL_ID, GROUP_ID, MEMX_DEVICE_CASCADE);
  }

  // 2. Download weight memory and model to device. Because two models are
  // compiled together in one DFP file and coexist with no hardware resources
  // overlapped, we only need to download to device once.
  if (memx_status_no_error(status)) {
    status = memx_download_model(MODEL_ID,
      "models/mobilenet_v1_v2.dfp", 0, // model_idx = 0
      MEMX_DOWNLOAD_TYPE_WTMEM_AND_MODEL);
  }

  // 3. Run two models simultaneously using posix threads (Linux only)
  if (memx_status_no_error(status)) {
    if ((pthread_create(&t0, NULL, &run_inference_model_0, NULL) != 0)
      ||(pthread_create(&t1, NULL, &run_inference_model_1, NULL) != 0)) {
      status = MEMX_STATUS_OTHERS;
    }
  }
  if (memx_status_no_error(status)) {
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
  }

  // 4. Always remember to clean-up resources before leaving.
  memx_close(MODEL_ID);

  // End of process
  if (memx_status_no_error(status)) {
    printf("success.\n");
  } else {
    printf("failure.\n");
  }
  return 0;
}
def main():
  '''Main process, create two threads to run inferences in parallel.'''
  err = 0

  # 1. Bind MPU device group 0 as MX3:Cascade to model.
  if not err:
    err = mxa.open(MODEL_ID, GROUP_ID, 3) # 3 = MX3:Cascade

  # 2. Download weight memory and model to device. Because two models are
  # compiled together in one DFP file and coexist with no hardware resources
  # overlapped, we only need to download to device once.
  if not err:
    err = mxa.download(MODEL_ID, r"models/mobilenet_v1_v2.dfp", 0, # model_idx = 0
      mxa.download_type_wtmem_and_model)

  # 3. Run two models simultaneously using threads
  if not err:
    t0 = threading.Thread(target=run_inference_model_0, args=())
    t1 = threading.Thread(target=run_inference_model_1, args=())
    t0.start()
    t1.start()
    t0.join()
    t1.join()

  # 4. Always remember to clean-up resources before leaving.
  mxa.close(MODEL_ID)

  # End of process
  if not err:
    print("success.")
  else:
    print("failure.")

if __name__ == "__main__":
  main()

As mentioned above, we put the model download in main() since the DFP file only needs to be downloaded once to run both models' inference. Here we use two threads to demonstrate how to run the inferences in parallel.

2.4 Execution Result#

$ mkdir build && cd build && cmake .. && make && cd ..
$ sudo ./build/memx_c_example2
 - Model 0 argmax = 92
 - Model 1 argmax = 284
success.
$ sudo python3 memx_py_example2.py
 - Model 0 argmax = 92
 - Model 1 argmax = 284
success.

The output messages on the terminal show that the highest-score class indexes are 92 and 284, which correspond to the labels bee_eater and siamese_cat. These results are correct for the dataset used to train the two example models.
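
The mapping from class index to label depends on how the example models were trained. As a purely hypothetical illustration (the labels file name below is our assumption and is not shipped with the example), an argmax index can be mapped back to a human-readable label like this:

def argmax_to_label(argmax: int, labels_path: str = "imagenet_labels.txt") -> str:
  '''Hypothetical helper: look up the class label for an argmax index.'''
  with open(labels_path) as f:
    labels = [line.strip() for line in f]
  return labels[argmax] # e.g. 92 -> "bee_eater", 284 -> "siamese_cat"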

Note

In this example, we use CMake to build the source code. If a runtime failure occurs, check the static paths given in the source code and modify them to fit your environment setup before giving this example a try.

Example 3: Runtime Model Swap#

3.1 Run Inference Sub-routine#

// Since the two models share the same procedure to run inference, we make it a
// common sub-routine with a parameter structure here.
typedef struct  {
  uint8_t model_id; // model ID
  uint8_t group_id; // MPU device group ID
  const char* dfp_path; // DFP file path
  uint8_t iport; // input port ID
  void* ifmap; // input feature map
  uint8_t oport; // output port ID
  void* ofmap; // output feature map
} RunInferenceConfig;

// In order to guarantee that only one model can access the device in the
// multi-threading case, we acquire the `device lock` from the driver before
// any hardware configuration.
memx_status run_inference(RunInferenceConfig* config)
{
  const int timeout = 200; // 200 ms

  // 1. Get lock first before hardware configuration. Block and wait here
  // until lock is acquired. Remember to lock 'group_id' instead of
  // 'model_id' since we are trying to lock hardware resource.
  memx_lock(config->group_id);

  // 2. Download weight memory and model to device.
  memx_status status = memx_download_model(config->model_id,
    config->dfp_path, 0, MEMX_DOWNLOAD_TYPE_WTMEM_AND_MODEL);
  // 3. Enable data transfer of this model to device.
  if (memx_status_no_error(status)) {
    status = memx_set_stream_enable(config->model_id, 0);
  }
  // 4. Write input feature map to device to run inference
  if (memx_status_no_error(status)) {
    status = memx_stream_ifmap(config->model_id,
      config->iport, config->ifmap, timeout);
  }
  // 5. Read output feature map from device after inference
  if (memx_status_no_error(status)) {
    status = memx_stream_ofmap(config->model_id,
      config->oport, config->ofmap, timeout);
  }
  // 6. Disable data transfer of this model to device.
  if (memx_status_no_error(status)) {
    // wait to stop may take some time, but is safe
    status = memx_set_stream_disable(config->model_id, 1);
  }

  // 7. Always remember to release lock finally.
  memx_unlock(config->group_id);

  return status;
}
class RunInferenceConfig:
  '''Since the two models share the same procedure to run inference, we make it a
    common sub-routine with a parameter structure here.'''
  def __init__(self):
    self.model_id = 0 # model ID
    self.group_id = 0 # MPU device group ID
    self.dfp_path = None # DFP file path
    self.iport = 0 # input port ID
    self.ifmap = None # input feature map
    self.oport = 0 # output port ID
    self.ofmap = None # output feature map

def run_inference(config: RunInferenceConfig) -> int:
  '''In order to guarantee that only one model can access the device in the
    multi-threading case, we acquire the `device lock` from the driver before
    any hardware configuration.'''
  timeout = 200 # 200 ms

  # 1. Get lock first before hardware configuration. Block and wait here
  # until lock is acquired. Remember to lock 'group_id' instead of
  # 'model_id' since we are trying to lock hardware resource.
  mxa.lock(config.group_id)

  # 2. Download weight memory and model to device.
  err = mxa.download(config.model_id, config.dfp_path, 0,
    mxa.download_type_wtmem_and_model)
  # 3. Enable data transfer of this model to device.
  if not err:
    err = mxa.set_stream_enable(config.model_id, 0)
  # 4. Write input feature map to device to run inference
  if not err:
    err = mxa.stream_ifmap(config.model_id,
      config.iport, config.ifmap, timeout=timeout)
  # 5. Read output feature map from device after inference
  if not err:
    err = mxa.stream_ofmap(config.model_id,
      config.oport, config.ofmap, timeout=timeout)
  # 6. Disable data transfer of this model to device.
  if not err:
    # wait to stop may take some time, but is safe
    err = mxa.set_stream_disable(config.model_id, 1)

  # 7. Always remember to release lock finally.
  mxa.unlock(config.group_id)

  return err

The most important part of run_inference() in this example is the usage of lock() and unlock().

  // 1. Get lock first before hardware configuration. Block and wait here
  // until lock is acquired. Remember to lock 'group_id' instead of
  // 'model_id' since we are trying to lock hardware resource.
  memx_lock(config->group_id);

  // 7. Always remember to release lock finally.
  memx_unlock(config->group_id);
  # 1. Get lock first before hardware configuration. Block and wait here
  # until lock is acquired. Remember to lock 'group_id' instead of
  # 'model_id' since we are trying to lock hardware resource.
  mxa.lock(config.group_id)

  # 7. Always remember to release lock finally.
  mxa.unlock(config.group_id)

Acquiring the lock from the driver ensures that only one model at a time can access the device through the interface. This prevents models from unexpectedly re-configuring the device while another model is using it, or from taking each other's inference data. After the lock is acquired in this example, we can re-configure the device by downloading the weight memory and model, and then run inference.

Note

Always remember to unlock at the end of the sub-routine, otherwise other sub-routines may block and wait forever. Also, be careful not to release another model's lock.
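
One way to make the unlock unconditional is to wrap the locked section in try/finally. The sketch below is our suggestion rather than part of the downloadable example; it reuses the same mxa calls and RunInferenceConfig fields as 3.1:

def run_inference_guarded(config: RunInferenceConfig) -> int:
  '''Same flow as 3.1, but the device lock is released on every path.'''
  mxa.lock(config.group_id) # block until the hardware lock is acquired
  try:
    # Download weight memory and model, then stream one frame
    err = mxa.download(config.model_id, config.dfp_path, 0,
      mxa.download_type_wtmem_and_model)
    if not err:
      err = mxa.set_stream_enable(config.model_id, 0)
    if not err:
      err = mxa.stream_ifmap(config.model_id,
        config.iport, config.ifmap, timeout=200)
    if not err:
      err = mxa.stream_ofmap(config.model_id,
        config.oport, config.ofmap, timeout=200)
    if not err:
      err = mxa.set_stream_disable(config.model_id, 1)
  finally:
    mxa.unlock(config.group_id) # released even if an exception is raised
  return err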

3.2 Run Inference By Model#

  RunInferenceConfig config;
  config.model_id = MODEL_1_ID; // model 1
  config.group_id = GROUP_ID; // device 0
  config.dfp_path = "models/mobilenet_v2.dfp";
  config.iport = 0; // input port 0 (flow 0)
  config.ifmap = ifmap; // input feature map
  config.oport = 0; // output port 0 (flow 0)
  config.ofmap = ofmap; // output feature map
  # 2. Run inference setup
  config = RunInferenceConfig()
  config.model_id = MODEL_1_ID # model 1
  config.group_id = GROUP_ID # device 0
  config.dfp_path = r"models/mobilenet_v2.dfp"
  config.iport = 0 # input port 0 (flow 0)
  config.ifmap = ifmap # input feature map
  config.oport = 0 # output port 0 (flow 0)
  config.ofmap = ofmap # output feature map

The inference sub-routine is basically the same as in 2.2 Run Inference By Model; only the setup is slightly different.

3.3 Application Main#

int main(void) {
  memx_status status = MEMX_STATUS_OK;
  pthread_t t0, t1;

  // 1. Bind MPU device group 0 as MX3:Cascade to both model 0 and model 1.
  if (memx_status_no_error(status)) {
    status = memx_open(MODEL_0_ID, GROUP_ID, MEMX_DEVICE_CASCADE);
  }
  if (memx_status_no_error(status)) {
    status = memx_open(MODEL_1_ID, GROUP_ID, MEMX_DEVICE_CASCADE);
  }

  // 2. Run two models simultaneously using posix threads (Linux only)
  if (memx_status_no_error(status)) {
    if ((pthread_create(&t0, NULL, &run_inference_model_0, NULL) != 0)
      ||(pthread_create(&t1, NULL, &run_inference_model_1, NULL) != 0)) {
      status = MEMX_STATUS_OTHERS;
    }
  }
  if (memx_status_no_error(status)) {
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
  }

  // 3. Always remember to clean-up resources before leaving.
  memx_close(MODEL_0_ID);
  memx_close(MODEL_1_ID);

  // End of process
  if (memx_status_no_error(status)) {
    printf("success.\n");
  } else {
    printf("failure.\n");
  }
  return 0;
}
def main():
  '''Main process, create two threads to run inferences in parallel.'''
  err = 0

  # 1. Bind MPU device group 0 as MX3:Cascade to both model 0 and model 1.
  if not err:
    err = mxa.open(MODEL_0_ID, GROUP_ID, 3) # 3 = MX3:Cascade
  if not err:
    err = mxa.open(MODEL_1_ID, GROUP_ID, 3) # 3 = MX3:Cascade

  # 2. Run two models simultaneously using threads
  if not err:
    t0 = threading.Thread(target=run_inference_model_0, args=())
    t1 = threading.Thread(target=run_inference_model_1, args=())
    t0.start()
    t1.start()
    t0.join()
    t1.join()

  # 3. Always remember to clean-up resources before leaving.
  mxa.close(MODEL_0_ID)
  mxa.close(MODEL_1_ID)

  # End of process
  if not err:
    print("success.")
  else:
    print("failure.")

if __name__ == "__main__":
  main()

At the top level of the application, we put open() and close() outside of the model inference sub-routines, because we do not want a sub-routine to re-configure the interface every time it acquires the lock and starts to run inference. Here we create two threads to demonstrate how to run two models in parallel, using the lock mechanism to avoid hardware access conflicts.

Note

Remember that the number of open() and close() calls should be the same. And yes, you can put open() and close() inside the sub-routines if you want to, as sketched below.
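
For instance, a self-contained variant could look like the sketch below (our assumption, not part of the downloadable example), keeping exactly one close() for each open():

def run_inference_self_contained(config: RunInferenceConfig) -> int:
  '''Hypothetical variant: the sub-routine opens and closes its own model context.'''
  err = mxa.open(config.model_id, config.group_id, 3) # 3 = MX3:Cascade
  if err:
    return err # nothing to close if open() failed
  err = run_inference(config) # locked download + streaming as in 3.1
  mxa.close(config.model_id) # exactly one close() for the one open() above
  return err

The trade-off is that every call now re-initializes the interface, which is exactly the overhead the example avoids by keeping open() and close() in main().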

3.4 Execution Result#

The execution output of this example is the same as 2.4 Execution Result; only the file names are different.

Example 4: Runtime Device Selection#

4.1 Run Inference Sub-routine#

  // 1. Here we use 'trylock()' first to test whether the device is acquirable.
  // A return value of '0' means the lock was acquired successfully; otherwise
  // we move on to lock the other device.
  if (memx_trylock(GROUP_0_ID) == 0) {
    config->group_id = GROUP_0_ID;
  } else {
    memx_lock(GROUP_1_ID); // wait until lock is acquired
    config->group_id = GROUP_1_ID;
  }
  // 2. Re-configure the MPU device group bound to the model.
  memx_status status = memx_reconfigure(config->model_id, config->group_id);
  printf(" - Model %u is running on device %u\n",
    config->model_id, config->group_id);
  # 1. Here we use 'trylock()' first to test whether the device is acquirable.
  # A return value of '0' means the lock was acquired successfully; otherwise
  # we move on to lock the other device.
  if mxa.trylock(GROUP_0_ID) == 0:
    config.group_id = GROUP_0_ID
  else:
    mxa.lock(GROUP_1_ID) # wait until lock is acquired
    config.group_id = GROUP_1_ID
  # 2. Re-configure the MPU device group bound to the model.
  err = mxa.reconfigure(config.model_id, config.group_id)
  print(" - Model {} is running on device {}"
    .format(config.model_id, config.group_id))

Compared to 3.1 Run Inference Sub-routine, in this example we use trylock() before lock() to test whether device 0 is available. If device 0 is currently locked by the other model, we move on to device 1, and this time we wait until the lock is actually acquired.

After the lock is acquired, it is important to call reconfigure() to bind and set up the interface, just as open() does. Later, after inference, remember to unlock the device that was selected at runtime.
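
Putting the pieces together, the sketch below approximates the whole example-4 sub-routine (it is our reconstruction, not the verbatim downloadable source); note that the group released at the end is whichever one was actually locked:

def run_inference(config: RunInferenceConfig) -> int:
  '''Runtime device selection, then the same download/stream flow as 3.1.'''
  # 1. Try device 0 first; fall back to device 1 if it is busy.
  if mxa.trylock(GROUP_0_ID) == 0:
    config.group_id = GROUP_0_ID
  else:
    mxa.lock(GROUP_1_ID) # wait until lock is acquired
    config.group_id = GROUP_1_ID
  # 2. Bind the selected device group to this model.
  err = mxa.reconfigure(config.model_id, config.group_id)
  # 3. Download weight memory and model, then stream one frame (as in 3.1).
  if not err:
    err = mxa.download(config.model_id, config.dfp_path, 0,
      mxa.download_type_wtmem_and_model)
  if not err:
    err = mxa.set_stream_enable(config.model_id, 0)
  if not err:
    err = mxa.stream_ifmap(config.model_id,
      config.iport, config.ifmap, timeout=200)
  if not err:
    err = mxa.stream_ofmap(config.model_id,
      config.oport, config.ofmap, timeout=200)
  if not err:
    err = mxa.set_stream_disable(config.model_id, 1)
  # 4. Release whichever device group was locked at runtime.
  mxa.unlock(config.group_id)
  return err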

4.2 Run Inference By Model#

This part is basically the same as 3.2 Run Inference By Model.

4.3 Application Main#

  // 1. Bind MPU device group 0 as MX3:Cascade to both model 0 and model 1.
  // The group ID here can be any MPU device group during initialization, since
  // the runtime 'reconfigure()' will also initialize the interface, just like 'open()'.
  if (memx_status_no_error(status)) {
    status = memx_open(MODEL_0_ID, GROUP_0_ID, MEMX_DEVICE_CASCADE);
  }
  if (memx_status_no_error(status)) {
    status = memx_open(MODEL_1_ID, GROUP_0_ID, MEMX_DEVICE_CASCADE);
  }
  # 1. Bind MPU device group 0 as MX3:Cascade to both model 0 and model 1.
  # The group ID here can be any MPU device group during initialization, since
  # the runtime 'reconfigure()' will also initialize the interface, just like 'open()'.
  if not err:
    err = mxa.open(MODEL_0_ID, GROUP_0_ID, 3) # 3 = MX3:Cascade
  if not err:
    err = mxa.open(MODEL_1_ID, GROUP_0_ID, 3) # 3 = MX3:Cascade

It does not really matter which device group we bind to each model when we call open() for initialization, since we will later use reconfigure() to set up the interface anyway. As a result, here we use device 0 as the default interface for both models; everything else remains the same as 3.3 Application Main.

4.4 Execution Result#

$ mkdir build && cd build && cmake .. && make && cd ..
$ sudo ./build/memx_c_example4
 - Model 0 is running on device 0
 - Model 1 is running on device 1
 - Model 0 argmax = 92
 - Model 1 argmax = 284
success.
$ sudo python3 memx_py_example4.py
 - Model 0 is running on device 0
 - Model 1 is running on device 1
 - Model 0 argmax = 92
 - Model 1 argmax = 284
success.

Compared to 2.4 Execution Result, we see two additional messages telling us which model is running on which device. When the second model fails to lock device 0 with trylock(), it automatically moves on to device 1, which is exactly the behavior we expect.

Resources#