Testing with Fewer Chips

Important

In general, all 4 MX3 chips on the M.2 module are available and are used to execute one or more AI models. In some cases, however, the target design may require fewer than 4 chips. This tutorial shows how to run AI model(s) on fewer than 4 chips while still using the 4-chip M.2 module, giving developers insight into the anticipated performance.

The MemryX architecture is a dataflow architecture designed so that multiple chips act as one logical unit to the host. For example, an M.2 module with 4 chips acts as a single chip with more resources, fully transparent to the user. Using the instructions below, however, a developer can limit the module to a subset of its resources.

Option 1: Compile for Two Chips

By default, the compiler targets all 4 chips. However, if you compile for 2 chips, the tools and the API will detect this, and the module will activate only 2 of its 4 chips.

As an example, let’s compile a MobileNet model for two chips. First, download the model:

python3 -c "import tensorflow as tf; tf.keras.applications.MobileNet().save('mobilenet.h5');"

Now, compile it for two chips using the compiler argument --num_chips (short form -c):

mx_nc -v -m mobilenet.h5 -c 2 --show_optimization

You can then benchmark the compiled model using mx_bench:

mx_bench -v -f 500 -d mobilenet.dfp

You should see output similar to this:

╔══════════════════════════════════════╗
║               Benchmark              ║
║  Copyright (c) 2019-2024 MemryX Inc. ║
╚══════════════════════════════════════╝

Ran 500 frames
Average FPS: 1870.78
Average System Latency: 1.86 ms
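When comparing several chip configurations, it can help to pull the numbers out of the benchmark text programmatically. The following is a minimal sketch that assumes the output keeps the three-line format shown above; the function name parse_bench_output and the SAMPLE_OUTPUT string are illustrative and not part of the MemryX tools.

```python
import re

# Sample text copied from the benchmark output shown above.
SAMPLE_OUTPUT = """\
Ran 500 frames
Average FPS: 1870.78
Average System Latency: 1.86 ms
"""


def parse_bench_output(text):
    """Extract frame count, FPS, and latency (ms) from mx_bench-style text.

    Hypothetical helper: it assumes the line format shown in this tutorial.
    """
    patterns = {
        "frames": r"Ran\s+(\d+)\s+frames",
        "fps": r"Average FPS:\s*([\d.]+)",
        "latency_ms": r"Average System Latency:\s*([\d.]+)\s*ms",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"could not find {key!r} in benchmark output")
        result[key] = float(match.group(1))
    # The frame count is a whole number, so store it as an int.
    result["frames"] = int(result["frames"])
    return result


print(parse_bench_output(SAMPLE_OUTPUT))
```

The same function can be applied to the captured output of each benchmark run to tabulate results across configurations.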

However, if you compile for any number of chips other than 2 or 4, benchmarking the result on the 4-chip module fails with an error like the following (shown here for a 1-chip compilation):

memryx.errors.MxaError: Input DFP was compiled for a 1-chip solution but you have a 4-chip solution attached.
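In an automated test script, it may be useful to recognize this mismatch and report the two chip counts. The sketch below only pattern-matches the message text shown above; the helper name parse_chip_mismatch is hypothetical, and the exact wording of the error could differ between SDK versions.

```python
import re

# Pattern based on the error text shown in this tutorial (an assumption about its stability).
ERROR_PATTERN = re.compile(
    r"compiled for a (\d+)-chip solution but you have a (\d+)-chip solution attached"
)


def parse_chip_mismatch(message):
    """Return (compiled_chips, attached_chips), or None if the text does not match."""
    match = ERROR_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


message = (
    "memryx.errors.MxaError: Input DFP was compiled for a 1-chip solution "
    "but you have a 4-chip solution attached."
)
mismatch = parse_chip_mismatch(message)
if mismatch is not None:
    compiled, attached = mismatch
    print(f"DFP targets {compiled} chip(s); attached module has {attached}.")
```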

Option 2: Restricting Compilation Resources

What if you still need to test a configuration such as a single chip? The compiler provides an option for this: you can instruct it to use only the resources of a specified number of chips, even if the module has more.

This can be done using the restricted chips option -rc. For instance, you can compile for two chips but restrict the resources to simulate a single chip:

mx_nc -v -m mobilenet.h5 -c 2 -rc 1 --show_optimization
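If you plan to script several compilations, the two command variants used in this tutorial can be generated from one place. This is a rough sketch that uses only the flags shown here (-v, -m, -c, -rc, --show_optimization); the function build_mx_nc_command is illustrative, and its check that the restricted count does not exceed the compiled count is our own sanity check, not documented compiler behavior.

```python
import shlex


def build_mx_nc_command(model_path, num_chips, restricted_chips=None):
    """Build an mx_nc argument list using only the flags shown in this tutorial."""
    # Assumed sanity check: restricting to more chips than compiled for makes no sense.
    if restricted_chips is not None and restricted_chips > num_chips:
        raise ValueError("restricted_chips cannot exceed num_chips")
    command = ["mx_nc", "-v", "-m", model_path, "-c", str(num_chips)]
    if restricted_chips is not None:
        command += ["-rc", str(restricted_chips)]
    command.append("--show_optimization")
    return command


two_chip = build_mx_nc_command("mobilenet.h5", 2)
one_chip_restricted = build_mx_nc_command("mobilenet.h5", 2, restricted_chips=1)

# Print the commands so they can be inspected before running them.
print(shlex.join(two_chip))
print(shlex.join(one_chip_restricted))

# To actually compile (requires the MemryX tools on PATH):
# import subprocess
# subprocess.run(one_chip_restricted, check=True)
```

Keeping the command as a list avoids shell-quoting problems if the model path contains spaces.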

Next, benchmark it:

mx_bench -v -f 500 -d mobilenet.dfp

You should see output similar to this:

╔══════════════════════════════════════╗
║               Benchmark              ║
║  Copyright (c) 2019-2024 MemryX Inc. ║
╚══════════════════════════════════════╝

Ran 500 frames
Average FPS: 1167.96
Average System Latency: 2.69 ms
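To put the two sample results side by side, the relative change can be worked out directly. The short calculation below uses the example figures from this tutorial (two chips versus two chips restricted to one chip's resources); your own numbers will differ, and the helper relative_change is purely illustrative.

```python
# Example figures from the two benchmark outputs shown in this tutorial.
two_chip = {"fps": 1870.78, "latency_ms": 1.86}
one_chip_restricted = {"fps": 1167.96, "latency_ms": 2.69}


def relative_change(baseline, value):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


fps_change = relative_change(two_chip["fps"], one_chip_restricted["fps"])
latency_change = relative_change(
    two_chip["latency_ms"], one_chip_restricted["latency_ms"]
)

print(f"FPS change:     {fps_change:+.1f}%")
print(f"Latency change: {latency_change:+.1f}%")
```

For these sample figures, throughput drops by roughly 38% and latency rises by roughly 45% when the resources are restricted to a single chip.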

Note

The restricted chips option is a software technique meant to provide insights into performance with fewer chips. However, it is not intended for actual deployment. It will not power off the unused chips, and they will still consume power.

Hint

In this tutorial, we are using the --show_optimization flag, which allows the user to see an animated display of the mapper optimization steps in the terminal. This shows the number of chips and the resources utilized within each chip.