Testing with Fewer Chips
Important
In general, all 4 MX3 chips on the M.2 module are available and are used to execute one or more AI models. In some cases, however, the final target design may require fewer than 4 chips. This tutorial shows how to run AI model(s) on fewer than 4 chips while using the 4-chip M.2 module, giving developers insight into the anticipated performance.
The MemryX architecture is a dataflow architecture designed so that multiple chips act as one logical unit to the host. For example, an M.2 module with 4 chips acts as a single chip with more resources, fully transparent to the user. Using the instructions below, however, a developer can limit the module to a subset of its resources.
Option 1: Compile for Two Chips
By default, the compiler targets all 4 chips. However, if you compile for 2 chips, the tools and the API will detect this, and the module will activate only two of its 4 chips.
As an example, let's compile a MobileNet model for two chips. First, download the model:
python3 -c "import tensorflow as tf; tf.keras.applications.MobileNet().save('mobilenet.h5');"
Now, compile it for two chips using the compiler argument --num_chips (or -c):
mx_nc -v -m mobilenet.h5 -c 2 --show_optimization
You can then benchmark the compiled model using mx_bench:
mx_bench -v -f 500 -d mobilenet.dfp
You should see output similar to this:
╔══════════════════════════════════════╗
║ Benchmark ║
║ Copyright (c) 2019-2024 MemryX Inc. ║
╚══════════════════════════════════════╝
Ran 500 frames
Average FPS: 1870.78
Average System Latency: 1.86 ms
However, if you compile for any number of chips other than 2 or 4, you will encounter an error like the following when attempting to benchmark the model on the 4-chip module (shown here for a 1-chip compile):
memryx.errors.MxaError: Input DFP was compiled for a 1-chip solution but you have a 4-chip solution attached.
Option 2: Restricting Compilation Resources
What if you still need to test a configuration such as a single chip? For that case, the compiler provides an option that instructs it to use only the resources of a specified number of chips, even if the module has more.
This can be done using the restricted chips option -rc. For instance, you can compile for two chips but restrict the resources to simulate a single chip:
mx_nc -v -m mobilenet.h5 -c 2 -rc 1 --show_optimization
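When scripting several chip configurations, it can help to assemble the compiler command programmatically. The following is a minimal sketch, not part of the MemryX tools: `build_mx_nc_command` and `RUNNABLE_CHIP_COUNTS` are hypothetical names, only flags shown in this tutorial are used, and the check that the restricted count does not exceed the compiled count is an assumption rather than documented behavior.

```python
from typing import List, Optional

# Chip counts this tutorial reports as runnable on the 4-chip M.2 module.
RUNNABLE_CHIP_COUNTS = (2, 4)


def build_mx_nc_command(model_path: str, chips: int = 4,
                        restricted_chips: Optional[int] = None) -> List[str]:
    """Build an mx_nc argument list (hypothetical helper, for illustration)."""
    if chips not in RUNNABLE_CHIP_COUNTS:
        # Per the tutorial, other values compile but fail at benchmark time.
        raise ValueError(
            f"-c {chips} will not run on a 4-chip module; "
            "use -c 2 or -c 4 together with -rc instead"
        )
    cmd = ["mx_nc", "-v", "-m", model_path, "-c", str(chips)]
    if restricted_chips is not None:
        # Assumption: the restricted count must be between 1 and `chips`.
        if not 1 <= restricted_chips <= chips:
            raise ValueError("restricted_chips must be between 1 and chips")
        cmd += ["-rc", str(restricted_chips)]
    return cmd


if __name__ == "__main__":
    # Reproduces the command above, minus the optional --show_optimization.
    print(" ".join(build_mx_nc_command("mobilenet.h5", chips=2, restricted_chips=1)))
```

The returned list can be passed to `subprocess.run` on a machine where the MemryX tools are installed.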
Next, benchmark it:
mx_bench -v -f 500 -d mobilenet.dfp
You should see output similar to this:
╔══════════════════════════════════════╗
║ Benchmark ║
║ Copyright (c) 2019-2024 MemryX Inc. ║
╚══════════════════════════════════════╝
Ran 500 frames
Average FPS: 1167.96
Average System Latency: 2.69 ms
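To compare the two runs numerically, the benchmark text can be parsed. The following is a minimal sketch that assumes the output format shown in this tutorial; `parse_bench` is a hypothetical helper, not part of the MemryX tools, and the sample strings are copied from the two benchmark outputs in this tutorial.

```python
import re

# Sample text copied from the two mx_bench runs in this tutorial.
TWO_CHIP_OUTPUT = """Ran 500 frames
Average FPS: 1870.78
Average System Latency: 1.86 ms"""

RESTRICTED_ONE_CHIP_OUTPUT = """Ran 500 frames
Average FPS: 1167.96
Average System Latency: 2.69 ms"""


def parse_bench(text: str) -> dict:
    """Extract FPS and latency from mx_bench-style output (assumed format)."""
    fps = re.search(r"Average FPS:\s*([\d.]+)", text)
    latency = re.search(r"Average System Latency:\s*([\d.]+)\s*ms", text)
    if fps is None or latency is None:
        raise ValueError("benchmark output not in the expected format")
    return {"fps": float(fps.group(1)), "latency_ms": float(latency.group(1))}


if __name__ == "__main__":
    two = parse_bench(TWO_CHIP_OUTPUT)
    one = parse_bench(RESTRICTED_ONE_CHIP_OUTPUT)
    # Throughput of the restricted run relative to the 2-chip run.
    print(f"FPS ratio: {one['fps'] / two['fps']:.2f}")
    # Added latency of the restricted run in milliseconds.
    print(f"Latency increase: {one['latency_ms'] - two['latency_ms']:.2f} ms")
```

On these sample numbers, the restricted single-chip run reaches roughly 62% of the two-chip throughput for this model. Results for other models will differ.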
Note
The restricted chips option is a software technique meant to provide insight into performance with fewer chips. It is not intended for actual deployment: it does not power off the unused chips, so they still consume power.
Hint
In this tutorial, we are using the --show_optimization flag, which displays an animated view of the mapper optimization steps in the terminal. This shows the number of chips and the resources utilized within each chip.