MX3 Power Tweaking#
Warning
This is intended for advanced users only. Setting the performance mode too high may lock up the chip and require you to reboot the PC.
Introduction#
The MX3 M.2 module’s performance can be set higher or lower than the default, which we shall call 100%. Using the command sudo mx_set_powermode, performance can be decreased to as low as 33% or increased to as high as 142%, with intermediate steps in between. This may be of interest to users whose M.2 slots do not deliver the full amount of power that the PCIe spec calls for, or to users who want to maximize the performance of their system.
Important
On x86 systems, the default power mode is 100%. On ARM systems, which often have sub-spec M.2 slots, the default is 83%.
What Do the Modes Mean?#
These percentages can be thought of as approximate FPS, latency, and power consumption relative to the 100% default. For example, at 50% you will likely see around half the FPS (and twice the latency) compared with running the same model with the power mode set to 100%.
There are two important things to note about power modes:
FPS and latency typically scale linearly with the mode, but there may be situations where the model is bottlenecked by PCIe I/O throughput instead of MXA performance. In these cases, performance may not increase when using modes above 100% or, if the model is already I/O saturated, may not decrease with modes below 100%.
Power consumption tracks these percentages less closely, because it depends on the interplay between chip utilization time and time spent waiting on I/O, as well as on the voltage curves used.
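The linear rule of thumb above can be sketched as a small calculation. This is only a first-order estimate under the assumption that the model is compute-bound (not limited by PCIe I/O); the function name and the baseline numbers are made up for illustration and are not part of any MemryX tool.

```python
def estimate_scaled_performance(baseline_fps, baseline_latency_ms, mode_percent):
    """First-order estimate of FPS and latency at a given power mode.

    Assumes performance scales linearly with the mode percentage, which
    only holds while the model is not bottlenecked by PCIe I/O.
    """
    if mode_percent <= 0:
        raise ValueError("mode_percent must be positive")
    scale = mode_percent / 100.0
    # FPS scales up with the mode; latency scales inversely.
    return baseline_fps * scale, baseline_latency_ms / scale


# Hypothetical baseline measured at the 100% mode: 200 FPS, 5 ms latency.
fps_50, latency_50 = estimate_scaled_performance(200.0, 5.0, 50)
print(f"At 50%: about {fps_50:.0f} FPS, about {latency_50:.1f} ms latency")
```

Treat the result as a starting point and measure the real FPS after changing modes, since an I/O-bound model will not follow this estimate.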
2-chip vs. 4-chip#
The utility can separately change the power mode to use when running a DFP compiled to 4 chips (which is the Compiler’s default target), and a DFP compiled to only 2 chips.
You may select which operating mode to configure in the first menu of the sudo mx_set_powermode command.
4-chip Power Mode#
In the 4-chip mode menu, you can select the relative power/performance to run the M.2 at when running 4-chip DFPs. While the 100% setting will stay within the PCIe spec for the M.2, going higher than this may not work for high-activity models.
2-chip Power Mode#
In the 2-chip mode menu, you can select the relative power/performance to run the M.2 at when running 2-chip DFPs. In the 2-chip scenario, half of the chips on the M.2 are idle, so there is typically no issue running 2-chip models at max speed.
Notes & Discussion#
In some applications, setting the power mode to a lower value can improve overall power efficiency.
For some low-end ARM single-board computers that do not meet the full M.2 spec, setting a lower power mode may be necessary in order to run heavy-activity 4-chip models.
Note
A future MemryX offering will support automatic tuning of frequency at runtime, as a no-effort alternative to mx_set_powermode. It will automatically detect and set modes for either the highest FPS within a power budget, or the lowest power that still meets a minimum FPS.
How do I know if I need to reboot?#
Make sure you have the lm-sensors Linux package installed, and run watch sensors in a separate terminal while tuning.
If any MemryX chips report a temperature of 255°C, they are in a lockup state and you will need to restart your PC to clear the PCIe bus.
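If you would rather script this check than watch the terminal, the sketch below scans text in the usual lm-sensors layout for a current reading of 255°C. The sample text and its labels are hypothetical: the exact labels and layout that the MemryX driver reports are not shown here, and this code flags any sensor reading 255°C, not only MemryX chips.

```python
import re

LOCKUP_TEMP_C = 255.0  # reading that indicates a locked-up chip, per the text above

# Captures the label and the FIRST temperature after the colon, so that
# thresholds in parentheses (e.g. "crit = ...") are not treated as readings.
_READING = re.compile(r"^\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*°C")


def find_lockup_readings(sensors_text):
    """Return the labels of lines whose current reading is 255°C."""
    locked = []
    for line in sensors_text.splitlines():
        match = _READING.match(line)
        if match and float(match.group(2)) == LOCKUP_TEMP_C:
            locked.append(match.group(1).strip())
    return locked


# Hypothetical sample; real labels depend on your system and driver.
sample = """\
temp1:        +47.0°C  (crit = +255.0°C)
temp2:       +255.0°C
"""
print(find_lockup_readings(sample))
```

To run it against live data, pass in the output of the sensors command, for example via subprocess.run(["sensors"], capture_output=True, text=True).stdout.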
Tips for Highest Performance#
If power consumption is of no concern, your system’s M.2 slot meets the M.2 PCIe spec (max 10W avg, 14.85W momentary spikes), and you want to see the highest possible FPS / lowest possible latency, you can try options above 100%.
Light models (think MobileNets or Yolo-Nano/Small) might easily reach 142% with power to spare, but heavy-activity models (like Jumpnet-101) might exceed the PCIe power threshold at 142%. The majority of models will fall somewhere in between.
Note
Power consumption corresponds to the activity factor of the MX3 chips. This factor is a result of the NeuralCompiler’s mapping of the model – not directly because of some property of the model.
For example, one cannot assume a 30M parameter model will definitely consume more power than a 20M parameter model. It might, or it might not.
Tips for Lowest Power#
On the other hand, if you are optimizing for energy efficiency, the optimal solution may be less obvious.
If you are only focused on limiting the average Watts consumed by the MX3, then choosing the lowest performance % that meets your FPS/latency requirements would suffice.
However, energy efficiency also needs to consider how long it takes to compute frames, not just the power (W) while computing. Because the MXA is a dataflow architecture, when parts of its pipeline are empty, the corresponding compute and memory units enter a low-power state.
So it may be more energy efficient to finish frames quickly and then sleep than to run at a low speed without having time to sleep. The optimal choice will depend on your neural network model(s) and your input streaming rate.
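The trade-off can be made concrete with a simple energy-per-frame model: energy is the active power times the compute time, plus the low-power draw for the rest of the frame period. This is a deliberately simplified sketch that treats the device as either fully active or fully asleep; all power and timing figures below are hypothetical, not MX3 measurements.

```python
def energy_per_frame_j(active_power_w, sleep_power_w, compute_time_s, frame_period_s):
    """Energy (joules) spent over one frame period.

    The device computes for compute_time_s at active_power_w, then sits
    in a low-power state at sleep_power_w until the next frame arrives.
    """
    if compute_time_s > frame_period_s:
        raise ValueError("compute time exceeds the frame period at this speed")
    sleep_time_s = frame_period_s - compute_time_s
    return active_power_w * compute_time_s + sleep_power_w * sleep_time_s


# Hypothetical figures for a 30 FPS input stream.
period = 1 / 30
fast = energy_per_frame_j(8.0, 1.0, 0.010, period)  # higher mode: finish early, then sleep
slow = energy_per_frame_j(4.0, 1.0, 0.030, period)  # lower mode: busy almost the whole period

print(f"fast: {fast * 1000:.1f} mJ/frame, slow: {slow * 1000:.1f} mJ/frame")
```

With these made-up numbers the faster setting uses less energy per frame, but a different ratio of active to sleep power can reverse the result, which is why measuring with your own model and input rate matters.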