Optional Model Optimization for MXA#

At MemryX, we pride ourselves on supporting efficient inferencing of your NN models with no required modifications. Other inferencing accelerators may rely on model-tuning tricks, such as fine-tuning, retraining, quantization, pruning, or layer conversions, to increase support or performance. We strive to provide an out-of-the-box experience, compiling your model as-is, in order to minimize your development and deployment time.

However, as with any hardware, certain model properties map more efficiently onto the MemryX MXA. For experienced users who would like to squeeze as much inferencing performance out of our accelerator as possible, the following is a list of ‘optional recommendations’ that may lead to more efficient use of the accelerator hardware. The benefit may manifest as higher inferencing performance (↑FPS, ↓latency), increased energy efficiency, and/or a model that requires fewer chips to map.

Tip: Use ReLU / ReLU-6 / ReLU-N activations if possible

The NeuralCompiler performs a degree of layer fusion, allowing multiple layers to map onto a single PE. When this happens, both compute load and feature-map storage/bandwidth are reduced. Simple activations such as ReLU and ReLU-6 will likely fuse with their adjacent layers, whereas advanced activations such as sigmoid, swish, and tanh cannot be fused.
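
A minimal Keras sketch of this choice is shown below (assuming a TensorFlow/Keras workflow; the layer sizes and input shape are illustrative). The fusable variant uses ReLU-6, which the compiler will likely fuse into the preceding convolution, while the swish variant remains a separate layer.

    import tensorflow as tf

    def conv_block(x, filters, fusable=True):
        """3x3 convolution followed by an activation."""
        x = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
        if fusable:
            # ReLU-6 will likely fuse with the convolution above.
            x = tf.keras.layers.ReLU(max_value=6.0)(x)
        else:
            # swish maps as its own layer and cannot fuse.
            x = tf.keras.layers.Activation("swish")(x)
        return x

    inputs = tf.keras.Input(shape=(224, 224, 3))
    outputs = conv_block(inputs, filters=64, fusable=True)
    model = tf.keras.Model(inputs, outputs)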

Tip: Certain dimensions are more ‘friendly’ to the MXA

Higher utilization of the PEs can be achieved when tensor dimensions are kept to hardware-friendly multiples, as illustrated in the sketch after this list.

  • Keep feature map channels a multiple of 8

  • Convolutions whose filter count is a multiple of 64 will be the most efficient
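
For concreteness, here is a hedged Keras sketch of a dimension-friendly block (assuming a TensorFlow/Keras workflow; the specific sizes are illustrative only):

    import tensorflow as tf

    # Filter counts chosen as multiples of 64, so every intermediate
    # feature map also has a channel count that is a multiple of 8.
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.layers.Conv2D(64, 3, strides=2, padding="same")(inputs)   # 64 filters
    x = tf.keras.layers.ReLU(max_value=6.0)(x)
    x = tf.keras.layers.Conv2D(128, 3, strides=2, padding="same")(x)       # 128 filters
    x = tf.keras.layers.ReLU(max_value=6.0)(x)
    model = tf.keras.Model(inputs, x)

    # Something like Conv2D(100, 3) would still compile, but 100 is neither
    # a multiple of 8 nor of 64, so PE utilization would likely be lower.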

Tip: Some layers should be avoided if possible.

Some layers, while supported, can incur expensive decompositions on our hardware and should be avoided for maximum efficiency. A sketch after this list shows one way to scan a model for them.

  • Reshapes that include the channel dimension

  • Transposes

  • Group-Convolutions
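
One way to check for these layers before compiling is to scan the exported graph. Below is a minimal sketch using the onnx Python package; the file name model.onnx is a placeholder for your own export, and the check is only a heuristic (for example, it flags every Reshape, not just those that include the channel dimension).

    import onnx

    # Placeholder path; point this at your own exported model.
    model = onnx.load("model.onnx")

    # Op types that, while supported, may decompose expensively on the MXA.
    costly_ops = {"Reshape", "Transpose"}

    for node in model.graph.node:
        if node.op_type in costly_ops:
            print(f"Found {node.op_type}: {node.name}")
        # Group convolutions appear as Conv nodes with a 'group' attribute > 1.
        if node.op_type == "Conv":
            for attr in node.attribute:
                if attr.name == "group" and attr.i > 1:
                    print(f"Found group convolution: {node.name} (groups={attr.i})")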

Tip: Many small layers are often less efficient than a few larger layers.

The NeuralCompiler spatially (and statically) maps the computation graph onto our PEs, efficiently splitting large workloads (single layers) across many PEs. When a model instead consists of many small layers, the mapping becomes more difficult. Additionally, a model with more layers naturally consumes more intermediate feature-map memory and bandwidth.
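
To make the trade-off concrete, the illustrative Keras sketch below contrasts a block of many narrow convolutions with a block of fewer, wider ones. The two blocks are not numerically equivalent; the sketch only shows the structural difference the compiler has to map.

    import tensorflow as tf

    def many_small_layers(x):
        # Eight narrow convolutions: more layers to map and more
        # intermediate feature maps to store and move around.
        for _ in range(8):
            x = tf.keras.layers.Conv2D(16, 3, padding="same")(x)
            x = tf.keras.layers.ReLU(max_value=6.0)(x)
        return x

    def fewer_larger_layers(x):
        # Two wide convolutions: fewer, larger workloads that can be
        # split efficiently across many PEs.
        for _ in range(2):
            x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)
            x = tf.keras.layers.ReLU(max_value=6.0)(x)
        return x

    inputs = tf.keras.Input(shape=(224, 224, 3))
    model = tf.keras.Model(inputs, fewer_larger_layers(inputs))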

Note

We will continue to expand this page with more tricks and details.