Transformer Support

Support for transformer models is currently limited, but rapidly expanding. We have successfully mapped and validated several transformer and CNN+Transformer hybrid models, which are featured in the model explorer. Some notable examples include:

  • Yolo11 (Object Detection, Segmentation, Pose Estimation)

  • Yolov10 (Object Detection)

  • TinyViT (Vision Transformer)

  • TinyStories (Language Model)

The NeuralCompiler recognizes Transformer structures and maps them to hardware-supported operators where possible, using a heuristic based on subgraph detection and decomposition. For models the heuristic misses, support can be added via a portable neural compiler extension (.nce) file.
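
To make this concrete, the sketch below exports a vision transformer to ONNX, producing the kind of operator graph that subgraph detection works over. It uses torchvision's vit_b_16 purely as a stand-in (TinyViT or a YOLO variant would export the same way); the output file name and opset version are illustrative and not specific to the NeuralCompiler.

```python
# Hedged sketch: export a transformer model to an ONNX graph, the kind of
# operator-level representation that subgraph detection can analyze.
# vit_b_16 is a stand-in for any transformer or hybrid model.
import torch
import torchvision

model = torchvision.models.vit_b_16(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # NCHW input the model expects

torch.onnx.export(
    model,
    dummy_input,
    "vit_b_16.onnx",     # illustrative output path
    opset_version=17,    # a recent opset covering the attention-related ops
    input_names=["images"],
    output_names=["logits"],
)
```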

Regarding attention mechanisms, we currently support Attention, MultiHeadAttention, and GroupedQueryAttention, including self-attention, cross-attention, and masking.
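
For reference, the sketch below shows these call patterns in plain PyTorch using nn.MultiheadAttention: self-attention, cross-attention, and masked self-attention. It is only an illustration of the constructs being mapped, not the compiler's internal representation; GroupedQueryAttention is omitted here since stock PyTorch provides no dedicated module for it.

```python
# Hedged sketch of the supported attention patterns, expressed in PyTorch.
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 10, embed_dim)        # (batch, seq, embed) query sequence
memory = torch.randn(2, 16, embed_dim)   # e.g. encoder outputs for cross-attention

# Self-attention: queries, keys, and values all come from the same sequence.
self_out, _ = mha(x, x, x)

# Cross-attention: queries from x, keys/values from a separate sequence.
cross_out, _ = mha(x, memory, memory)

# Masked self-attention: a causal mask blocks attention to future positions.
causal_mask = torch.triu(torch.ones(10, 10), diagonal=1).bool()
masked_out, _ = mha(x, x, x, attn_mask=causal_mask)
```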

If you are working with models that incorporate Transformer architectures, we encourage you to reach out to MemryX. Our team will work with you to explore extending support for your specific model.

As we continue to refine and enhance our compiler and runtime, support for transformer models will broaden. Stay tuned for updates and improvements!