Release Notes#
v2.2#
The No Foolin’ release – April 1, 2026
SDK 2.2 is our biggest update since 2.0, and includes numerous improvements and new features across the NeuralCompiler, driver, runtime, utils, and model explorer!
Stay on 2.1 for Frigate NVR
Frigate stable (0.17) is currently tied to SDK 2.1, so if you are using MX3 for Frigate, please keep using SDK 2.1.
Compiler#
Expanded + Enhanced Operator Support
- Added new operators:
Equal (ONNX)
UnitNormalization (Keras)
- Enhanced existing operator support:
- Keras
Dot
LayerNormalization
- ONNX
MatMul
Clip
ReduceSumSquare
Pad
- All Frameworks
MaxPooling & ReduceMax: allow pooling on the channel dimension
MultiheadAttention: improved mappability and performance
Improved support for some cases of non-singleton batch dimensions.
Massive Performance Boost for YOLO v10/11/26
With operator optimizations plus a new mapping technique for ACores, some YOLOs can reach up to a 400% boost in FPS!
Some of the MXA-Optimized versions of YOLO11 models from SDK 2.1 have been removed from Model Explorer, since the original models can now reach the same or better performance. A few, such as YOLO11n-640, still see boosts when optimized for MXA, so they remain in Model Explorer.
Model Support
The NeuralCompiler Extension for YOLO v10/v11/26 support is now autodetected and enabled automatically, so users no longer need to remember to pass the argument.
Added support for multiple new CNN+transformer hybrid architectures.
NeuralCompiler Extensions for Tensorflow and Keras
The NeuralCompiler Extensions (NCE) system, introduced in SDK 2.0, now supports extensions for TensorFlow and Keras models.
Under-the-Hood Improvements
Improved exceptions and error handling.
Extensive expansion of pytest and automated regressions.
Driver#
New DMA Implementation
The driver now uses a new DMA implementation that enables higher throughput, boosting FPS on bandwidth-heavy models.
Completed USB Driver
The driver for MX3 form factors that use USB 3.0 is complete and production-ready.
Note: USB modules with the MX3 are not yet available to the general public, but are already in use by select customers. Stay tuned for purchasable options in the near future!
Runtime#
Python-wrapped MxAccl (preview)
The MxAccl C++ API is now wrapped with pybind and available to use directly from Python, allowing users to use the same API in both languages.
This will also boost performance for Python users, especially when multiple streams are involved!
The existing pure-Python APIs (e.g., `AsyncAccl`) will continue to be supported, but the new Python MxAccl API will be the recommended option for most users going forward.
Note
In the next SDK release (2.3), the Python MxAccl API will be promoted from preview to stable status, and the pure-Python APIs will be deprecated.
Auto-Clocking Support
On supported SKUs only, you can now set the runtime (C++ and Python) to find the fastest speed for your DFP while staying under a specified power limit. For example: set the power_limit to 10W, and after a few seconds spent finding the optimal clock speed, your DFP will be running at the fastest possible speed while consuming less than 10W.
This feature can significantly boost performance. For example, if your DFP uses only 6W at the default frequency (600MHz, 14 TOPS), a 10W limit may allow it to boost all the way to 1GHz (24 TOPS) and achieve much higher FPS.
It can also be used to restrict power consumption on hosts with limited power delivery (e.g. 5W), such as some ARM systems, automatically avoiding crashes.
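To illustrate the idea, here is a minimal Python sketch of choosing the fastest clock under a power budget. This is not the SDK's implementation: the search happens inside the runtime, and `pick_fastest_clock` and the toy power model below are hypothetical names used only for illustration.

```python
def pick_fastest_clock(clocks_mhz, measure_power_w, power_limit_w):
    """Return the highest clock whose measured power stays under the limit.

    `measure_power_w` stands in for the runtime's on-device power telemetry;
    in the real SDK this search happens inside the runtime, not in user code.
    """
    best = None
    for clock in sorted(clocks_mhz):
        if measure_power_w(clock) < power_limit_w:
            best = clock
    return best

# Toy power model: draw grows with frequency (6 W at the 600 MHz default).
toy_power = lambda mhz: 6.0 * (mhz / 600.0) ** 1.3

print(pick_fastest_clock([400, 600, 800, 1000], toy_power, 10.0))  # 800
```

With this toy model, 800 MHz draws about 8.7 W and 1000 MHz about 11.7 W, so a 10 W limit settles on 800 MHz.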
Optimized use_model_shape Functionality
`use_model_shape` in C++ now supports the same full set of data reorganization operations as Python, while maintaining high performance thanks to a new algorithm.
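As a rough analogy for what such data reorganization involves, converting a channel-last feature map into a channel-first model shape looks like the generic NumPy sketch below. This is illustrative only: the actual layouts `use_model_shape` handles are determined by the DFP, and the SDK performs the reordering internally.

```python
import numpy as np

def to_channel_first(fmap_hwc):
    """Reorder an (H, W, C) feature map into (C, H, W) layout.

    Illustrative stand-in for the kind of reorganization the runtime
    applies when matching device output to the model's original shape.
    """
    return np.ascontiguousarray(fmap_hwc.transpose(2, 0, 1))

fmap = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # H=2, W=3, C=4
print(to_channel_first(fmap).shape)  # (4, 2, 3)
```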
C++ Refactor and Optimization
The `MxModel` class in C++ has been completely rewritten to improve both performance and code maintainability.
mxa-manager Protocol Version 2
The communication between clients and mxa-manager is now versioned, to allow for new features while keeping a window of backwards compatibility for older clients.
Version 2’s main difference is the inclusion of auto-clocking parameters and the removal of the unused `stop_on_empty` option.
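A hypothetical sketch of how such a compatibility window can work (the names and logic here are illustrative, not the actual mxa-manager protocol code):

```python
MANAGER_VERSION = 2      # current protocol version
OLDEST_SUPPORTED = 1     # manager keeps a backwards-compatibility window

def negotiate(client_version):
    """Agree on the highest protocol version both sides speak, or fail."""
    if client_version < OLDEST_SUPPORTED:
        raise ConnectionError(f"client protocol v{client_version} is too old")
    return min(client_version, MANAGER_VERSION)

print(negotiate(1))  # 1: an older client keeps working within the window
print(negotiate(3))  # 2: a newer client is capped at the manager's version
```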
Utils#
MxPrepost
New Python and C++ library for optimized pre/post-processing for YOLO detection, segmentation, and pose estimation models.
Replaces use of cropped onnxruntime/tflite models, providing a massive boost to performance and a drastic reduction in host CPU overhead.
pip-installable as the `mxprepost` package on both x86 and ARM, for Pythons 3.9 through 3.12. Fully homemade and MIT licensed, with source available on GitHub.
See also
Check out this MxPrepost + YOLO11 tutorial!
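To give a sense of the kind of detection post-processing MxPrepost takes over from cropped runtime models, here is a simplified greedy non-maximum suppression in plain NumPy. This illustrates the technique only; it is not MxPrepost's actual (optimized) implementation.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over (N, 4) boxes in xyxy format."""
    order = np.argsort(scores)[::-1]  # highest-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first
```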
Model Explorer#
Over 100 New Models
Through a combination of improved NeuralCompiler support and the addition of new model sources, Model Explorer now has over 560 models!
v2.1#
The Halloween Spooktacular 🎃 release – Oct. 31, 2025
SDK 2.1 includes numerous improvements to the “lower levels” of the SDK, improving host hardware/OS support and expanding runtime APIs.
Driver#
Auto-configure MSI-X, MSI, or INTx mode
Detects the optimal PCIe interrupt scheme for your host and selects it accordingly
This means the MX3 M.2 should work with older and non-standard PCIe controllers, including gen 2.0
Extended kernel support
Improve compatibility with Linux kernels 5.10 and older
Increase device limits
Increase the maximum number of connected MX3 M.2 modules from 4 to 128
Runtime#
“Pressure” monitoring API
New set of functions in Python and C++ for checking an MX3 module’s “busy” factor
Will return LOW, MEDIUM, HIGH, or FULL
This metric can be used for deciding whether to add additional load/streams to a given MX3 module
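For example, a hypothetical scheduler could route a new stream to the least-busy module. The enum values match the levels listed above, but the helper and its signature are illustrative, not the SDK's API:

```python
from enum import Enum

class Pressure(Enum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    FULL = 3

def pick_device(pressures):
    """Return the least-busy device that can still take a new stream.

    `pressures` maps device name -> Pressure as reported by the
    pressure-monitoring API; a FULL device is never selected.
    """
    open_devices = {d: p for d, p in pressures.items() if p is not Pressure.FULL}
    if not open_devices:
        return None  # every module is saturated; don't add a stream
    return min(open_devices, key=lambda d: open_devices[d].value)

print(pick_device({"mxa0": Pressure.HIGH, "mxa1": Pressure.LOW}))  # mxa1
```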
acclBench latency
The acclBench C++ benchmark tool now reports latency (in-to-out latency).
set_powermode API
Added functions for setting frequency (power mode) directly from Python and C++
NOTE: for advanced hardware testing purposes. Most people shouldn’t need to touch these!
Stability & Performance Fixes
Refactored parts of `mxa_manager` and the C++ runtime to improve performance and stability.
OS Support#
Debian 13
Added official support and `.deb` packages for Debian 13. This also adds support for the latest Raspberry Pi OS, which is based on Debian 13.
In turn, our `apt` packages now support Ubuntu 25.04 and 25.10 too.
Note
For installing Python 3.12 on Debian 13, use uv as described here.
v2.0#
The Level Up release – July 28, 2025
SDK 2.0 is a monumental release that introduces major new features and improvements across the entire software stack.
Compiler#
Neural Compiler Extensions (NCE)
New mechanism for extending Neural Compiler support: install .nce plugin files to add or patch graph handling between SDK releases.
Extensions can hot-fix graph issues or add decompositions for new ops.
Some extensions may merge into the core compiler in future releases; others can remain private (e.g., proprietary models).
Future SDK versions will include an online repository for official extensions and developer documentation.
Multi-Threaded Compilation
With `--effort hard`, optimization steps now parallelize across CPU threads, significantly reducing compile times on high-core-count systems. Use `-j` to specify the number of cores.
Model and Operator Support Enhancements
- Model support:
YOLO v10 & v11 (nano, small, medium)
ConvNeXT
ViT-small
… and more
- Operator support:
Fractional up-/down-sampling (e.g., 500 → 300 = 0.6×)
Modulus (%)
Enhanced TransposeConv
Improved handling of non-standard tensor shapes/layouts
DFP Operator Folding and Batch Support
Boundary operations (transpose/reshape) now execute within the DFP.
Original tensor shapes are stored in the DFP; runtime reshapes occur automatically.
Transparent batch-dimension handling for seamless deployment.
Under-the-Hood Improvements
Complete rewrite of shape-handling logic
Reimplementation of TFLite, Keras, and ONNX loaders
Numerous bug fixes and architectural cleanups
Runtime#
Multi-DFP Support
C++ and Python APIs now support loading multiple DFPs concurrently. (Co-mapping is still the preferred method.)
All-New MXA-Manager
Rebuilt with a custom networking/scheduling stack (replaces gRPC).
Shared Mode matches Local Mode in FPS and CPU usage.
Monitors temperature and power, sharing data with all clients.
Supports arbitrary combinations of clients, DFPs, and usage scenarios.
Provides user-configurable performance tuning knobs.
Socketfiles Replace IP Networking
MXA-Manager now uses UNIX domain sockets by default for inter-process communication.
Provides equivalent or better performance, with improved security and deployment simplicity.
TCP/IP remains supported and is faster than before.
DFP Shape Folding
Folded reshape operations now run transparently on the host—no need for auto-crop.
Python Shared Mode Support
Python Runtime now supports Shared Mode and Multi-DFP functionality, achieving feature parity with the C++ API.
Python Multi-Device Load Balancing
Python API can distribute inference workloads across multiple connected MX3 M.2 modules, just like C++.
Faster C++ Runtime
Input callbacks perform fewer memcpy operations and use a new matrix transposition method, improving FPS and reducing CPU load, especially on ARM SoCs.
MXA-Manager for Windows
Now included in the Windows SDK, enabling Shared Mode and Multi-DFP support on Windows.
Driver#
Faster DFP Downloads
Linux and Windows drivers switch DFPs up to 4× faster than SDK 1.2.
Faster downloads and caching of pre-parsed DFPs.
Improved Host System Compatibility
- MX3 firmware detects and optimizes for select x86 platforms.
Tuned MSI-X capability
Faster driver boot times
Standard Windows Installer
A standard Windows installer has been released for easier setup.
Fixes
Multiple bug fixes and stability improvements
Utils#
Improved GUI Toolkit
Significant bug fixes and performance optimizations for the Qt-based C++ toolkit.
Developer Hub#
Expanded Model Explorer Sources
The open-source timm models are included in Model Explorer.
Tutorials
Updated tutorials now point to the MemryX Examples page to share a unified, updated codebase.
Added new Windows “Getting Started” tutorials.
v1.2#
The Snowday release – Mar. 5, 2025
SDK 1.2 is a light, yet very important release that updates the MX3 firmware and driver, and improves runtime stability.
Runtime Enhancements#
Driver & Firmware Updates
Updated drivers and firmware to enhance performance and overall stability.
New firmware improves MX3 M.2 compatibility with host PCIe controllers.
MXA-Manager Improvements
Renamed mx-server to mxa-manager to more accurately convey its purpose.
Implemented several fixes and refinements.
Windows Updates
Windows and Linux versions are aligned again (both 1.2)
All fixes and improvements to driver, firmware, and runtime are included
NOTE: mxa-manager remains Linux-only for now
v1.1#
The Holiday Special release – Dec. 23, 2024
SDK 1.1 focuses on the Linux Driver and C++ Runtime, with performance improvements and new options for certain use cases. The NeuralCompiler has several improvements as well.
Runtime#
Driver Improvements
Enhanced performance with the latest driver, with some models achieving up to 30% improvement in FPS.
Firmware Updater
The Linux driver package `memx-drivers` will auto-update the firmware (if required) on connected devices. The firmware update tool is also available to use directly.
MXA-Manager (Preview)
Multiple concurrent processes can now share the same MX3 M.2 using the new `mxa_manager` Linux service! This service also allows for use of the MX3 in parallel from different Docker containers, and even sharing over an IP network.
Please see the first MXA-Manager tutorial for an introduction.
Important
MXA-Manager is released in a preview state for SDK 1.1 and will undergo improvements in future SDKs. The user-facing API for the C++ Runtime is considered stable, but under the hood there are known performance issues that we are working to address.
When possible, users should continue to use the C++ Runtime as before: one or more video streams (using threads) in a single process.
No-Copy C++ Runtime Option
For users who want more fine-grained memory management in our C++ Runtime, “no-copy” versions of the `get_data` functions have been added. These versions do not copy data, but instead operate directly on the pointers supplied to them.
This feature is especially helpful on low-end ARM Linux systems, where copying memory can add noticeable latency.
This is an advanced feature that requires application developers to be aware of the implications.
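The trade-off is the same one NumPy users know from views versus copies. A generic illustration of no-copy semantics (not the actual `get_data` signature) is:

```python
import numpy as np

buf = bytearray(16)  # stand-in for a runtime-owned output buffer

# "No-copy": wrap the existing memory; writes land directly in `buf`.
view = np.frombuffer(buf, dtype=np.float32)
view[:] = 1.0

# Copying alternative: safe and simple, but costs a memcpy per frame,
# which adds up on low-end ARM hosts.
copy = np.frombuffer(bytes(buf), dtype=np.float32).copy()
copy[:] = 2.0  # does NOT affect `buf`

print(np.frombuffer(buf, dtype=np.float32)[0])  # 1.0
```

The "aware of the implications" caveat above is exactly the view case: the caller must keep `buf` alive and unmodified for as long as `view` is in use.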
Compiler#
Expanded Support and Enhanced Performance and Stability
Check out the operator support page for a list of all supported operators.
Multiple fixes and optimizations have been made to the compiler.
Naming Convention
Cropped pre/post models now follow updated naming conventions, where the model number is appended to the model name only if more than one model is compiled concurrently.
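In other words, the suffix logic behaves roughly like the sketch below. This is illustrative only: the exact name format is determined by the compiler, and `cropped_model_name` is a hypothetical helper.

```python
def cropped_model_name(base, index, total_models):
    """Append a model number only when more than one model is compiled."""
    return f"{base}_{index}" if total_models > 1 else base

print(cropped_model_name("model_post", 0, 1))  # model_post
print(cropped_model_name("model_post", 0, 2))  # model_post_0
```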
Developer Hub#
Model eXplorer
The Model eXplorer has been updated with new models and DFPs.
New Tutorials
The tutorials section has been updated with new tutorials.
The tutorials cover a wide range of topics, from basic model compilation to advanced features like the MXA-Manager.
Subreleases#
v1.1.1 → v1.1.5
Minor fixes and improvements.
Note
SDK 1.1 features are Linux-only for now, while Windows features remain at SDK 1.0. An upcoming SDK release will bring Windows back up to feature parity with Linux.
v1.0#
The Hello World release – Oct. 1, 2024
MemryX SDK 1.0 is our first publicly available SDK, and over the previous 0.10 version, it includes many important new features.
General#
Python 3.11 and 3.12
The Python package now supports Pythons 3.9 through 3.12!
Important
We have dropped support for Python 3.8. Please see here for help if you cannot install 3.9 or above.
Compiler Support on ARM
The NeuralCompiler is now fully supported on ARM devices.
No need to compile models on x86 and copy them over!
ARM Device Setup Helper
The `mx_arm_setup` command is now part of the driver package on ARM. Run this command once after install to set up device tree overlays and/or other board-specific tweaks automatically.
Currently this script supports:
Raspberry Pi 5 (Raspberry Pi OS recommended)
Orange Pi 5 Plus
Orange Pi 5 Max
Radxa Rock 5B
We will continue to add more board support in the future.
Other boards may already work out-of-the-box. If you are having issues with your ARM platform, please reach out to MemryX for assistance.
DFP v6
The DFP file format has been revised from v5 to v6.
Although this is mostly an internal feature, it allows us to add new features to the SDK more easily in the future.
All libraries and the driver still support v5 files, but the NeuralCompiler will only output v6 from now on.
Runtime#
Multi-device Load Balancing
By simply passing a list of device IDs to use instead of a single number, the C++ runtime will automatically run your DFP on multiple M.2 cards and load balance your streams between them!
In other words: 2x M.2 == 2x FPS by changing a single line of code.
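Conceptually, the distribution behaves like a round-robin assignment of streams over the device list. The Python sketch below is illustrative only, not the C++ runtime's actual scheduler:

```python
from itertools import cycle

def assign_streams(stream_ids, device_ids):
    """Round-robin streams across devices (illustrative only)."""
    devices = cycle(device_ids)
    return {stream: next(devices) for stream in stream_ids}

print(assign_streams(["cam0", "cam1", "cam2", "cam3"], [0, 1]))
# {'cam0': 0, 'cam1': 1, 'cam2': 0, 'cam3': 1}
```

With two devices, each ends up serving half the streams, which is where the "2x M.2 == 2x FPS" scaling comes from.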
Automatic Pre/Post Runners (C++)
The C++ Accl API now supports automatic execution of cropped pre/post ONNX/TF models.
You no longer have to manage your own inference sessions!
The AsyncAccl Python API already had similar functionality for Python applications.
Note
These pre/post functions are provided by the memx-accl-plugins package.
C++ GUI Helper Library
The Qt-based GUI toolkit library has been added to the SDK as a convenient alternative to OpenCV imshow or custom-written Qt.
C++ Manual Threading Mode Changes
The manual-threading mode of `MxAccl` has been moved to a separate class called `MxAcclMT`. The `receive_output` function can now block for an output from a specific `stream_id`, instead of needing the user to sort frames to streams.
Updated Driver
The driver now supports the “low-BAR M.2” being sampled as part of a 4x M.2 kit to certain customers.
Continued performance improvements in the driver, particularly for high FPS models. Max model FPS can now reach ~40,000 instead of ~15,000.
Advanced users can now increase or decrease the M.2’s power & performance from the default, using the `mx_set_powermode` script.
Compiler#
New Framework Support
Added support for Keras 3 while maintaining backward compatibility with Keras 2.
Upgraded TensorFlow support to version 2.17.
Enhanced Performance and Support
Expanded operator support, including new and optimized operators.
Improved accuracy for Softmax and exponential operators.
The –effort hard compilation option has been further optimized, offering speedups across a wide range of models.
Some models have experienced substantial performance gains, increasing speed by several folds.
Faster Compilation
Compilation times have been significantly reduced, with speedups reaching up to 3x for certain models.
Stable API
The compiler APIs have reached a stable state (v1.0.0).
User Interface
Enhanced CLI visualization for a more intuitive user experience.
More detailed error messages providing step-by-step guidance to resolve issues.
Alpha/Beta Releases#
v0.10
Driver Performance Boosts
New output feature map optimization gives better PCIe bandwidth utilization. This boosts FPS for many-output models such as SSDs and newer YOLOs.
Solved bug that caused FPS to plateau around 2500 FPS. Now small models can easily exceed 15,000 FPS on the MX3.
Improved MX3 Firmware
The MX3 firmware now has a 600MHz base frequency for the M.2 (+20%), following extensive testing and characterization by our platform teams.
The MX3 thermal throttling temperature has been increased from 85C (Tj) to 100C, again following extensive characterization by our teams.
In the already-rare situations where the MX3 hangs, it will now self-restart instead of requiring a host reboot.
Compiler Effort Control
The NeuralCompiler now has the `--effort` flag, which allows users to boost FPS substantially by giving the Compiler more time to try different optimizations. Using `--effort hard`, users can get much higher FPS vs. the default (`--effort normal`). Results will vary by model: we have observed from 1.0x to >10.0x.
The use of `--effort hard` is now strongly recommended before deploying your model into a final product.
But please note that use of `hard` will greatly increase the amount of time needed to compile your model: around 5x to 15x longer. In the next SDK release, this flag will be optimized to take only a tiny bit longer than `normal`.
Other Compiler Features
Expanded operator support and FPS boosts for existing models (even on `normal`).
Improved `mx_nc` startup time.
More optimized auto-selection of data packing formats.
C++ API Performance Improvements
Single-stream scenarios can experience up to a ~30% boost on some host CPUs.
Manual-threading mode has significant performance boosts for multi-stream scenarios.
acclBench Tool
C++ version of the `mx_bench` Python tool, for better cross-platform support. The `acclBench` executable is included with the C++ API package. See the benchmark page for more info.
Expanded Model Explorer
The Model Explorer has been expanded with new models and downloadable DFPs.
You can now track this page to monitor FPS improvements between SDK updates!
0.10.2
Accl: C++ fix for manual threading mode and ONNX models
Accl: added a `memx-accl-noavx` option for x86_64 systems that lack AVX2
0.10.1
Compiler: 16-bit weight bugfixes
Accl/Benchmark: Python bugfix for some ARM systems
v0.9
Neural Compiler Additions & Improvements
Logging: Introduced a comprehensive logging system for improved debugging and diagnostics for the Neural Compiler. Optionally, users can submit these logs to MemryX to help improve the Compiler.
Expanded Support: The Compiler has added additional operator support across various frameworks. For a detailed list please visit the operator support page.
Improved User Messaging: Updated compiler messages now feature actionable TODOs to help users address issues more effectively.
Performance Enhancements: Major improvements and bugfixes have significantly increased performance for some models.
C++ API Thread Management
The C++ version of the Multi-Stream Accl API now has support for limiting the number of Stream worker threads.
This feature can improve performance and/or reduce CPU overhead in many-stream applications such as VMS.
Driver Improvements
Input feature map bandwidth has been greatly optimized. Please note there is still a known issue with output feature map bandwidth that will be addressed in a future update.
Temperature data is now reported over the standard Linux hwmon interface.
Model Explorer
Discover and utilize models effortlessly with our new Model Explorer.
The MemryX SDK avoids manual tuning or retraining and is committed to running models out-of-the-box.
All models listed on the Model Explorer are directly from their original sources; no modifications have been made and no model-specific tunings have been done.
v0.1 to v0.8
Removed so we don’t clutter up this page :-)