Overview#
Accelerator#
It is helpful to begin with commonly used terms describing the MemryX AI Accelerator.
MXA
MemryX Neural Network Accelerator chips are called ‘MXA’ for short: [M]emry[X] [A]ccelerator
MX3
There were 2 internal prototypes before the release of our first product. Therefore, this generation is called the ‘MX3’.
MCE
Within an MXA are hierarchically organized Compute Engines containing custom neural logic, MACs, ALUs, and more. Each MemryX Compute Engine is called an MCE. There are hundreds of MCEs in each MXA.
Pure Dataflow#
The MXA is built from the ground up to accelerate neural network inference. Hardware and software were co-designed to enable pure dataflow execution of neural network workloads, so the MXA is optimized for the efficient flow and processing of data. During execution of a neural network model, input data is streamed through the network layers defined by the trained AI model until it reaches the output layer. The figure below is an illustrative example of data streaming from the input node to the two output nodes via six neural network layers.
The MXA has programmable hardware that aligns to the flow of the trained AI model(s). Each MXA chipset is made up of hundreds of dataflow cores, the MemryX Compute Engines (MCEs). The operation of each MCE and the data routes between them are programmed together during neural network model deployment, eliminating the need for runtime instruction scheduling. Once configured, the MXA architecture efficiently streams data from input to output.
The MXA uses spatial multiplexing of its many cores to accelerate neural network layers. Each layer of the network is assigned a specific number of MCEs to optimize overall inference performance and data flow, as shown in the following illustration. Each MCE is programmed to perform a specific job driven by the data stream, without the need for a central control unit or runtime scheduler.
Concurrent Models and Streams#
Each MXA can seamlessly support the concurrent operation of multiple neural network models and data streams. The user only needs to provide the neural compiler with the set of models and the desired number of MXA chips; the compiler automatically maps the concurrent models onto those chips, and the SDK distributes hardware resources among the models to achieve the highest possible performance. Note that the exact same software is used to map one large model across many MXAs or many small models onto a single MXA.
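As an illustrative (not authoritative) sketch of how this might look through the SDK’s Python bindings, where the class and parameter names NeuralCompiler, models, num_chips, and dfp_fname are assumptions rather than confirmed API, compiling several models into a single deployment could look like:

```python
# Hedged sketch: compile two independent models for a single MXA.
# The class and parameter names (NeuralCompiler, models, num_chips, dfp_fname)
# are assumptions for illustration; consult the SDK API reference for specifics.
from memryx import NeuralCompiler

nc = NeuralCompiler(
    models=["detector.onnx", "classifier.tflite"],  # concurrent models
    num_chips=1,                                    # map both onto one MXA
    dfp_fname="combined.dfp",                       # one DFP configures the chip
)
dfp = nc.run()  # the compiler divides MCEs between the models automatically
```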
Inherently Scalable#
Scalability is an inherent attribute of MemryX hardware and software systems. Each MXA contributes a fixed number of MCEs, so 2X the chips means 2X the MCEs and 2X the AI computing capability, just as 10X the chips provide 10X the MCEs and hence 10X the computing capability of a single MXA.
The user can cascade any number of MXA chips. Reasons for adding MXAs include supporting larger and/or more models, increasing inference performance, or lowering latency. The neural compiler automatically distributes the workload of any number of models over any chosen number of MXA chips. In the diagram below, two chips have been cascaded, and the neural compiler optimally distributes the workload of the NN model across them to achieve high inference performance.
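Under the same assumed names (NeuralCompiler, num_chips), spreading one large model over a cascade of chips would only change the chip count passed to the compiler; this is a sketch, not the definitive interface:

```python
# Hedged sketch: the same model compiled for two cascaded MXAs.
# NeuralCompiler and num_chips are assumed names used for illustration only.
from memryx import NeuralCompiler

nc = NeuralCompiler(
    models=["large_model.onnx"],
    num_chips=2,                      # cascade two MXAs for more MCEs
    dfp_fname="large_model_x2.dfp",
)
dfp = nc.run()                        # workload is split across both chips
```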
Development Flow#
The MemryX SDK is a lightweight, high-performance software stack designed to run AI models on MemryX accelerators with minimal effort. Developed alongside our hardware, it provides a streamlined interface for deploying neural networks efficiently across systems. With a focus on simplicity and speed, the SDK includes all the tools and runtime components needed to convert models, deploy them to hardware, and execute inference workloads—whether you’re building production systems or prototyping AI applications.
The MemryX development flow consists of two main stages: Compiletime and Runtime.
1. Compiletime#
The goal of the compiletime stage is to generate the dataflow program (DFP) that will be used to configure the MXA. The first step is to compile the neural network model(s) you want to accelerate with the MemryX neural compiler, producing a DFP. Next, program your chip using the generated DFP. The DFP can also be simulated to give a quick and reliable estimate of chip performance and latency without requiring the target hardware.
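A minimal compiletime sketch is shown below; the NeuralCompiler name and its parameters are assumptions for illustration, and the simulation step is mentioned only in a comment because its exact interface is not covered here:

```python
# Hedged compiletime sketch: trained model in, dataflow program (DFP) out.
# NeuralCompiler, models, and dfp_fname are assumed names for illustration.
from memryx import NeuralCompiler

nc = NeuralCompiler(
    models=["mobilenet_v2.onnx"],    # trained model to accelerate
    dfp_fname="mobilenet_v2.dfp",    # DFP used to configure the MXA
)
dfp = nc.run()

# The generated .dfp file is what the runtime loads onto the chip; it can also
# be fed to the SDK simulator to estimate performance and latency without hardware.
```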
2. Runtime#
MemryX runtime tools and drivers interface with the MXA. Drivers are available for multiple operating systems and provide C/C++ and Python bindings for ease of integration. The tools and drivers are designed to integrate easily with off-the-shelf image pipelines such as GStreamer and OpenCV, as well as with IP implementations. Since no runtime scheduling is required, execution on the MXA is simple, deterministic, and adds no overhead to the host system.
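To make the runtime flow concrete, here is a hedged sketch that streams OpenCV frames through a compiled DFP via the Python bindings; the accelerator class name (AsyncAccl) and its callback-style connect_input/connect_output methods are assumptions to be checked against the runtime API reference:

```python
# Hedged runtime sketch: stream camera frames through the MXA.
# AsyncAccl and its connect_input/connect_output/wait interface are assumed
# names for illustration; check the runtime API reference for the exact calls.
import cv2
import numpy as np
from memryx import AsyncAccl

cap = cv2.VideoCapture(0)             # off-the-shelf OpenCV capture pipeline

def get_frame():
    ok, frame = cap.read()
    if not ok:
        return None                   # returning None ends the input stream
    frame = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    return frame

def handle_output(*outputs):
    print("top class:", int(np.argmax(outputs[0])))

accl = AsyncAccl("mobilenet_v2.dfp")  # program the MXA with the compiled DFP
accl.connect_input(get_frame)         # frames are pulled and streamed in
accl.connect_output(handle_output)    # results return as they are produced
accl.wait()                           # block until the input stream ends
```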
What MemryX Solutions Do Not Require#
With MemryX, high-performance AI inference comes without the traditional deployment burdens. Here’s what you can confidently skip:
MXAs use BFloat16 activations, so no pilot dataset or calibration step is needed; unlike INT8-only systems, there is no tuning required for different runtime conditions.
The MemryX Neural Compiler maps your trained model to hardware in minutes—no manual optimization required.
Hit high throughput without compromising accuracy. The compiler ensures near-original fidelity with optimal performance.
MemryX respects your original model—no pruning, no retraining. Just deploy and run with high utilization and efficiency.