Meta AI Researchers Introduce Matrix: A Ray-Native, Decentralized Framework for Multi-Agent Synthetic Data Generation

How do you keep synthetic data fresh, diverse, and constantly flowing for today’s fast-growing AI models—without letting a single central pipeline become the choke point? That’s the challenge Meta AI researchers set out to solve. Their answer is Matrix, a decentralized system where both control logic and data move around as messages inside distributed queues.

As LLM training now depends heavily on synthetic conversations, tool traces, and reasoning chains, the old approach—systems ruled by a central controller or packed with domain-specific hacks—has started to crack. It wastes GPU cycles, slows everything down with coordination overhead, and limits how much variety your data can have.

Matrix flips the script. It uses peer-to-peer scheduling on a Ray cluster and achieves 2× to 15× higher token throughput in real workloads—all without sacrificing data quality. And honestly, watching how it works feels like seeing a traffic jam suddenly break into smooth, self-organized flows.


From Centralized Controllers to Peer-to-Peer Agents

Traditional agent systems treat the controller like a boss who micromanages everything. Every agent call, every tool call, every retry—you name it—must go through the central brain. Sure, it’s simple to understand. But when you need tens of thousands of ongoing synthetic dialogues, this setup turns into a giant bottleneck.

Matrix takes a refreshingly different direction. It packages both control flow and data into a message object called an orchestrator. Think of it as a traveling notebook holding the current task state: the entire conversation so far, any intermediate results, and the routing logic that decides where it goes next.

The agents (Ray actors) are stateless. They simply:

1. Pull an orchestrator from a distributed queue

2. Apply their specific logic

3. Update the state

4. Pass it directly to the next agent

No central scheduler telling them what to do every second. Each task moves forward at its own pace. Fast trajectories zoom ahead; slow ones don’t hold anyone back.
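To make that loop concrete, here is a minimal sketch of a stateless agent as a Ray actor pulling orchestrators from a distributed queue. The Orchestrator dataclass, the queue wiring, and the step logic are illustrative assumptions for this article, not Matrix's actual API.

```python
import ray
from dataclasses import dataclass, field
from ray.util.queue import Queue

# Hypothetical orchestrator: a message carrying both the task state and the
# routing decision ("where do I go next?"). Matrix's real schema differs.
@dataclass
class Orchestrator:
    conversation: list = field(default_factory=list)
    next_agent: str = "solver"
    done: bool = False

@ray.remote
class Agent:
    """A stateless worker: pull, apply logic, update state, forward."""
    def __init__(self, name: str, in_queue: Queue, out_queue: Queue):
        self.name, self.in_queue, self.out_queue = name, in_queue, out_queue

    def run(self, max_items: int):
        for _ in range(max_items):
            orch = self.in_queue.get()                        # 1. pull an orchestrator
            orch.conversation.append(f"{self.name}: reply")   # 2-3. apply logic, update state
            orch.done = len(orch.conversation) >= 4           # toy stopping rule
            self.out_queue.put(orch)                          # 4. hand it to the next stage

ray.init()
q_in, q_out = Queue(), Queue()
agent = Agent.remote("solver", q_in, q_out)

for _ in range(2):                                            # seed two independent tasks
    q_in.put(Orchestrator())
ray.get(agent.run.remote(max_items=2))
print(q_out.get().conversation)
```

Each task carries everything it needs, so another agent downstream can pick up the orchestrator from the queue without any central coordinator in the loop.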

This feels wonderfully clean—if one task fails, it doesn’t drag the whole batch down. Each job lives and dies on its own, which is honestly liberating for high-scale pipelines.


System Stack and Services

Matrix runs on a Ray cluster, usually launched via SLURM. Ray provides the distributed queues and actor management that form the foundation of Matrix's decentralized design. Ray Serve exposes LLM endpoints backed by vLLM and SGLang, and can also reach external APIs like Azure OpenAI or Gemini through proxies.
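As a rough illustration of that serving layer, here is a minimal Ray Serve deployment stub. A real deployment would wrap a vLLM or SGLang engine; the class name and payload fields below are placeholders, not Matrix's actual endpoints.

```python
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=1)
class LLMEndpoint:
    """Stub deployment; a real one would forward prompts to a vLLM/SGLang engine."""
    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        prompt = payload.get("prompt", "")
        return {"completion": f"[stub completion for] {prompt}"}

app = LLMEndpoint.bind()
# serve.run(app)  # exposes the HTTP endpoint on the Ray cluster
```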

Tools and more complex services run inside Apptainer containers, keeping the agent code separate from anything risky or heavy, like HTTP tools or code sandboxes. Hydra manages the entire configuration landscape—agent roles, orchestrator formats, resource allocation, input/output schemas.

Monitoring happens through Grafana, which watches metrics such as queue sizes, pending tasks, token throughput, and GPU usage.

Matrix also introduces message offloading. When conversation histories grow big, they’re stored in Ray’s object store. The orchestrator only keeps lightweight pointers. This cuts down bandwidth while still letting agents rebuild the prompt whenever they need it.

It’s like packing your suitcase smarter—carrying just what’s needed, while the bulk stays in baggage storage.
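A minimal sketch of the offloading idea, assuming a dictionary-shaped orchestrator: the bulky conversation history goes into Ray's object store via ray.put, and the orchestrator carries only the ObjectRef.

```python
import ray

ray.init()

# A long conversation history that would be expensive to ship between agents.
history = [{"role": "user", "content": "a very long turn..."} for _ in range(10_000)]

# Offload: the bulky payload goes to the object store once...
history_ref = ray.put(history)

# ...and the orchestrator carries only a lightweight pointer.
orchestrator = {"task_id": 42, "history_ref": history_ref, "next_agent": "assistant"}

# Any agent that needs the full prompt can rebuild it on demand.
full_history = ray.get(orchestrator["history_ref"])
print(len(full_history))  # 10000
```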


Case Study 1: Collaborative Reasoner

Collaborative Reasoner—also known as Coral—evaluates multi-agent reasoning by letting two LLMs discuss a question, argue if necessary, and settle on an answer. Originally, Coral depended on a central controller to coordinate thousands of these interactions.

Matrix rebuilds the same workflow with decentralized orchestrators and stateless agents.

On 31 A100 nodes, using Llama 3.1 8B Instruct, Matrix runs with:

  • 248 GPUs
  • 50 queries per GPU
  • 12,400 concurrent conversations

Coral, by contrast, tops out at around 5,000 concurrent conversations.

On identical hardware:

  • Matrix: ~2 billion tokens in ~4 hours
  • Coral: ~0.62 billion tokens in ~9 hours

That’s a 6.8× increase in throughput—wow—while maintaining nearly identical answer agreement (~0.47).
It’s fast and fair.
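A quick back-of-the-envelope check of those figures, assuming 8 GPUs per A100 node (which matches the stated 248 total):

```python
gpus = 31 * 8                 # 31 A100 nodes x 8 GPUs/node = 248 GPUs
conversations = gpus * 50     # 50 queries per GPU -> 12,400 concurrent conversations

matrix_rate = 2.0e9 / (4 * 3600)    # ~139,000 tokens/sec
coral_rate = 0.62e9 / (9 * 3600)    # ~19,000 tokens/sec

# With these rounded inputs the ratio lands near 7x; the exact runtimes behind
# the rounded "~4 hours" and "~9 hours" presumably account for the reported 6.8x.
print(gpus, conversations, round(matrix_rate / coral_rate, 1))
```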


Case Study 2: NaturalReasoning Web Data Curation

NaturalReasoning builds reasoning datasets from huge web corpora. Matrix models the system with three agents:

  • Filter agent: A small model filters English passages likely containing reasoning
  • Score agent: A larger instruction-tuned model gives each passage a quality score
  • Question agent: Extracts Q&A pairs and reasoning chains

From 25 million DCLM web documents, only 5.45% survive all filters—ending with around 1.19 million high-quality reasoning examples.

Matrix then tests parallelism strategies on a 500k-document subset. The best results come from mixing:

  • 20 data partitions
  • 700 concurrent tasks each

This beats pure task concurrency by 1.61×.
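Here is a hedged asyncio sketch of that hybrid strategy: data partitions on the outside, a bounded pool of concurrent tasks on the inside. The process_doc stub stands in for the filter, score, and question agents and is not Matrix's API.

```python
import asyncio

async def process_doc(doc: str) -> str:
    """Stand-in for the filter -> score -> question chain on one document."""
    await asyncio.sleep(0)                         # pretend LLM / tool latency
    return doc

async def run_partition(docs: list, max_concurrency: int = 700):
    sem = asyncio.Semaphore(max_concurrency)       # task-level concurrency bound
    async def guarded(doc):
        async with sem:
            return await process_doc(doc)
    return await asyncio.gather(*(guarded(d) for d in docs))

def split(docs: list, n_partitions: int = 20):
    size = -(-len(docs) // n_partitions)           # ceil division
    return [docs[i:i + size] for i in range(0, len(docs), size)]

async def main():
    corpus = [f"doc-{i}" for i in range(10_000)]
    # In Matrix the 20 partitions would run as independent pipelines across the
    # cluster; here they simply run one after another for illustration.
    for partition in split(corpus):
        await run_partition(partition)

asyncio.run(main())
```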

Across the full dataset:

  • Matrix: 5,853 tokens/sec
  • Ray Data baseline: 2,778 tokens/sec

A 2.1× throughput gain, achieved purely through better scheduling—not better models.
Impressive for something that feels like a systems-level magic trick.


Case Study 3: Tau2-Bench Tool-Use Trajectories

Tau2-Bench measures agents that need to use tools and a database—similar to customer support workflows. Matrix models this with four agents:

  • User simulator
  • Assistant
  • Tool executor
  • Reward calculator

Plus a sink that collects metrics. The tool APIs and reward logic come straight from the Tau2 reference, wrapped neatly in containers.
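One way to picture the wiring is as a routing table. The mapping below is purely illustrative, based on the description above, and is not Matrix's actual Hydra configuration format.

```python
# Illustrative only: which agent can hand the orchestrator to which.
tau2_pipeline = {
    "user_simulator":    ["assistant"],                       # user turns
    "assistant":         ["tool_executor", "user_simulator",  # call a tool, reply to the user,
                          "reward_calculator"],               # or finish the episode
    "tool_executor":     ["assistant"],                       # tool results go back
    "reward_calculator": ["sink"],                            # score the trajectory
    "sink":              [],                                  # collects metrics, terminal
}
```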

On a cluster with 13 H100 nodes, Matrix generates:

  • 22,800 trajectories
  • ~1.25 hours
  • ~41,000 tokens/sec

The baseline Tau2-agent implementation (single node, 500 threads) reaches:

  • 2,654 tokens/sec
  • 1,519 trajectories

Matrix maintains almost identical reward scores—proving the boost doesn’t come from cutting corners.

Overall: 15.4× throughput improvement.
That’s not just faster—it’s a different league.


Key Takeaways:
  • Matrix replaces central controllers with a peer-to-peer, message-driven architecture, where each task acts like its own self-contained state machine.
  • It’s built entirely on open-source tools: SLURM, Ray, vLLM, SGLang, Apptainer.
  • It scales comfortably to tens of thousands of concurrent multi-agent workflows.
  • Across Collaborative Reasoner, NaturalReasoning, and Tau2-Bench, it delivers 2× to 15.4× more throughput on the same hardware.
  • Matrix’s clever offloading system lightens network load and supports high-throughput LLM serving using gRPC backends.


Editorial Notes

Matrix is a practical, thoughtfully engineered system that pushes multi-agent synthetic data generation far beyond hand-written scripts. By packing control and data logic into orchestrators and letting stateless agents move them forward on Ray, Matrix cleanly separates scheduling, model inference, and tool execution.

The case studies—Collaborative Reasoner, NaturalReasoning, and Tau2-Bench—send a clear message: scaling synthetic data isn't just about bigger models anymore. It's about better systems.

And Matrix feels like a step forward in how those systems should be built.