Inference + post-training for embodied AI

The cloud built from the ground up for robotics.

Train, fine-tune, and serve robot models on a low-latency GPU cloud. Stream an observation in, get an action back in under 30 ms.

Try our demo · Read the docs

Built on the stack your team already runs

NVIDIA · PyTorch · Hugging Face · ROS · Python · Docker · Kubernetes · Ray

Why Reflex

Faster training. Faster inference. Simpler fleets.

One platform from data to deployed model, built so AI teams ship robots without an MLOps detour.

Faster inference than the edge

State-of-the-art inference speed. Reflex's fused kernels beat torch.compile on H100s, so an observation round-trips to the cloud and back faster than the model would run on the robot itself.

7× faster vs torch.compile

<30 ms round-trip, observation to action

[Chart: Reflex vs torch.compile · imgs/sec · batch 1-16 · H100]

Faster training (Beta)

Fine-tune pi0.5, ACT, or your own VLA on managed GPUs. The same fused kernels that speed inference cut training step time too, so you pay for seconds, not nodes.

pi0.5 · pi0.7-flash · ACT · your own VLA

GPU     Architecture      Round-trip
GB200   Grace Blackwell   ~12 ms
B300    Blackwell Ultra   ~14 ms
B200    Blackwell         ~18 ms
H200    Hopper            ~24 ms
H100    Hopper            ~28 ms

One deploy, every robot

Push a model once and it rolls out to your whole fleet over a single WebSocket. No SSH, no flashing, no drift between robots.

One push per release · Whole fleet in sync · One socket, no SSH

Three calls. That's the integration.

One WebSocket. Pick your model and LoRA, stream observations, execute actions. No serving stack to run.

import reflex

# `robot` is your own robot driver object; it is not defined in this snippet.
@reflex.policy(
    model="pi0.7-flash",
    lora="pick-and-place",
    cameras=["wrist", "scene"],
    hz=50,
)
class Controller:
    @reflex.observation
    def observe(self):
        # Return the latest camera frames and robot state.
        return robot.observe()

    @reflex.action
    def execute(self, action):
        # Apply the action streamed back from the server.
        robot.execute(action)

Controller().run()
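
For intuition about what a fixed-rate observe, infer, execute loop involves, here is a minimal local sketch. It is not Reflex's implementation and uses none of its API: `run_control_loop`, its parameters, and the injected `infer` callable are hypothetical names chosen for illustration, and the sketch assumes one action per tick with no action chunking.

```python
import time
from typing import Any, Callable


def run_control_loop(
    observe: Callable[[], Any],
    infer: Callable[[Any], Any],
    execute: Callable[[Any], None],
    hz: float,
    steps: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `steps` observe -> infer -> execute ticks at `hz`.

    Returns the number of ticks that finished after their deadline.
    `infer` stands in for the remote policy call (hypothetical).
    """
    period = 1.0 / hz
    missed = 0
    next_deadline = clock() + period
    for _ in range(steps):
        action = infer(observe())
        execute(action)
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)  # wait out the rest of this tick
        else:
            missed += 1       # this tick overran its deadline
        next_deadline += period
    return missed
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting; on a robot you would leave the defaults and pass your own `observe` and `execute` callables.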

How it works

Your robot's brain, one network hop away.

Robots stream observations to a colocated GPU pool. Reflex runs the model and streams actions back, inside the control loop's latency budget.

Your robot: streams camera + state · 50 Hz control loop

Reflex GPU pool: fused VLA kernels · colocated, same region
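
To make the latency budget concrete, the arithmetic can be sketched as below. A 50 Hz loop has a 20 ms period, so a round-trip longer than that spans more than one control tick. How Reflex handles that case (for example pipelining requests or returning chunks of actions) is not stated on this page; the helper names here are hypothetical and the code only does the arithmetic.

```python
import math


def control_period_ms(hz: float) -> float:
    """Length of one control tick in milliseconds."""
    return 1000.0 / hz


def ticks_in_flight(round_trip_ms: float, hz: float) -> int:
    """Number of control ticks that elapse during one round-trip (rounded up)."""
    return math.ceil(round_trip_ms / control_period_ms(hz))


# Approximate round-trip figures quoted on this page, in milliseconds.
ROUND_TRIP_MS = {"GB200": 12, "B300": 14, "B200": 18, "H200": 24, "H100": 28}

ticks_at_50hz = {gpu: ticks_in_flight(ms, 50) for gpu, ms in ROUND_TRIP_MS.items()}
```

At 50 Hz, the ~12 to ~18 ms figures fit inside a single 20 ms tick, while ~24 and ~28 ms span two ticks.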

Anywhere

From the factory floor to the summit of Everest.

Reflex serves frontier models to any robot on any network. Same API, same latency budget, anywhere Starlink reaches.

Take it for a spin.

Tell the arm what to do. Be gentle — robots have feelings too.

Demo model: molmoact2-bimanualyam

Try a prompt:

“pack the container”
“fold the towel”
“pick up the cube”
“move blocks to spell AI2”
“unpack the container”

Ship the model.
Skip the infrastructure.

Train, serve, and roll out robot models on one low-latency cloud. Start free, scale per second.

Start building · Talk to our engineer
