Inference + post-training for embodied AI

The cloud built from the ground up for robotics.

Train, fine-tune, and serve robot models on a low-latency GPU cloud. Stream an observation in, get an action back in under 30 ms.

Built on the stack your team already runs

NVIDIA
PyTorch
Hugging Face
ROS
Python
Docker
Kubernetes
Ray

Why Reflex

Faster training. Faster inference. Simpler fleets.

One platform from data to deployed model, built so AI teams ship robots without an MLOps detour.

Faster inference than the edge

State-of-the-art inference speed. Reflex's fused kernels outperform torch.compile on H100s, so an observation can round-trip to the cloud faster than the model would run on the robot itself.

faster vs. torch.compile
<30 ms round-trip, observation to action
Chart: Reflex vs. torch.compile, images/sec, batch sizes 1-16, H100

Faster training

Beta

Fine-tune pi0.5, ACT, or your own VLA on managed GPUs. The same fused kernels that speed inference cut training step time too, so you pay for seconds, not nodes.

pi0.5 · pi0.7-flash · ACT · your own VLA

GPU      Architecture       Round-trip
GB200    Grace Blackwell    ~12 ms
B300     Blackwell Ultra    ~14 ms
B200     Blackwell          ~18 ms
H200     Hopper             ~24 ms
H100     Hopper             ~28 ms

One deploy, every robot

Push a model once and it rolls out to your whole fleet over a single WebSocket. No SSH, no flashing, no drift between robots.

One push per release
Whole fleet in sync
One socket, no SSH

Three calls. That's the integration.

One WebSocket. Pick your model and LoRA, stream observations, execute actions. No serving stack to run.

import reflex

# `robot` is not defined in this snippet; it stands for your own robot driver.
@reflex.policy(
    model="pi0.7-flash",
    lora="pick-and-place",
    cameras=["wrist", "scene"],
    hz=50,
)
class Controller:
    @reflex.observation
    def observe(self):
        # Return the latest camera frames and robot state.
        return robot.observe()

    @reflex.action
    def execute(self, action):
        # Apply the action streamed back from the model.
        robot.execute(action)

Controller().run()
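The snippet above depends on the `reflex` package and a `robot` object that this page does not define. To make the observe, infer, act pattern concrete, here is a minimal local sketch of the same loop with a stub in place of remote inference. `FakeRobot`, `stub_policy`, and `run_loop` are hypothetical names made up for illustration; none of them is part of the Reflex API, and the real client presumably handles pacing and networking for you.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeRobot:
    """Hypothetical stand-in for a robot driver (not a Reflex class)."""
    position: float = 0.0
    log: list = field(default_factory=list)

    def observe(self) -> dict:
        # A real robot would return camera frames and joint state here.
        return {"position": self.position}

    def execute(self, action: float) -> None:
        # Apply the action and record it for inspection.
        self.position += action
        self.log.append(action)


def stub_policy(observation: dict, target: float = 1.0) -> float:
    """Placeholder for remote inference: move halfway toward the target."""
    return 0.5 * (target - observation["position"])


def run_loop(robot, policy, hz: float, steps: int) -> None:
    """Run a fixed-rate observe -> infer -> act loop for `steps` ticks."""
    period = 1.0 / hz
    next_tick = time.monotonic()
    for _ in range(steps):
        action = policy(robot.observe())
        robot.execute(action)
        # Sleep until the next tick so the loop holds its target rate.
        next_tick += period
        time.sleep(max(0.0, next_tick - time.monotonic()))


robot = FakeRobot()
run_loop(robot, stub_policy, hz=50, steps=5)
```

In this sketch the policy call is synchronous and local. With a remote policy, the call to `policy` is where the network round-trip would sit.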
How it works

Your robot's brain, one network hop away.

Robots stream observations to a colocated GPU pool. Reflex runs the model and streams actions back, inside the control loop's latency budget.

Your robot: streams camera + state, 50 Hz control loop
Reflex GPU pool: fused VLA kernels, colocated, same region
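To see what "inside the control loop's latency budget" means in numbers, the following sketch compares the approximate round-trip figures quoted on this page with the period of a 50 Hz loop. It is plain arithmetic, not a description of how Reflex schedules requests. The page does not say how a round-trip longer than one tick is handled, so the code only reports how many ticks a single request would span. The function names are made up for illustration.

```python
import math

CONTROL_HZ = 50  # the "50 Hz control loop" mentioned on this page

# Approximate round-trip figures quoted on this page, in milliseconds.
ROUND_TRIP_MS = {"GB200": 12, "B300": 14, "B200": 18, "H200": 24, "H100": 28}


def control_period_ms(hz: float) -> float:
    """Time between control ticks, in milliseconds."""
    return 1000.0 / hz


def ticks_spanned(round_trip_ms: float, hz: float) -> int:
    """Number of control ticks that elapse while one request is in flight."""
    return math.ceil(round_trip_ms / control_period_ms(hz))


def budget_report(hz: float = CONTROL_HZ) -> dict:
    """For each GPU, report whether one round-trip fits inside one tick."""
    period = control_period_ms(hz)
    return {
        gpu: {
            "fits_one_tick": rt <= period,
            "ticks_spanned": ticks_spanned(rt, hz),
        }
        for gpu, rt in ROUND_TRIP_MS.items()
    }


if __name__ == "__main__":
    for gpu, row in budget_report().items():
        print(gpu, row)
```

At 50 Hz the period is 20 ms, so the quoted ~12 to ~18 ms figures fit within a single tick, while ~24 and ~28 ms span two ticks. Policies that predict several actions per inference call, as ACT does with action chunks, are one common way to keep a loop fed when a request outlasts a tick. Whether Reflex relies on that is an assumption here, not something the page states.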
Anywhere

From the factory floor to the summit of Everest.

Reflex serves frontier models to any robot on any network. Same API, same latency budget, anywhere Starlink reaches.

Take it for a spin.

Tell the arm what to do. Be gentle — robots have feelings too.

Demo view: robot top camera. Model: molmoact2-bimanualyam.
Type a task for the arm, or try a prompt:
  • “pack the container”
  • “fold the towel”
  • “pick up the cube”
  • “move blocks to spell AI2”
  • “unpack the container”

Ship the model.
Skip the infrastructure.

Train, serve, and roll out robot models on one low-latency cloud. Start free, scale per second.