Cross-robot transfer: same drawer, Franka YES, Allegro NO

The question every robotics team asks of every asset is the same: can my robot do this task on this object? Most asset formats answer it with a confidence number. We answer it with a verdict and a reason.

This post explains why the design choice matters, walks through the implementation, and shows the demo in which the same drawer separates two robots and the solver explains why.

The choice we made

A transfer report could be a confidence value. Some are. We considered it, and rejected it.

Confidence numbers are how robotics papers stop being reproducible. They are unfalsifiable in either direction. A 0.78 confidence on a manipulation transfer can mean “the model is unsure,” “the data was thin,” or “the developer guessed.” A reviewer cannot audit a confidence number.

So we made a different choice. The transfer report is a verdict with a reason:

{
  "task": "open_bottom_wide",
  "results": [
    {
      "robot": "franka_panda",
      "verdict": "yes",
      "margin": 1.2,
      "reason": "peak_grip 0.93 N >= force_max_N 0.78 N (sustainable)",
      "source_dna": "task_dna_franka_rl.json"
    },
    {
      "robot": "allegro_hand",
      "verdict": "no",
      "margin": 0.71,
      "reason": "peak_grip 0.41 N < force_max_N 0.58 N",
      "source_dna": "task_dna_franka_rl.json"
    }
  ]
}

The verdict is yes or no. The margin is grounded in a physical quantity. The reason is auditable. The source DNA is referenced explicitly. A reviewer can walk every claim back to the rollout that produced it.
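The audit can be done mechanically. The sketch below is a minimal cross-check, not part of the solver. It assumes the margin is the ratio of peak grip to force_max_N (the report does not define it, but both rows are consistent with that reading) and that the reason string keeps the shape shown above. The helper name audit is hypothetical.

```python
import re

# Rows copied from the report above; only the fields needed for the check.
rows = [
    {"robot": "franka_panda", "verdict": "yes", "margin": 1.2,
     "reason": "peak_grip 0.93 N >= force_max_N 0.78 N (sustainable)"},
    {"robot": "allegro_hand", "verdict": "no", "margin": 0.71,
     "reason": "peak_grip 0.41 N < force_max_N 0.58 N"},
]

# Assumed reason layout: "peak_grip <x> N <op> force_max_N <y> N ..."
REASON = re.compile(r"peak_grip ([\d.]+) N (>=|<) force_max_N ([\d.]+) N")


def audit(row):
    """Recompute (verdict, margin) from the reason string alone."""
    match = REASON.match(row["reason"])
    peak, required = float(match.group(1)), float(match.group(3))
    verdict = "yes" if peak >= required else "no"
    margin = peak / required  # assumption: margin = peak_grip / force_max_N
    return verdict, margin


for row in rows:
    verdict, margin = audit(row)
    consistent = verdict == row["verdict"] and abs(margin - row["margin"]) < 0.01
    print(row["robot"], verdict, round(margin, 2), "consistent" if consistent else "MISMATCH")
```

Under that assumption both rows reproduce: the reported verdict and margin follow from the two numbers in the reason field, with no other input.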

The transfer report is the conversation between the bundle author and the bundle consumer. We refused to write it in a language only the author understands.

How the solver works

The transfer solver is small. It is intentionally small.

For each task DNA record in the bundle, and for each robot in the panel, the solver performs three checks:

Force margin. Replay the task DNA’s gripper trajectory in simulation, with the candidate robot’s gripper geometry and joint limits substituted. Measure peak grip force. Compare to force_max_N from the robot panel’s specification. Verdict gates on peak >= force_max_N.
Reach. The waypoint sequence’s pose targets must be within the candidate robot’s reachable workspace, with a small safety margin.
Joint friction sustainability. The drawer joint’s static friction is a known calibrated value. The robot’s sustained pull force, integrated over the trajectory’s pull phase, must exceed it.

If all three pass, the verdict is yes with a margin equal to the smallest individual margin. If any fail, the verdict is no with the failing reason recorded.
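The aggregation rule fits in a few lines. This is a sketch under stated assumptions, not the shipped solver: each check is reduced to a measured value and a required value, the margin is their ratio, and when several checks fail the one with the smallest margin is reported. The Check class and decide function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str         # e.g. "peak_grip", "reach", "sustained_pull"
    measured: float   # value obtained from the replay
    required: float   # threshold the measured value must meet
    unit: str = "N"

    @property
    def margin(self) -> float:
        # Assumption: margin is measured / required, so 1.0 is the pass line.
        return self.measured / self.required

    @property
    def passed(self) -> bool:
        return self.measured >= self.required


def decide(checks):
    """Return (verdict, margin, reason) for one robot on one task."""
    failing = [c for c in checks if not c.passed]
    # All pass: report the smallest margin. Any fail: report the worst failure.
    worst = min(failing or checks, key=lambda c: c.margin)
    op = "<" if failing else ">="
    reason = (f"{worst.name} {worst.measured:g} {worst.unit} {op} "
              f"required {worst.required:g} {worst.unit}")
    return ("no" if failing else "yes"), round(worst.margin, 2), reason


# Force numbers are the Allegro values quoted in this post.
print(decide([Check("peak_grip", 0.41, 0.58)]))
```

Because the function is pure and takes only replay measurements and thresholds, the same inputs always give the same verdict.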

We do not run a learned classifier. We do not estimate. We replay the bundled task DNA against a candidate robot and report what the physics says.

The transfer solver is a deterministic function of the bundle. It is the bundle answering a question about itself.

The drawer demo

The drawer is the cleanest worked example. Two configs ship in the repo: demo_drawer_franka.yaml and demo_drawer_allegro.yaml. Both reference the same drawer asset. Both attempt the same task. Both produce a task DNA record.

The Franka run produces task_dna_franka_rl.json with peak grip force 0.93 N, sustained 0.81 N, drawer travel 178 mm. The Allegro run produces a record with peak grip force 0.41 N, drawer travel 27 mm.

Run those two records through the transfer solver against a panel of four robots (Franka Panda, Allegro Hand, Robotiq 2F-85, Schunk PG-70) and you get the report excerpted above, plus two more rows. Franka YES, 1.2× margin. Allegro NO, peak grip below the asset’s force_max_N. Robotiq 2F-85 borderline. Schunk YES.

The differentiated result is not a learned outcome. It is what the physics already said, surfaced in a structured form.

What “explains itself” means in practice

A typical pipeline failure in this space looks like: the policy ran, the drawer didn’t move, no one knows why. The transfer report converts that into:

Allegro Hand attempted open_bottom_wide using task_dna_franka_rl.json. Peak grip force 0.41 N. Asset’s force_max_N is 0.58 N. Verdict: no. Reason: insufficient grip force to sustain pull against joint static friction.

The reason is not a paragraph. It is a structured field. Downstream tools can index it, aggregate it across bundles, and drive a roadmap from it. The output is a statement of this kind: “Across our bundle library, Allegro fails 60% of drawer tasks on grip force. We should commission a compliant fingertip.”

The structured reason is the gift the schema gives downstream tools. The verdict alone would not.
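A downstream aggregation might look like the sketch below. It assumes reports are loaded as dictionaries shaped like the JSON above, and that the first token of the reason names the quantity that failed. The function name failure_breakdown is hypothetical.

```python
from collections import Counter


def failure_breakdown(reports, robot):
    """Count which quantity failed for one robot across many transfer reports."""
    counts = Counter()
    total = 0
    for report in reports:
        for result in report["results"]:
            if result["robot"] != robot:
                continue
            total += 1
            if result["verdict"] == "no":
                # Assumption: reason starts with the failing quantity, e.g. "peak_grip".
                counts[result["reason"].split()[0]] += 1
    return total, counts


# A one-report library holding the Allegro row from this post.
library = [{
    "task": "open_bottom_wide",
    "results": [{
        "robot": "allegro_hand",
        "verdict": "no",
        "margin": 0.71,
        "reason": "peak_grip 0.41 N < force_max_N 0.58 N",
    }],
}]

total, counts = failure_breakdown(library, "allegro_hand")
print(total, dict(counts))
```

With a free-text reason this grouping would need a parser or a human. With a structured one it is a dictionary lookup.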

The peak-vs-sustainable bug we caught

The first version of the transfer solver compared the task’s force_max_N to the robot’s peak force and checked nothing else. This was wrong. A robot with a high peak but low sustained force would pass the check by accident, and then fail in practice when the pull phase outlasted the peak.

We caught the bug on the cross-robot drawer demo. Allegro showed up as YES under the old solver and NO under the corrected one. The new solver checks peak against force_max_N and separately checks sustained against the joint friction floor. Both must pass.

We also added a small noise-cleaning pass on the peak. PhysX produces single-step grip force spikes during contact transitions; the solver now requires the peak to be sustained for at least 50 ms before counting it.
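One way to read "sustained for at least 50 ms" is as a sliding-window filter: a force level counts only if every sample in some 50 ms window is at or above it. The sketch below implements that reading in plain Python. It is an illustration, not the solver's actual filter and not a PhysX API; the function name, the sample trace, and the 10 ms step are assumptions.

```python
import math


def sustained_peak(forces, dt, min_duration=0.050):
    """Largest force level held for at least min_duration seconds.

    forces: per-step grip force samples, in newtons
    dt: simulation step, in seconds
    """
    # Number of consecutive samples that span min_duration (epsilon guards float error).
    window = max(1, math.ceil(min_duration / dt - 1e-9))
    if len(forces) < window:
        return 0.0
    # The minimum of a window is a level held for the whole window;
    # the sustained peak is the best such level over all windows.
    return max(min(forces[i:i + window]) for i in range(len(forces) - window + 1))


# Illustrative trace at a 10 ms step: one single-step spike, then a steady grip.
trace = [0.1, 0.2, 1.5, 0.2, 0.4, 0.41, 0.42, 0.41, 0.4, 0.41, 0.1]
print(max(trace), sustained_peak(trace, dt=0.01))
```

On this trace the raw maximum is the 1.5 N spike, while the filtered peak is 0.4 N, the level the grip holds across a full five-sample window.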

The transfer solver has had two versions. Both were small. The fix in v2 was the kind of fix you only find by running the demo on a real cross-robot pair.

What the report does not promise

The simulation is not the real robot. The transfer report is a hypothesis grounded in calibrated sim physics. It is sometimes wrong. It is wrong legibly.
The robot panel is small. Four robots today. Adding a robot is bounded work: its gripper geometry and joint specs go into a panel manifest. We expect the panel to grow.
The task DNA is what it is. The solver replays an existing record. It doesn’t generate a new trajectory for the candidate robot. A more capable robot might solve a task that the bundled DNA didn’t anticipate. The transfer report doesn’t claim otherwise.

We will not promise zero-shot transfer. We will promise legible failure when transfer is not viable.

Why this is the most consequential design choice in the bundle

A bundle that ships an asset, a calibration, and a set of task DNA records but not a transfer report is, in practice, an asset library. A bundle that ships all of those plus a transfer report is, in practice, a tool. The transfer report is the field where the bundle stops being passive (a thing to download and inspect) and becomes active: a thing that answers questions about itself.

Of all the choices in the schema, this is the one we’ve changed our minds about least often.

Verdict plus reason. Not confidence. Not probability. Not score. The asset answering the question with its own physics.

Try it

If you want to run the transfer solver against your own robot panel, join the beta and we will get you set up.

This is the last post in the engineering series. The previous posts: memory layer, neural object schema, phone video to Franka, qualification gauntlet, Isaac Lab gotchas, failure as artifact, three trials, REArtGS benchmark ask.