The 16-test gauntlet every asset has to pass
A neural object is only as trustworthy as the tests it has survived. We run sixteen of them. They are deterministic, fast, and unforgiving. Every one of them was added because a bundle without it once shipped broken.
This post walks the gauntlet: what each test checks, which bundles have failed which test, and what we do when a borderline asset slips through.
The shape of the gauntlet
The tests fall into five phases. Each phase gates the next. A bundle that fails T3 never reaches T4.
We didn’t start with sixteen tests. We started with three. The current shape was carved by debugging real failures.
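The gating rule can be sketched in a few lines. This is a minimal illustration, not the production runner: the names `CheckResult` and `run_gauntlet` are hypothetical, a bundle is modeled as a plain dict, and the sketch assumes every test inside a phase runs before the gate is evaluated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    test_id: str
    passed: bool
    detail: str = ""


# A check takes a bundle (modeled here as a dict) and returns a CheckResult.
Check = Callable[[dict], CheckResult]


def run_gauntlet(bundle: dict, phases: list[list[Check]]) -> list[CheckResult]:
    """Run phases in order and stop after the first phase with any failure."""
    results: list[CheckResult] = []
    for phase in phases:
        phase_results = [check(bundle) for check in phase]
        results.extend(phase_results)
        if not all(r.passed for r in phase_results):
            break  # later phases never see this bundle
    return results
```

Running the whole phase before gating means one run reports every failure at that level, rather than only the first.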
Phase 1 — Geometry (T1–T3)
The cheapest tests. They run in milliseconds. They catch the failures that, if left uncaught, would waste the next forty minutes of pipeline time.
- T1 — Watertight check. Every visual mesh must be a closed manifold. Trimesh is_watertight plus a hole-detection pass. Failed twice in the lighter family in the early days, both times because the procedural trigger overlay left a 0.3 mm seam at the joint boundary.
- T2 — Face count bounds. Below 200 faces and the asset is degenerate. Above 200,000 and the downstream collision generator chokes. We chose 200–200,000 after measuring real failure thresholds.
- T3 — Degeneracy and proportions. Aspect ratio must be plausible for the family. The lighter must be longer than it is wide, and wider than it is tall. Catches mirrored or swapped-axis exports.
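A dependency-free sketch of the three geometry checks follows. It is an illustration under assumptions, not the production code: the closed-manifold test counts how many triangles share each edge (the same idea as a watertight check, without the separate hole-detection pass), and the proportions rule assumes extents are ordered length, width, height along x, y, z. All function names are hypothetical.

```python
from collections import Counter

MIN_FACES, MAX_FACES = 200, 200_000  # T2 bounds from the post


def is_closed_manifold(faces) -> bool:
    """True if every undirected edge is shared by exactly two triangles."""
    edges = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges[(min(u, v), max(u, v))] += 1
    return bool(edges) and all(n == 2 for n in edges.values())


def face_count_ok(faces) -> bool:
    """T2: reject degenerate meshes and meshes too dense for collision generation."""
    return MIN_FACES <= len(faces) <= MAX_FACES


def proportions_ok(extents) -> bool:
    """T3 for the lighter family: length > width > height (assumed x, y, z order)."""
    length, width, height = extents
    return length > width > height
```

A seam like the 0.3 mm one described in T1 shows up here as boundary edges that belong to only one triangle.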
Phase 2 — Topology and Scale (T4–T6)
The tests that catch the silent failures. They are the ones we’re most proud of.
- T4 — Single dominant body. Visual mesh decomposes into bodies; the largest must dominate (>0.85 of total volume). Catches cases where the reconstruction left a stray ghost mesh.
- T5 — Spec-versus-mesh dimension diff. Parses dimensions out of tier1_asset_spec.color_material_hints and compares to mesh extents. Tolerance is 25% per axis. This test caught a real failure on the production lighter bundle: extents 5.4 × 10.0 × 10.0 cm versus spec 27.5 × 4.0 × 3.5 cm. SF3D had reconstructed the head only and dropped the long handle. The bundle silently passed every other test we had at the time. T5 was the answer.
- T6 — Watertight after collision decomposition. Ensures the collision mesh, after STRATEGY_THIN_DECOMPOSITION, still fully encloses the visual mesh. Catches collision-mesh undercoverage at the part-boundary recesses.
T5 is the one we tell new engineers to study first. It is half a screen of code. It is the most valuable half-screen in the qualification suite.
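For readers who want the shape of T5 without the real source, here is a minimal sketch. The hint format ("A x B x C cm" embedded in free text) is an assumption, as are the function names and the choice to compare axes in the order given. Only the 25% per-axis tolerance and the lighter numbers come from the description above.

```python
import re

TOLERANCE = 0.25  # 25% per axis, as stated in the post

_NUM = r"(\d+(?:\.\d+)?)"
_DIMS = re.compile(_NUM + r"\s*[x×]\s*" + _NUM + r"\s*[x×]\s*" + _NUM + r"\s*cm")


def parse_dimensions_cm(hint: str):
    """Pull 'A x B x C cm' out of a free-text hint (the format is an assumption)."""
    match = _DIMS.search(hint)
    if match is None:
        raise ValueError(f"no dimensions found in hint: {hint!r}")
    return tuple(float(g) for g in match.groups())


def dimension_diff(spec_cm, mesh_cm, tol=TOLERANCE):
    """Return (passed, per-axis relative errors), comparing axes in order."""
    errors = tuple(abs(m - s) / s for s, m in zip(spec_cm, mesh_cm))
    return all(e <= tol for e in errors), errors


spec = parse_dimensions_cm("matte black body, 27.5 x 4.0 x 3.5 cm")
passed, errors = dimension_diff(spec, (5.4, 10.0, 10.0))
print(passed, [round(e, 2) for e in errors])
```

On the lighter numbers, every axis is off by far more than 25%, so the head-only reconstruction fails immediately.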
Phase 3 — Physics Authoring (T7–T10)
- T7 — Mass within plausible range for the family. A 4 kg cabinet drawer is plausible. A 40 kg cabinet drawer is not. We carry a per-family mass envelope and reject outside it.
- T8 — Friction coefficients in physical range. Static in [0.1, 1.5], dynamic in [0.05, 1.2], static ≥ dynamic. Catches the recurring transcription error of swapped values.
- T9 — Inertia tensor positive definite. Exactly what it says. Caught a numerically degenerate inertia in one early hybrid object.
- T10 — Joint static friction nonzero on articulated joints. A drawer joint with zero static friction free-slides under any contact. We set 2.0 N as a deterministic floor.
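T8 and T9 are small enough to sketch in full. The friction ranges are the ones listed above. The positive-definiteness check uses Sylvester's criterion (all leading principal minors positive) on a symmetric 3×3 tensor, which is one standard way to do it and not necessarily the one the suite uses. Function names are hypothetical.

```python
def friction_ok(static: float, dynamic: float) -> bool:
    """T8: static in [0.1, 1.5], dynamic in [0.05, 1.2], static >= dynamic."""
    return 0.1 <= static <= 1.5 and 0.05 <= dynamic <= 1.2 and static >= dynamic


def det3(m) -> float:
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inertia_positive_definite(I) -> bool:
    """T9: symmetric with all leading principal minors strictly positive."""
    scale = max(abs(v) for row in I for v in row)
    symmetric = all(
        abs(I[r][c] - I[c][r]) <= 1e-9 * scale for r in range(3) for c in range(3)
    )
    m1 = I[0][0]
    m2 = I[0][0] * I[1][1] - I[0][1] * I[1][0]
    m3 = det3(I)
    return symmetric and m1 > 0 and m2 > 0 and m3 > 0
```

The minors are compared against zero rather than a fixed epsilon because small objects have very small inertia values, and an absolute threshold would reject valid tensors.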
Phase 4 — Articulation Integrity (T11–T14)
This phase exists because Isaac Lab is unforgiving about USD authoring details that, on paper, shouldn’t matter.
- T11 — UsdPhysics.ArticulationRootAPI present on the articulation root. Missing this once corrupted a Franka training run silently. The asset loaded, the policy trained, the rewards looked right, the drawer never moved. The test was added the next day.
- T12 — kinematicEnabled is False on articulation children. A kinematicEnabled=True body inside an articulation crashes Isaac Lab on env construction. We default to fixedBase=True on the root instead.
- T13 — Joint axis matches family expectation. The drawer slides along x. The lighter trigger rotates around z. We keep a per-family canonical axis. Misalignment over 15° fails.
- T14 — Joint limits non-degenerate. Limits where lower == upper are common transcription errors. Caught twice in the controller family.
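The two numeric checks in this phase, T13 and T14, can be sketched without a USD dependency. This is an illustration only: axes are plain 3-tuples, function names are hypothetical, and the sketch assumes an axis pointing the opposite way counts as misaligned (180°), which the description above does not specify.

```python
import math

MAX_AXIS_ERROR_DEG = 15.0  # T13 threshold from the post


def axis_misalignment_deg(axis, canonical) -> float:
    """Angle in degrees between the authored joint axis and the canonical axis."""
    dot = sum(a * b for a, b in zip(axis, canonical))
    norm = math.sqrt(sum(a * a for a in axis)) * math.sqrt(sum(b * b for b in canonical))
    if norm == 0.0:
        raise ValueError("zero-length axis")
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))


def axis_ok(axis, canonical) -> bool:
    """T13: fail when misalignment exceeds the threshold."""
    return axis_misalignment_deg(axis, canonical) <= MAX_AXIS_ERROR_DEG


def limits_ok(lower: float, upper: float, eps: float = 1e-9) -> bool:
    """T14: reject lower == upper, and inverted limits as well."""
    return upper - lower > eps
```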
Phase 5 — Manipulation Fitness (T15–T16)
The last two tests are the most opinionated. They are the tests that say “this asset is not just valid, it is useful.”
- T15 — Collision fidelity at part boundaries. Measure the gap between the visual mesh and the collision mesh at every part-boundary recess. Strict threshold: 5 mm. This test drove a pipeline fix. Before T15, STRATEGY_THIN_DECOMPOSITION allowed only 1 hull per part at a 0.30 voxel threshold. Drawer parts had 25 mm air gaps. The fix: allow 4 hulls per part at 0.05. Drawer parts now have 3–7 mm gaps. Four of five drawers pass strict 5 mm. The fifth (BottomWide) is borderline at 6.76 mm.
- T16 — Grasp clearance. For every grasp annotation, verify that the gripper geometry, at approach pose, does not penetrate the visual mesh by more than 1 mm. Catches grasps that look correct in grasps.json but cannot actually be reached.
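How the gap is measured is not spelled out above, so the following is only one plausible sketch of T15. It assumes points have already been sampled (in millimetres) on the visual mesh at the recesses and on the collision surface, and it takes the gap to be the largest distance from any visual sample to its nearest collision sample. The function names and the sampling approach are assumptions.

```python
import math

STRICT_GAP_MM = 5.0  # T15 threshold from the post


def max_gap_mm(visual_pts, collision_pts) -> float:
    """Largest distance from any visual sample to its nearest collision sample."""
    return max(min(math.dist(v, c) for c in collision_pts) for v in visual_pts)


def collision_fidelity_ok(visual_pts, collision_pts, limit_mm=STRICT_GAP_MM) -> bool:
    """Pass only if the worst-case gap is within the strict limit."""
    return max_gap_mm(visual_pts, collision_pts) <= limit_mm
```

The brute-force nearest-neighbour search is quadratic in the number of samples. That is acceptable for a handful of recess samples, but a spatial index would be the natural replacement for dense sampling.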
What “borderline” means
The BottomWide drawer at 6.76 mm fails strict T15. We ship it anyway, with the failure recorded explicitly in the bundle’s qualification.json and a note in the manifest.
This is the right shape for a qualification system. Strict thresholds are how you prevent silent regression. Explicit waivers are how you keep the system honest about which bundles you have shipped despite a known issue. We refuse to lower thresholds to make a bundle pass; we refuse to silently un-ship a bundle that fails.
Tests should not be flexible. Decisions to ship anyway should be loud.
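One way to picture "strict test, loud waiver" is a result record whose status never changes because of the waiver. The field names below are hypothetical and do not reflect the actual qualification.json schema. The only values taken from the text are the 5 mm limit and the 6.76 mm BottomWide measurement.

```python
import json
from typing import Optional


def t15_record(part: str, gap_mm: float, limit_mm: float = 5.0,
               waiver_reason: Optional[str] = None) -> dict:
    """Build a T15 result; a waiver is attached to a failure, never replaces it."""
    passed = gap_mm <= limit_mm
    record = {
        "test": "T15",
        "part": part,
        "measured_mm": gap_mm,
        "limit_mm": limit_mm,
        "status": "pass" if passed else "fail",
    }
    if not passed and waiver_reason:
        # Hypothetical waiver block: the failure stays visible next to the decision.
        record["waiver"] = {"shipped_anyway": True, "reason": waiver_reason}
    return record


print(json.dumps(t15_record("BottomWide", 6.76, waiver_reason="borderline gap"), indent=2))
```

The threshold stays a fixed default and the status stays "fail", so a later regression from 6.76 mm to something worse remains visible in the record.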
What we have caught
- The lighter scale failure (T5)
- A controller top_bridge overlap (T6)
- Two missing ArticulationRootAPI annotations (T11)
- A drawer with kinematicEnabled=True that would have crashed Isaac Lab (T12)
- Three swapped friction values (T8)
- Five collision-fidelity gaps that, once surfaced, drove the STRATEGY_THIN_DECOMPOSITION fix (T15)
Each of these would have shipped silently before the corresponding test existed. The gauntlet is the difference between “we think this works” and “we know what fails.”
What we haven’t caught yet
- We don’t yet have a test for affordance consistency. The affordance experiment in this series (Failure as a first-class artifact) shows the pattern. T17 is on the roadmap.
- We don’t yet have a test for transfer-report freshness, i.e., whether the report was regenerated after the last task DNA edit. Schema validators catch this when the bundle round-trips, but a faster pre-commit hook is on the roadmap.
- We don’t yet have multi-state articulation tests. The REArtGS-style benchmark is small (one lighter, one drawer) and not standardized. Until it is, T18 is unwritten.
Cost
The full gauntlet runs in about 14 seconds on the drawer bundle. It runs in CI on every bundle commit. It is cheaper than the test we used to run, which was “wait twenty minutes and see if Isaac Lab loads it.”
Next: Why your “kinematic body” breaks Isaac Lab articulations: four small USD authoring details that cost us hours each.
