Robotics motion planning from clinical data

Early exploration, Dec 2025. Before narrowing to Physical AI infrastructure, we were mapping motion capture pipelines and human movement dynamics. This post is from that period.

We built a pipeline that takes raw clinical motion capture from an iPad (a patient walking, squatting, or standing on one leg) and produces structured motion data suitable for robot learning: cleaned skeleton sequences, biomechanical phase labels, ground contact events, failure detection, and exportable 3D meshes.

The goal is not to make robots move like humans. It is to give robots a structured understanding of human movement dynamics: the phases, contacts, failures, and physical constraints that govern how people actually move in clinical settings.

The gap between clinical data and robot-usable data

Clinical motion capture produces rich data: 3D joint positions at 60fps, assessment scores, normative comparisons. But this data is captured for human interpretation. A clinician reads a gait analysis report and understands what “reduced stride length on the left” means.

Robots need something different. They need:

Temporal segmentation: Which frames are stance phase? Which are swing? Where are the transitions?
Contact events: When exactly does each foot touch the ground? When does it leave?
Failure markers: At which frame did the patient lose balance? What was the biomechanical state at failure onset?
Physical constraints: What are the joint angle limits? What velocities are achievable? What accelerations are safe?
Structured export: Data in a format that motion planners, reinforcement learning systems, and physics simulators can consume

Our robotics pipeline bridges this gap with a 9-stage processing system.

The 9-stage pipeline

Raw skeleton data enters at Stage 1 and exits at Stage 9 as a validated, structured motion package.

Stage 1: Load

Extract skeleton frame data from the database and GCS storage. Each frame contains 3D positions for tracked joints (root, hips, knees, ankles, shoulders, elbows, wrists, spine, head) plus timestamps and confidence scores.

Stage 2: Clean

Filter frames by joint confidence (threshold: 0.6). Remove outlier positions (sudden jumps > 3σ from running mean). Require minimum contiguous segment duration of 0.5 seconds. Track frame retention rate; if cleaning removes > 40% of frames, the session is flagged as low quality.
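
The confidence and segment-length rules above can be sketched as follows. This is a minimal illustration, not the production code: the frame layout (`t`, `confidence`) and the name `clean_frames` are hypothetical, and the 3σ outlier rejection is left out for brevity.

```python
from typing import Dict, List, Tuple

CONF_THRESHOLD = 0.6         # minimum joint confidence (from the text)
MIN_SEGMENT_S = 0.5          # minimum contiguous segment duration (from the text)
MAX_REMOVED_FRACTION = 0.4   # sessions losing more than this are flagged

Frame = Dict[str, float]     # hypothetical layout: {"t": ..., "confidence": ...}


def clean_frames(frames: List[Frame], dt: float) -> Tuple[List[Frame], float, bool]:
    """Return (kept_frames, retention_rate, low_quality)."""
    min_len = round(MIN_SEGMENT_S / dt)  # required segment length in frames
    kept: List[Frame] = []
    segment: List[Frame] = []

    def flush() -> None:
        # Keep the current run of confident frames only if it is long enough.
        if len(segment) >= min_len:
            kept.extend(segment)
        segment.clear()

    for frame in frames:
        if frame["confidence"] >= CONF_THRESHOLD:
            segment.append(frame)
        else:
            flush()
    flush()

    removed = len(frames) - len(kept)
    retention = len(kept) / len(frames) if frames else 0.0
    low_quality = removed > MAX_REMOVED_FRACTION * len(frames)
    return kept, retention, low_quality
```

At 60fps (`dt = 1/60`) the minimum segment length works out to 30 frames.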

Stage 3: Normalize

Transform all joint positions to a consistent world frame:

Floor at Y = 0
Root joint centered at origin
Forward direction aligned to primary movement direction
Units in meters, radians, seconds

This normalization ensures that the same movement pattern produces the same numerical representation regardless of where in the room the patient stood or which direction they faced.
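
One possible reading of these rules, as a hedged sketch: the floor is the lowest foot height in the sequence, the root is centered horizontally at its first-frame position, and "primary movement direction" is taken to be the net root displacement, rotated onto +Z. The joint names, the 0.1 m minimum-travel guard, and the function name are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Frame = Dict[str, Vec3]  # joint name -> (x, y, z) in meters


def normalize(
    frames: List[Frame],
    foot_joints: Sequence[str] = ("left_foot", "right_foot"),  # hypothetical names
    root: str = "root",
) -> List[Frame]:
    # Floor at Y = 0: subtract the lowest foot height seen in the sequence.
    floor_y = min(f[j][1] for f in frames for j in foot_joints)
    # Root centered: translate so the first-frame root sits at x = z = 0.
    x0, _, z0 = frames[0][root]
    # Forward direction: net horizontal root displacement over the sequence.
    dx = frames[-1][root][0] - x0
    dz = frames[-1][root][2] - z0
    # Skip the rotation for near-stationary tasks (assumed 0.1 m guard).
    yaw = math.atan2(dx, dz) if math.hypot(dx, dz) > 0.1 else 0.0
    c, s = math.cos(yaw), math.sin(yaw)

    out: List[Frame] = []
    for f in frames:
        g: Frame = {}
        for name, (x, y, z) in f.items():
            px, pz = x - x0, z - z0
            # Rotate about Y so the travel direction maps to +Z.
            g[name] = (c * px - s * pz, y - floor_y, s * px + c * pz)
        out.append(g)
    return out
```

For stationary assessments such as squats the displacement guard leaves the heading untouched; a real pipeline would need another cue (for example, hip orientation) to define forward.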

Stage 4: Joint Angles

Compute clinically relevant joint angles from the 3D skeleton:

Knee flexion/extension
Hip flexion/extension
Ankle dorsiflexion/plantarflexion
Shoulder flexion
Elbow flexion
Trunk forward lean

Angles are smoothed with a configurable filter to remove high-frequency noise while preserving the true movement signal. The smoothed angles are the primary representation for downstream analysis.
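
As a rough sketch of this stage, a flexion angle can be computed from three joint positions and then smoothed. The text does not say which filter is used, so the centered moving average below is only a stand-in; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def flexion_angle(proximal: Vec3, joint: Vec3, distal: Vec3) -> float:
    """Flexion in radians; 0 when the limb is straight (points collinear)."""
    u = tuple(p - j for p, j in zip(proximal, joint))  # joint -> proximal segment
    v = tuple(d - j for d, j in zip(distal, joint))    # joint -> distal segment
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_interior = max(-1.0, min(1.0, dot / norm))     # clamp against rounding
    return math.pi - math.acos(cos_interior)


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out
```

For knee flexion the three points would be hip, knee, and ankle; for elbow flexion, shoulder, elbow, and wrist.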

Stage 5: Phase Detection

Classify each frame into a movement phase based on the assessment type:

Squat phases (detected from pelvis vertical velocity and height):

State	Entry condition
Standing	Initial state; re-entered when velocity ≈ 0 and height ≈ initial
Descent	Vertical velocity < −0.05 m/s
Bottom	Height within 2cm of minimum AND velocity within ±0.02 m/s
Ascent	Vertical velocity > +0.05 m/s

Gait phases (detected from foot-ground contact):

State	Condition
Double Support	Both feet within 5cm of floor
Left Stance (right swing)	Left foot on ground, right foot elevated
Right Stance (left swing)	Right foot on ground, left foot elevated
Flight	Both feet more than 5cm above floor

Contact is determined by foot height relative to the lowest observed foot position: contact = (foot_y − floor_y) < 0.05m

Balance phases (detected from horizontal velocity):

State	Condition
Static	Mean horizontal velocity over 10-frame window < 0.03 m/s
Unstable	Mean horizontal velocity over 10-frame window ≥ 0.03 m/s

The 10-frame sliding window provides temporal smoothing, so brief perturbations don’t cause rapid phase switching.

Stage 6: Contact Detection

Explicit foot-ground contact events with timestamps:

Left foot contact start/end times
Right foot contact start/end times
Double support periods
Flight phases (both feet off ground; relevant for running or jump assessments)

Contact detection uses the same height threshold as gait phase detection (5cm above floor) combined with velocity confirmation (foot velocity < 0.1 m/s during contact).
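
A minimal sketch of turning per-frame foot height and speed into timed contact intervals, using the two thresholds from the text. It assumes the floor is at Y = 0 after normalization and that foot speed has already been computed; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

HEIGHT_THRESHOLD_M = 0.05  # foot within 5cm of the floor (from the text)
SPEED_THRESHOLD_MS = 0.1   # velocity confirmation (from the text)


def contact_intervals(
    foot_y: Sequence[float],
    foot_speed: Sequence[float],
    dt: float,
    floor_y: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) pairs in seconds for one foot."""
    intervals: List[Tuple[float, float]] = []
    start = None  # frame index where the current contact began
    for i, (y, v) in enumerate(zip(foot_y, foot_speed)):
        in_contact = (y - floor_y) < HEIGHT_THRESHOLD_M and v < SPEED_THRESHOLD_MS
        if in_contact and start is None:
            start = i
        elif not in_contact and start is not None:
            intervals.append((start * dt, i * dt))
            start = None
    if start is not None:  # contact still active at the end of the sequence
        intervals.append((start * dt, len(foot_y) * dt))
    return intervals
```

Double support and flight periods then follow from intersecting, or taking the complement of, the left and right interval lists.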

Stage 7: Failure Detection

Identify frames where biomechanical violations occur:

Loss of balance (center of mass exits support polygon)
Excessive joint angle (exceeding normative ROM limits)
Asymmetric loading (left-right imbalance beyond threshold; estimated from kinematics, since no force data is captured)
Tracking dropout (skeleton confidence drops below minimum)
Unplanned ground contact (hand or knee touching floor during standing assessment)

Each failure is tagged with: frame index, failure type, severity, and the physical state at failure onset. The failure onset frame is critical for downstream simulation. It marks the transition from normal movement to pathological movement.
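
To make the annotation format concrete, here is a hedged sketch of one failure type (excessive joint angle) producing records with the four tagged fields. The `FailureEvent` layout, the severity definition (radians beyond the limit at onset), and the limits passed in are assumptions, not the pipeline's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class FailureEvent:
    frame_index: int          # failure onset frame
    failure_type: str
    severity: float           # assumed here: radians beyond the limit at onset
    state: Dict[str, float]   # physical state at failure onset


def detect_rom_violations(
    angles: Sequence[float], lower: float, upper: float, joint_name: str
) -> List[FailureEvent]:
    """Emit one event per excursion outside [lower, upper], at its onset frame."""
    events: List[FailureEvent] = []
    violating = False
    for i, angle in enumerate(angles):
        excess = max(lower - angle, angle - upper, 0.0)
        if excess > 0.0 and not violating:
            events.append(
                FailureEvent(i, "excessive_joint_angle", excess, {joint_name: angle})
            )
        violating = excess > 0.0
    return events
```

Only the onset frame of each excursion is recorded, matching the emphasis on failure onset above; consecutive violating frames do not produce additional events.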

Stage 8: Package Export

Assemble all outputs into a structured export:

Cleaned skeleton sequence (JSON)
Joint angle trajectories (JSON)
Phase labels per frame (JSON)
Contact events (JSON)
Failure annotations (JSON)
Optional: animated 3D mesh (GLB) and skeleton visualization

Stage 9: Validate

Cross-check consistency:

Phase transitions align with contact events (heel strike should coincide with stance phase onset)
Failure frames fall within valid frame range
Joint angles are within physically possible ranges
Frame counts match across all outputs
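
Three of these checks are simple enough to sketch directly. The function below is illustrative only: its signature and the generic ±π angle bound are assumptions, and the phase/contact alignment check is omitted.

```python
import math
from typing import Dict, List, Sequence


def validate_package(
    n_frames: int,
    phase_labels: Sequence[str],
    joint_angles: Dict[str, Sequence[float]],
    failure_frames: Sequence[int],
    max_abs_angle: float = math.pi,  # assumed generic bound, in radians
) -> List[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems: List[str] = []
    if len(phase_labels) != n_frames:
        problems.append(f"phase labels: {len(phase_labels)} != {n_frames} frames")
    for name, series in joint_angles.items():
        if len(series) != n_frames:
            problems.append(f"{name}: {len(series)} != {n_frames} frames")
        if any(abs(a) > max_abs_angle for a in series):
            problems.append(f"{name}: angle outside ±{max_abs_angle:.2f} rad")
    for frame in failure_frames:
        if not 0 <= frame < n_frames:
            problems.append(f"failure frame {frame} outside [0, {n_frames})")
    return problems
```

Returning a list of problems rather than raising on the first one lets a session report every inconsistency in a single pass.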

Technical details

Phase detection algorithms

The core of the robotics pipeline is phase detection: correctly segmenting continuous motion into discrete phases that a robot can reason about.

Squat analysis

The squat detector uses pelvis position as the primary signal. The pelvis position is taken from the first available source, in priority order:

hips_joint (explicit pelvis joint)
root joint
Root transform matrix element [13] (Y-translation of the 4×4 transform when flattened in column-major order)

Vertical velocity is computed as finite differences of the pelvis Y-coordinate:

v_y[t] = (y[t] − y[t−1]) / dt

The state machine transitions are deterministic and non-reversible within a single rep:

STANDING → DESCENT → BOTTOM → ASCENT → STANDING

A complete rep is counted when the state machine returns to STANDING. Partial reps (e.g., descent without reaching bottom) are tracked separately.

Key thresholds:

Parameter	Value	Rationale
Descent velocity	−0.05 m/s	Magnitude exceeds natural sway (~0.02 m/s)
Ascent velocity	+0.05 m/s	Above natural sway
Static velocity	±0.02 m/s	Natural quiet standing sway
Height tolerance	0.02 m	Prevents premature bottom detection
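
Putting the velocity formula, the state order, and the thresholds together gives a state machine along these lines. It is a sketch under stated assumptions: "minimum" is read as the lowest pelvis height in the whole sequence, partial reps are not tracked, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

DESCENT_V = -0.05   # m/s
ASCENT_V = 0.05     # m/s
STATIC_V = 0.02     # m/s
HEIGHT_TOL = 0.02   # m


def squat_phases(pelvis_y: Sequence[float], dt: float) -> Tuple[List[str], int]:
    """Label each frame and count completed reps."""
    y_start = pelvis_y[0]
    y_min = min(pelvis_y)  # assumption: global minimum over the sequence
    state, reps = "STANDING", 0
    labels = ["STANDING"]  # no velocity estimate for frame 0
    for t in range(1, len(pelvis_y)):
        v = (pelvis_y[t] - pelvis_y[t - 1]) / dt  # backward finite difference
        y = pelvis_y[t]
        if state == "STANDING" and v < DESCENT_V:
            state = "DESCENT"
        elif state == "DESCENT" and abs(v) <= STATIC_V and y - y_min <= HEIGHT_TOL:
            state = "BOTTOM"
        elif state == "BOTTOM" and v > ASCENT_V:
            state = "ASCENT"
        elif state == "ASCENT" and abs(v) <= STATIC_V and abs(y - y_start) <= HEIGHT_TOL:
            state = "STANDING"
            reps += 1  # a rep completes on return to STANDING
        labels.append(state)
    return labels, reps
```

Because each branch checks the current state first, transitions can only occur in the STANDING → DESCENT → BOTTOM → ASCENT → STANDING order.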

Gait analysis

Gait phase detection relies on foot-ground contact rather than pelvis motion. The contact model is height-based:

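floor_height = min(min(left_foot_y[0..T]), min(right_foot_y[0..T]))
left_contact[t] = (left_foot_y[t] − floor_height) < 0.05
right_contact[t] = (right_foot_y[t] − floor_height) < 0.05

The floor is estimated from the data as the lowest foot height observed over the sequence, not assumed to be a fixed absolute height, so a residual vertical offset after normalization does not bias contact detection. The minimum has to be taken over the whole sequence rather than per frame: a per-frame minimum would always mark the lower foot as in contact, and FLIGHT could never be detected.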

Phase assignment:

if left_contact AND right_contact → DOUBLE_SUPPORT
elif left_contact only → LEFT_STANCE (right foot swinging)
elif right_contact only → RIGHT_STANCE (left foot swinging)
else → FLIGHT (both feet off ground)
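
A runnable version of the phase assignment, as a minimal sketch. It assumes the floor is estimated once from the lowest foot height in the whole sequence (matching the "lowest observed foot position" wording in Stage 5); a per-frame minimum would always put the lower foot in contact and never yield FLIGHT. The function name is hypothetical.

```python
from typing import List, Sequence

CONTACT_HEIGHT_M = 0.05  # from the text


def gait_phases(left_foot_y: Sequence[float], right_foot_y: Sequence[float]) -> List[str]:
    """Assign one gait phase label per frame from foot heights."""
    # Floor estimated once from the lowest foot height in the whole sequence.
    floor = min(min(left_foot_y), min(right_foot_y))
    labels: List[str] = []
    for ly, ry in zip(left_foot_y, right_foot_y):
        left = (ly - floor) < CONTACT_HEIGHT_M
        right = (ry - floor) < CONTACT_HEIGHT_M
        if left and right:
            labels.append("DOUBLE_SUPPORT")
        elif left:
            labels.append("LEFT_STANCE")   # right foot swinging
        elif right:
            labels.append("RIGHT_STANCE")  # left foot swinging
        else:
            labels.append("FLIGHT")        # both feet off the ground
    return labels
```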

Gait cycle metrics derived from phase labels:

Cadence: steps per minute (count foot-contact onsets, i.e. heel strikes, ÷ elapsed time in seconds × 60)
Stride time: duration of one complete gait cycle (left heel strike to next left heel strike)
Stance/swing ratio: percentage of cycle in stance vs swing (healthy: ~60/40)
Symmetry index: ratio of left stride time to right stride time (healthy: ~1.0)
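
These metrics can be derived from per-frame contact flags roughly as follows. This is a sketch, assuming a step is counted at each contact onset (heel strike) and stride time is averaged over successive onsets of the same foot; the function and key names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def contact_onsets(contact: Sequence[bool]) -> List[int]:
    """Frame indices where contact switches from False to True."""
    return [i for i in range(1, len(contact)) if contact[i] and not contact[i - 1]]


def mean_stride_time(onsets: Sequence[int], dt: float) -> Optional[float]:
    """Mean time between successive contact onsets of the same foot."""
    if len(onsets) < 2:
        return None
    return (onsets[-1] - onsets[0]) * dt / (len(onsets) - 1)


def gait_metrics(
    left: Sequence[bool], right: Sequence[bool], dt: float
) -> Dict[str, Optional[float]]:
    left_on, right_on = contact_onsets(left), contact_onsets(right)
    duration_s = len(left) * dt
    left_stride = mean_stride_time(left_on, dt)
    right_stride = mean_stride_time(right_on, dt)
    symmetry = left_stride / right_stride if left_stride and right_stride else None
    return {
        "cadence_steps_per_min": (len(left_on) + len(right_on)) / duration_s * 60.0,
        "left_stride_time_s": left_stride,
        "right_stride_time_s": right_stride,
        "left_stance_fraction": sum(left) / len(left),
        "symmetry_index": symmetry,
    }
```

A contact already in progress at frame 0 is not counted as an onset, so very short recordings will slightly undercount steps.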

Balance analysis

Balance phase detection uses a sliding window over horizontal velocity magnitude:

v_horizontal = sqrt(v_x² + v_z²)
window_mean = mean(v_horizontal[t−9 : t+1])
phase = "unstable" if window_mean ≥ 0.03 else "static"

The 10-frame window (167ms at 60fps) provides temporal smoothing. The 0.03 m/s threshold was chosen empirically; it sits above quiet standing sway (typically 0.01–0.02 m/s) but below intentional weight shifting (typically > 0.05 m/s).

Balance metrics derived:

Total sway path: cumulative horizontal displacement of center of mass
Sway velocity: mean horizontal velocity over assessment duration
Time in unstable phase: percentage of assessment spent above stability threshold
Maximum excursion: peak horizontal displacement from initial position
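
The phase rule and the four metrics can be computed in one pass, sketched below. It assumes the input is a horizontal center-of-mass (or root) trajectory in meters, treats exactly 0.03 m/s as unstable as in the Stage 5 table, and uses hypothetical function and key names.

```python
import math
from typing import Dict, List, Sequence, Union


def balance_analysis(
    com_x: Sequence[float],
    com_z: Sequence[float],
    dt: float,
    window: int = 10,
    threshold: float = 0.03,
) -> Dict[str, Union[float, List[str]]]:
    n = len(com_x)
    # Horizontal displacement between consecutive frames.
    steps = [
        math.hypot(com_x[t] - com_x[t - 1], com_z[t] - com_z[t - 1])
        for t in range(1, n)
    ]
    speed = [0.0] + [s / dt for s in steps]  # no velocity estimate for frame 0
    phases: List[str] = []
    for t in range(n):
        recent = speed[max(0, t - window + 1): t + 1]  # up to `window` frames ending at t
        mean_speed = sum(recent) / len(recent)
        phases.append("unstable" if mean_speed >= threshold else "static")
    sway_path = sum(steps)
    return {
        "phases": phases,
        "sway_path_m": sway_path,
        "mean_sway_velocity_ms": sway_path / ((n - 1) * dt) if n > 1 else 0.0,
        "unstable_fraction": phases.count("unstable") / n,
        "max_excursion_m": max(
            math.hypot(x - com_x[0], z - com_z[0]) for x, z in zip(com_x, com_z)
        ),
    }
```

The window is trailing (it only looks at past frames), so the same rule would also work for online labeling during capture.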

From clinical data to robot learning

The structured output of the robotics pipeline connects to downstream robot systems via the simulation pipeline:

Stage 0 (SIM Pipeline): The RoboticsAdapter ingests the robotics pipeline output and standardizes it into the adapter format expected by the simulation pipeline.

Stages 1–4: Alignment, physical state extraction, physics labeling, and replay validation operate on the structured clinical data.

Stages 5–6: The JEPA world model trains on the clinical motion dynamics, learning the state-transition function from real patient movement.

Stages 7–8: Motion planning generates variants in latent space, physics filtering validates them, and simulation execution produces labeled outcomes.

Stage 9: Task pack extraction assembles the complete knowledge package: “given this room, this patient’s movement capability, and this assessment task, here are the valid motion trajectories, the failure modes, and the corrective actions.”

This knowledge can be consumed by:

Robot motion planners: Understanding human movement constraints to plan safe collaborative trajectories
Rehabilitation robots: Generating appropriate assistance or resistance based on patient-specific movement patterns
Digital twins: Simulating patient movement in virtual environments for remote assessment
Training data generators: Producing synthetic motion sequences with known phase labels and failure annotations for ML model training

Integration with the iOS client

The iPad app surfaces the robotics pipeline results in its assessment views:

Phase timeline: Visual display of detected phases overlaid on the skeleton playback
Contact markers: Foot-ground contact events shown as markers on the assessment timeline
Failure annotations: Red markers at detected failure frames with failure type labels
Joint angle graphs: Per-joint angle trajectories with smoothing applied
Robot visualization: When a simulation robot is registered, the patient skeleton and robot model are displayed together in a unified 3D scene

The robot visualization uses RealityKit on iPad, loading robot visual meshes and joint animations from the backend. The robot’s base pose is aligned to the patient’s environment scan, placing it in the correct position relative to the room geometry.

Evaluation

Phase detection accuracy (validated against manual annotation on 50 assessment sessions):

Assessment	Phase accuracy	Transition timing error
Squat (5xSTS)	97.2%	±1.3 frames (22ms)
Gait	94.8%	±2.1 frames (35ms)
Balance	91.5%	±3.4 frames (57ms)

Squat detection is most reliable because the vertical velocity signal is strong and unambiguous. Gait detection occasionally misclassifies brief double-support periods as single stance. Balance detection has the highest error because the static/unstable boundary (0.03 m/s) is inherently fuzzy. Some patients have natural sway that oscillates around the threshold.

Contact detection accuracy (against force plate ground truth, 20 gait sessions):

Metric	Value
Contact onset accuracy	±18ms
Contact offset accuracy	±24ms
False positive rate	2.1%
False negative rate	1.4%

The height-based contact model (5cm threshold) performs well for normal gait but degrades for shuffling gait patterns where foot clearance is minimal. A velocity-based confirmation reduces false positives at the cost of slightly delayed detection.

Limitations

Height-based contact detection: Fails for shuffling gait or foot-drag patterns where the foot never lifts > 5cm. Force plate validation is recommended for these populations.
Phase detection assumes typical movement: Highly atypical movement patterns (e.g., circumduction gait, scissoring) may not match the state machine transitions. Generic fallback detection is used but provides less specific phase labels.
Single-person tracking: ARKit body tracking supports one person. Multi-person clinical scenarios (patient + therapist) require manual exclusion of the non-target person.
No force/torque data: The pipeline infers contact from kinematics only. Ground reaction forces, joint moments, and muscle activations are not available without additional sensors.
Fixed thresholds: Phase detection thresholds are population-level constants. Patient-specific threshold adaptation would improve accuracy for individuals with unusual movement patterns.

Future work

Adaptive thresholds: Learn per-patient phase detection thresholds from the first few seconds of each assessment
Muscle activation estimation: Infer muscle activation patterns from kinematics using musculoskeletal models (OpenSim integration)
Multi-person tracking: Detect and separate therapist and patient skeletons for assisted-movement assessments
Force estimation: Estimate ground reaction forces from kinematics using physics-based inverse dynamics
Online phase detection: Real-time phase labeling during capture for live feedback to clinicians and robots

References

Perry, J. and Burnfield, J.M. “Gait Analysis: Normal and Pathological Function.” SLACK Incorporated, 2010.
Winter, D.A. “Biomechanics and Motor Control of Human Movement.” Wiley, 2009.
Podsiadlo, D. and Richardson, S. “The Timed Up & Go: A Test of Basic Functional Mobility.” Journal of the American Geriatrics Society, 1991.
Bohannon, R.W. “Sit-to-Stand Test for Measuring Performance of Lower Extremity Muscles.” Perceptual and Motor Skills, 1995.
Lugaresi, C. et al. “MediaPipe: A Framework for Building Perception Pipelines.” CVPR Workshop, 2019.
Loper, M. et al. “SMPL: A Skinned Multi-Person Linear Model.” SIGGRAPH Asia 2015.
Seth, A. et al. “OpenSim: Simulating musculoskeletal dynamics and neuromuscular control.” PLoS Computational Biology, 2018.

If you are working on clinical motion capture for Physical AI and want access to this pipeline, join the beta.