FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning

Lin, Haihao; Huang, Xiangsheng; Yang, Xiao; Zhou, Weibang; Zhang, Yiqi; Yang, Bo; Zeng, Simin; Yang, Jiawei; Wang, Zhengyang; Du, Jiahui

Abstract

Action-supervised fine-tuning of VLA policies fits demonstrations effectively, but it constrains only the feature directions that change predicted actions. Visual structure that is consistent across action-equivalent states is left unconstrained and free to collapse.

FiberTune is a training-time objective that preserves teacher-structured visual residuals without adding inference-time overhead. It uses an online action probe to estimate action-predictive feature directions, filters them from intermediate token representations, and aligns the resulting probe-filtered residuals to a frozen visual teacher while regularizing their effective rank.

Under matched training conditions, FiberTune improves over task-loss-only fine-tuning in controlled simulation settings spanning two benchmarks (CALVIN and LIBERO) and two model families (pi0.5 and OpenVLA-OFT). It also improves physical SO-101 pick-place success. Residual diagnostics show increased teacher alignment and effective rank in the probe-filtered residual, consistent with the action-fiber motivation.

Method

Residual preservation after action-probe filtering

What FiberTune changes

Task loss directly supervises action prediction, but it does not specify how much action-equivalent visual structure should remain in intermediate representations. FiberTune adds a training-only side path that estimates action-predictive feature directions, filters those directions out of the token features, and applies the visual-preservation loss only to what remains.
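The filtering step can be sketched as an orthogonal projection. This minimal NumPy sketch assumes the action probe is linear, so its action-predictive directions are the row space of a hypothetical weight matrix `probe_weight`. The page does not state the probe architecture; a nonlinear probe would need locally estimated directions (for example, from a Jacobian) instead. `probe_filter` is a hypothetical name.

```python
import numpy as np

def probe_filter(tokens: np.ndarray, probe_weight: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: remove a linear probe's action-predictive directions.

    tokens:       (num_tokens, d) intermediate token representations
    probe_weight: (action_dim, d) weight of an assumed linear action probe
    Returns the tokens projected onto the orthogonal complement of the probe's row space.
    """
    # Orthonormal basis Q for the row space of the probe (the directions it reads).
    _, s, vt = np.linalg.svd(probe_weight, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    basis = vt[:rank].T  # (d, rank), orthonormal columns
    # Complementary filter applied to every token: h - Q Q^T h.
    return tokens - (tokens @ basis) @ basis.T
```

By construction, the assumed linear probe outputs zero on the filtered tokens. Any remaining visual structure is therefore, to first order, action-equivalent with respect to that probe.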

Action probe estimates locally action-predictive directions.
Residual branch aligns centered token directions to a frozen teacher.
Effective-rank prior discourages low-rank residual collapse.
Teacher, probe, and adapter are removed after fine-tuning.
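The two side losses above can be sketched as a forward computation, under stated assumptions. The adapter is a hypothetical linear map `adapter` from the residual width to the teacher width. Alignment is taken as one minus the mean per-token cosine similarity after centering tokens within an image. The effective-rank prior uses the Roy-Vetterli definition (the exponentiated Shannon entropy of normalized singular values). The weights `lam_align` and `lam_rank` are illustrative placeholders, not values from the paper. Plain NumPy has no autodiff, so a training implementation would express the same computation in a framework such as JAX or PyTorch.

```python
import numpy as np

def center_and_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Subtract the mean token, then unit-normalize each token (assumed "centered token directions").
    x = x - x.mean(axis=0, keepdims=True)
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def alignment_loss(residual: np.ndarray, adapter: np.ndarray, teacher_tokens: np.ndarray) -> float:
    # Hypothetical: 1 - mean cosine between adapted residual tokens and frozen teacher tokens.
    student = center_and_normalize(residual @ adapter)
    teacher = center_and_normalize(teacher_tokens)
    return float(1.0 - np.mean(np.sum(student * teacher, axis=1)))

def effective_rank(x: np.ndarray, eps: float = 1e-12) -> float:
    # Roy-Vetterli effective rank: exp of the entropy of normalized singular values.
    s = np.linalg.svd(x - x.mean(axis=0, keepdims=True), compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

def side_loss(residual, adapter, teacher_tokens, lam_align=1.0, lam_rank=0.1) -> float:
    # Hypothetical combination: alignment plus a prior that rewards higher effective rank.
    align = alignment_loss(residual, adapter, teacher_tokens)
    rank_prior = -np.log(effective_rank(residual))
    return lam_align * align + lam_rank * rank_prior
```

Here `residual` would be the probe-filtered token matrix for one image. Because the teacher, probe, and adapter appear only inside this side loss, dropping them after fine-tuning leaves the inference path unchanged.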

Results

Matched fine-tuning comparisons

Each comparison fixes the environment, model family, training start, data, budget, evaluator, and model-selection rule before comparing task-loss-only fine-tuning with FiberTune.
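One way to make this matching rule explicit is a frozen config in which only the training objective may differ between the two runs. The class, field names, and helpers below are hypothetical. They encode the fixed factors listed above and say nothing about the actual values used.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class RunConfig:
    # Hypothetical config: every field except `objective` is held fixed within a comparison.
    environment: str
    model_family: str
    training_start: str
    data: str
    budget: str
    evaluator: str
    selection_rule: str
    objective: str = "task_loss_only"

def matched_pair(base: RunConfig) -> tuple[RunConfig, RunConfig]:
    """Return (baseline, FiberTune) configs that differ only in the training objective."""
    return replace(base, objective="task_loss_only"), replace(base, objective="fibertune")

def differing_fields(a: RunConfig, b: RunConfig) -> set[str]:
    """Names of fields whose values differ between two configs."""
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}
```

A check such as `differing_fields(*matched_pair(base)) == {"objective"}` states the protocol as an invariant that can be asserted before any runs are launched.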

CALVIN ABC to D +10.7 pp

Five-subtask success gain for pi0.5 from the adapted start.

LIBERO 2 model families

Gains are reported for pi0.5 and OpenVLA-OFT under matched protocols.

SO-101 128 trials

Physical pick-place trials include three training colors and a held-out green block.

Physical Robot Evaluation

SO-101 pick-place with held-out color generalization

We evaluate matched policies on a physical SO-101 setup using the instruction "pick up the {color} block and put it into the black box". Yellow, orange, and purple are in-distribution colors; green is held out from training.

SO-101 color-specific examples: start and end states for each block color in the physical setup.

SO-101 placement schedule: the shared placement schedule used for each color and policy.

Diagnostics

Residual geometry tracks the behavioral gains

FiberTune increases teacher alignment and effective rank in the probe-filtered residual, while avoiding a full-token alignment pressure that can over-emphasize aggregate visual similarity.

Residual CKA and effective-rank diagnostics
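The page does not specify which CKA variant the diagnostic uses. Assuming linear CKA (Kornblith et al., 2019), the sketch below compares hypothetical probe-filtered residual features `x` with teacher features `y` computed over the same samples.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices with matching rows (samples).

    x: (n, d1) e.g. probe-filtered residual features (assumed)
    y: (n, d2) e.g. frozen-teacher features for the same n samples (assumed)
    """
    # Center each feature dimension over the samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # HSIC-style ratio: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Linear CKA lies in [0, 1] and is invariant to orthogonal transforms and isotropic scaling of either input. It can therefore compare a residual and a teacher of different widths without a learned adapter.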

Paper and Release

The paper is available on arXiv as arXiv:2606.08653. The public repository link will be added after release.


Code

Coming soon. We will add the public repository link here after release.

BibTeX

@article{lin2026fibertune,
  title   = {FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning},
  author  = {Lin, Haihao and Huang, Xiangsheng and Yang, Xiao and Zhou, Weibang and Zhang, Yiqi and Yang, Bo and Zeng, Simin and Yang, Jiawei and Wang, Zhengyang and Du, Jiahui},
  journal = {arXiv preprint},
  eprint  = {2606.08653},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url     = {https://arxiv.org/abs/2606.08653},
  year    = {2026}
}