“SCIL: Stage-Conditioned Imitation Learning for Multi-Stage Manipulation” accepted by LCSS 2025

Our paper, “SCIL: Stage-Conditioned Imitation Learning for Multi-Stage Manipulation”, has been accepted by LCSS 2025.

Recent advancements in end-to-end imitation learning have shown promise in handling multi-stage robotic manipulation, which involves executing structured sequences of actions to achieve sub-goals. However, challenges arise when similar observations appear at different stages, creating ambiguity in selecting the correct action for the current sub-goal. In this letter, we propose Stage-Conditioned Imitation Learning (SCIL), a novel hierarchical imitation learning framework to tackle the ambiguity issue. SCIL consists of two key components: a high-level stage observer and a low-level stage-conditioned policy. The high-level stage observer is trained on a stage-labeled dataset using a Gated Recurrent Unit, which predicts the current task stage based on the observation and hidden state. The low-level stage-conditioned policy is trained using a conditional variational autoencoder, generating actions specifically adapted to the identified stage. SCIL facilitates precise stage transitions, effectively mitigating ambiguity between similar observations across different stages. Compared to existing end-to-end imitation learning methods such as Action Chunking with Transformers and Diffusion Policy, SCIL achieves a 65% improvement in success rate on various real-world multi-stage manipulation tasks with ambiguity between stages.
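The two-level structure described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the implementation from the paper: the class names `StageObserver` and `StageConditionedPolicy`, the dimensions, and the random untrained weights are all hypothetical, biases are omitted, and the CVAE decoder is replaced by a single linear layer that takes the observation, a one-hot stage, and a latent vector.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

class StageObserver:
    """High-level module: one GRU cell plus a linear stage classifier (hypothetical)."""
    def __init__(self, obs_dim, hidden_dim, num_stages, rng):
        self.hidden_dim = hidden_dim
        # One input/recurrent weight pair per gate: update z, reset r, candidate n.
        self.W = {g: rand_matrix(hidden_dim, obs_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.W_out = rand_matrix(num_stages, hidden_dim, rng)

    def step(self, obs, h):
        wx = {g: matvec(self.W[g], obs) for g in "zrn"}
        uh = {g: matvec(self.U[g], h) for g in "zrn"}
        z = [sigmoid(a + b) for a, b in zip(wx["z"], uh["z"])]
        r = [sigmoid(a + b) for a, b in zip(wx["r"], uh["r"])]
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wx["n"], r, uh["n"])]
        # Standard GRU update: blend the candidate state with the previous state.
        h_new = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        return softmax(matvec(self.W_out, h_new)), h_new

class StageConditionedPolicy:
    """Low-level module: a linear stand-in for a CVAE decoder (assumption)."""
    def __init__(self, obs_dim, num_stages, latent_dim, action_dim, rng):
        self.num_stages = num_stages
        self.latent_dim = latent_dim
        self.W = rand_matrix(action_dim, obs_dim + num_stages + latent_dim, rng)

    def act(self, obs, stage, latent=None):
        one_hot = [1.0 if i == stage else 0.0 for i in range(self.num_stages)]
        if latent is None:
            # Assumption: use an all-zero latent (the prior mean) at test time.
            latent = [0.0] * self.latent_dim
        return matvec(self.W, obs + one_hot + latent)

def rollout(observer, policy, observations):
    """Run the observer and policy together over a sequence of observations."""
    h = [0.0] * observer.hidden_dim
    trace = []
    for obs in observations:
        probs, h = observer.step(obs, h)
        stage = max(range(len(probs)), key=probs.__getitem__)
        trace.append((stage, probs, policy.act(obs, stage)))
    return trace

rng = random.Random(0)
observer = StageObserver(obs_dim=4, hidden_dim=8, num_stages=3, rng=rng)
policy = StageConditionedPolicy(obs_dim=4, num_stages=3, latent_dim=2,
                                action_dim=2, rng=rng)
same_obs = [0.2, -0.1, 0.4, 0.0]
trace = rollout(observer, policy, [same_obs, [1.0, 0.5, -0.3, 0.2], same_obs])
```

In this sketch the first and third observations are identical, yet the observer produces different stage probabilities for them because its hidden state carries the history in between. With weights trained on a stage-labeled dataset, that difference is what would let the policy pick stage-appropriate actions for look-alike observations.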