This work presents an intent-driven planning pipeline for flexible multi-robot manipulation. The system converts operator instructions and scene descriptions into precedence-aware action sequences, then uses an ensemble of large language models (LLMs), an LLM verifier, and deterministic consistency checks to reduce invalid plans before execution.
Planning Pipeline
- A perception-to-text stage converts the workcell state into a structured object-level scene description.
- Multiple LLM planning candidates are generated from the same operator intent, allowing the system to compare and filter action sequences.
- A verifier checks formatting and precedence constraints before a deterministic filter removes plans that mention objects outside the detected scene.
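The filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Action` record, the function names, and the object labels are all hypothetical, and the precedence check shown here is a deterministic stand-in for what the summary attributes to the LLM verifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step of a candidate plan (hypothetical schema, not from the source)."""
    robot: str   # robot assigned to the step
    verb: str    # e.g. "unscrew", "lift"
    target: str  # object label the step refers to


def mentions_only_scene_objects(plan, scene_objects):
    """Deterministic filter: every action target must be a detected object."""
    return all(action.target in scene_objects for action in plan)


def respects_precedence(plan, precedence):
    """Check each (before, after) pair using the first occurrence of each target.

    A pair is only enforced when both targets appear in the plan.
    """
    first_index = {}
    for i, action in enumerate(plan):
        first_index.setdefault(action.target, i)
    return all(
        first_index[before] < first_index[after]
        for before, after in precedence
        if before in first_index and after in first_index
    )


def filter_candidates(candidates, scene_objects, precedence):
    """Keep the candidate plans that pass both checks, preserving order."""
    return [
        plan
        for plan in candidates
        if mentions_only_scene_objects(plan, scene_objects)
        and respects_precedence(plan, precedence)
    ]


# Illustrative data only: labels and constraints are assumptions.
scene = {"screw_1", "cover", "module_a"}
precedence = [("screw_1", "cover"), ("cover", "module_a")]

good = [
    Action("r1", "unscrew", "screw_1"),
    Action("r2", "lift", "cover"),
    Action("r1", "extract", "module_a"),
]
bad_order = [Action("r2", "lift", "cover"), Action("r1", "unscrew", "screw_1")]
hallucinated = [Action("r1", "unscrew", "screw_9")]  # screw_9 was never detected

kept = filter_candidates([good, bad_order, hallucinated], scene, precedence)
```

In this sketch only `good` survives: `bad_order` lifts the cover before removing the screw, and `hallucinated` refers to an object that is not in the scene description. Keeping these checks outside the LLM makes a rejection reproducible and easy to explain to an operator.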
Experimental Highlights
- The evaluation used 200 real scenes and 600 operator prompts across five component classes.
- Ensemble planning with verification improved full-sequence correctness compared with a single-model baseline.
- Human-interface tests measured time to execution and NASA-TLX workload for natural-language task requests.
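The summary does not define full-sequence correctness. One plausible reading, sketched below as an assumption, counts a plan as correct only when every step matches a reference sequence in the same order. The function name and the string encoding of steps are hypothetical.

```python
def full_sequence_correctness(predicted, reference):
    """Fraction of prompts whose predicted plan equals the reference exactly.

    Assumes one predicted plan and one reference plan per prompt, each given
    as an ordered sequence of step strings. Partial matches score zero.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    matches = sum(
        1 for plan, ref in zip(predicted, reference) if list(plan) == list(ref)
    )
    return matches / len(predicted)


# Illustrative data only, not results from the evaluation.
reference = [
    ["unscrew screw_1", "lift cover"],
    ["unscrew screw_1", "lift cover"],
    ["cut cable"],
]
predicted = [
    ["unscrew screw_1", "lift cover"],  # exact match
    ["lift cover", "unscrew screw_1"],  # right steps, wrong order
    ["cut cable"],                      # exact match
]

score = full_sequence_correctness(predicted, reference)
```

Under this definition the second plan scores zero even though it contains the right steps, which is the strictness a precedence-aware task would call for.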
Why It Matters
Battery disassembly and other variable industrial tasks need robots that can respond to high-level operator intent without hard-coded task scripts. This approach keeps the planning output explicit and inspectable while using LLMs to make multi-robot coordination more flexible.