Integrating Large Language Models (LLMs) with Vision-Language-Action (VLA) models such as GR00T to enable reasoning for complex robotic tasks.
Current Vision-Language-Action (VLA) models excel at short, atomic actions (e.g., "pick up the cup"). However, they struggle significantly with long-horizon tasks that require multi-step reasoning, memory, and contextual awareness over time in dynamic environments.
This research hypothesized that integrating a Large Language Model (LLM) as a high-level "reasoner" could guide low-level VLA policies by breaking complex instructions down into manageable sub-goals.
We implemented a layered architecture: an LLM-based high-level reasoner that decomposes the task into sub-goals, and a GR00T-based VLA policy that executes each sub-goal as low-level actions (see the sketch below).
We evaluated the system systematically on a Franka Emika Research 3 arm.
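To make the division of labour concrete, below is a minimal sketch of the planning-and-execution loop this layered design implies. Every name in it (Subgoal, query_llm_planner, run_task, the fake_vla stand-in) is an illustrative placeholder rather than the project's or GR00T's actual API; a real system would prompt an LLM with the task, scene state, and execution history, and hand each resulting sub-goal to the VLA policy driving the arm.

```python
"""Minimal sketch of the layered LLM-reasoner / VLA-policy loop.

All names here are illustrative placeholders, not the actual GR00T or
project interfaces.
"""

from dataclasses import dataclass
from typing import Callable


@dataclass
class Subgoal:
    """One atomic instruction the low-level VLA policy can execute directly."""
    instruction: str   # e.g. "pick up the red cup"
    done_check: str    # natural-language success condition


def query_llm_planner(task: str, history: list[str]) -> list[Subgoal]:
    """Hypothetical high-level reasoner: ask an LLM to decompose the task.

    In a real system this would call an LLM with a prompt containing the
    task, a scene description, and the execution history; here it returns
    a fixed plan so the sketch runs standalone.
    """
    return [
        Subgoal("pick up the cup", "cup is grasped"),
        Subgoal("place the cup on the tray", "cup rests on the tray"),
    ]


def run_task(task: str, execute_subgoal: Callable[[Subgoal], bool]) -> bool:
    """Run a long-horizon task by looping LLM planning over VLA execution."""
    history: list[str] = []
    for subgoal in query_llm_planner(task, history):
        ok = execute_subgoal(subgoal)  # low-level VLA rollout for one sub-goal
        history.append(f"{subgoal.instruction}: {'ok' if ok else 'failed'}")
        if not ok:
            return False               # a real system might replan here instead
    return True


if __name__ == "__main__":
    # Stand-in for the VLA policy (e.g. GR00T) controlling the Franka arm.
    def fake_vla(subgoal: Subgoal) -> bool:
        print(f"[VLA] executing: {subgoal.instruction}")
        return True

    print("success:", run_task("clear the table", fake_vla))
```

The key design choice the sketch illustrates is that the LLM never outputs motor commands; it only emits short, atomic instructions of the kind VLA models already handle well, which is what keeps the two layers decoupled.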
The study provided critical insights into the "reasoning gap" in current VLA models.
We demonstrated that LLM guidance improves success rates on tasks that require logical ordering of actions, but that inference latency and language grounding remain key challenges.
Affiliation: Eurecat – Robotics & Automation
Focus: Embodied AI, Foundation Models
Period: Feb 2025 – Oct 2025