April 22, 2025

AV-ALOHA: Advancing Active Vision in Robotics - Accepted to ICRA 2025

LARA Lab's groundbreaking research on active vision in robotic manipulation has been accepted to ICRA 2025

AV-ALOHA: Pioneering Active Vision in Robotic Manipulation

We are thrilled to announce that our research paper "Active Vision Might Be All You Need" has been accepted for presentation at ICRA 2025. This work represents a significant milestone in our ongoing research at LARA Lab, where we're advancing the capabilities of robotic systems through innovative active vision approaches.

The Challenge of "Simple" Tasks

What comes naturally to humans often proves surprisingly complex for robots. As Dr. Soltani notes, "Some of the things that we take for granted as humans are so simple that we don't even pay attention to how we achieve them. We have evolved over millions of years to achieve certain capabilities. When you start to think in terms of robotics, you realize that they're quite complex."

Consider a task as straightforward as threading a needle or pouring liquid from one test tube to another. Humans instinctively adjust their viewpoint to get the best angle, a skill that has proven notoriously difficult to program into robots – until now.

Active Vision: Teaching Robots to See Like Humans

Our research introduces AV-ALOHA, a system that enables robots to actively control their point of view. Unlike traditional fixed-camera systems, AV-ALOHA dynamically adjusts its perspective to gather the most relevant visual information for the task at hand.

The system comprises three key components:

  • Two robotic arms for manipulation tasks
  • A dedicated 7-DoF camera arm for active vision
  • A VR-based control interface for intuitive human demonstration

Through this setup, operators can control the manipulation arms while simultaneously adjusting the camera's viewpoint using natural head movements, all while receiving real-time visual feedback through the VR headset.
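
To make the teleoperation mapping concrete, here is a minimal sketch of how a change in headset pose could be retargeted to the camera arm. This is our own illustrative simplification, not the published implementation; the 4x4 transform convention, the function name, and the `scale` parameter are all assumptions.

```python
import numpy as np

def retarget_head_to_camera(headset_pose, prev_headset_pose,
                            camera_pose, scale=1.0):
    """Map the operator's head motion since the last frame onto a new
    target pose for the camera arm's end effector.

    All poses are 4x4 homogeneous transforms in a shared world frame.
    The returned target would be handed to an inverse-kinematics
    solver to produce joint commands for the 7-DoF arm.
    """
    # How far the head translated since the previous frame.
    delta_p = headset_pose[:3, 3] - prev_headset_pose[:3, 3]
    # Incremental head rotation: R_delta = R_now @ R_prev^T.
    delta_R = headset_pose[:3, :3] @ prev_headset_pose[:3, :3].T

    target = camera_pose.copy()
    target[:3, 3] = camera_pose[:3, 3] + scale * delta_p  # follow translation
    target[:3, :3] = delta_R @ camera_pose[:3, :3]        # follow rotation
    return target
```

Run in a loop at the headset's frame rate, a mapping like this makes the camera arm lean and turn with the operator's head while the VR display streams the camera image back, closing the loop.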

From Demonstration to Autonomy

Our research goes beyond mere teleoperation. Through imitation learning, AV-ALOHA learns not just how to manipulate objects, but also how to position its camera for optimal task performance. We conducted extensive experiments across five simulation tasks and one real-world scenario, each designed to test a different aspect of active vision (a simplified training sketch follows the list):

  • Peg insertion
  • Slot insertion
  • Hook package manipulation
  • Test tube pouring
  • Needle threading
  • Occluded insertion (real-world)

The results were compelling: in the scenarios where perspective matters most, notably needle threading and occluded insertion, AV-ALOHA significantly outperformed systems with multiple fixed cameras.

Unexpected Insights

One fascinating discovery was that in tasks where varying perspectives weren't critical, the active vision system performed just as well as traditional setups with multiple fixed cameras. This suggests that a single, well-positioned camera might be sufficient for many tasks, potentially simplifying future robotic systems.

Looking Forward: The Next Phase

Our acceptance to ICRA 2025 marks not an endpoint but a milestone in ongoing research. We're currently exploring several intriguing questions:

  • How do robots balance camera movement with manipulation tasks?
  • Should perspective adjustment precede or coincide with manipulation?
  • Can we optimize head movement to mirror human behavior?

As Dr. Soltani explains, "We think that in more complex scenarios, most likely, we don't want to keep moving our heads while trying to complete a sensitive task with our hands." This insight is driving our next research phase, where we'll explore penalizing excessive head movement during precise manipulation tasks.
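
One plausible way to implement such a penalty, continuing the assumptions of the training sketch above (the weight `lam`, the action layout, and `prev_camera_q` are all hypothetical), is to charge the policy for camera-arm motion between consecutive steps:

```python
import torch.nn as nn

def penalized_bc_loss(pred_actions, demo_actions, prev_camera_q, lam=0.1):
    """Imitation error plus a penalty on camera-arm movement.

    pred_actions / demo_actions: action tensors whose last 7 dims are
    camera-arm joint targets (layout as assumed in the earlier sketch).
    prev_camera_q: camera-arm joints commanded at the previous step.
    """
    imitation = nn.functional.mse_loss(pred_actions, demo_actions)
    camera_q = pred_actions[..., -7:]
    movement = (camera_q - prev_camera_q).pow(2).mean()  # discourage motion
    return imitation + lam * movement
```

Tuning `lam` would trade faithfulness to the demonstrations against a steadier viewpoint during fine manipulation.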

The Research Team

This breakthrough represents a collaborative effort between UC Berkeley and UC Davis researchers:

  • Ian Chuang* (UC Berkeley)
  • Andrew Lee* (UC Davis)
  • Dechen Gao (UC Davis)
  • M-Mahdi Naddaf-Sh (UC Davis)
  • Iman Soltani (UC Davis)

*Equal contribution

Join Us at ICRA 2025

We look forward to presenting these findings at ICRA 2025, where we'll demonstrate live system operations and discuss future applications in industrial automation, surgical robotics, and human-robot collaboration.

Learn More

For collaboration inquiries or more information about our research, please contact us. We look forward to engaging with the robotics research community at ICRA 2025.

Authors

LARA Lab

The Laboratory for AI, Robotics, and Automation at UC Davis
