Approaching Human-level Intelligence on Online-Mind2Web

Today, Pinetree Research announces a new milestone: Pinetree Agent achieves a 90% task success rate on Online-Mind2Web, establishing leading performance on one of the most rigorous benchmarks for how well autonomous agents generalize across the web.

Online-Mind2Web evaluates whether agents can understand user intent, interpret changing interfaces, and complete real-world tasks across diverse live websites. Unlike narrow scripted benchmarks, it measures adaptability under dynamic operating conditions.

At 90%, Pinetree Agent substantially outperforms frontier systems including OpenAGI Lux (83.6%), Yutori Navigator (78.7%), and Gemini 2.5 CUA (69%), a lead of 6.4 to 21 percentage points. This margin points to major gains in planning, execution reliability, and cross-site generalization.

Why Online-Mind2Web Matters

Online-Mind2Web is designed to test capabilities required for practical computer use:

  • Intent understanding

  • Adaptation to unfamiliar layouts

  • Multi-step task execution

  • Robust decision-making under uncertainty

  • Reliable completion across changing environments

Strong performance on this benchmark suggests that autonomous systems are rapidly progressing from narrow automation toward general digital competence.

Pinetree Research

Engineering the Future of Autonomous Intelligence

© 2026 Pinetree Research

All rights reserved