Pinetree Agent: Approaching Human-level Intelligence on Online-Mind2Web

Today, Pinetree Research announces a new milestone: Pinetree Agent scores 90% on Online-Mind2Web, a leading result on one of the most rigorous benchmarks for autonomous web agents.

Online-Mind2Web evaluates whether agents can understand user intent, interpret changing interfaces, and complete real-world tasks across diverse live websites. Unlike narrow, scripted benchmarks, it measures how well an agent adapts when the sites it operates on change.

At 90%, Pinetree Agent outperforms frontier systems including OpenAI Lux (83.6%), Yutori Navigator (78.7%), and Gemini 2.5 CUA (69%), leading them by 6.4, 11.3, and 21 percentage points respectively. These margins suggest major gains in planning, execution reliability, and cross-site generalization.

Why Online-Mind2Web Matters

Online-Mind2Web is designed to test capabilities required for practical computer use:

  • Intent understanding

  • Adaptation to unfamiliar layouts

  • Multi-step task execution

  • Robust decision-making under uncertainty

  • Reliable completion across changing environments

Strong performance on this benchmark suggests that autonomous systems are rapidly progressing from narrow automation toward general digital competence.