

Today, Pinetree Research announces a new milestone: Pinetree Agent scores 90% on Online-Mind2Web, a leading result on one of the most rigorous benchmarks for how well autonomous web agents generalize.
Online-Mind2Web evaluates whether agents can understand user intent, interpret changing interfaces, and complete real-world tasks across diverse live websites. Unlike narrow scripted benchmarks, it measures adaptability under dynamic operating conditions.
At 90%, Pinetree Agent outperforms frontier systems including OpenAI Lux (83.6%), Yutori Navigator (78.7%), and Gemini 2.5 CUA (69%), leading them by 6.4, 11.3, and 21 percentage points respectively. These margins indicate major gains in planning, execution reliability, and cross-site generalization.
Why Online-Mind2Web Matters
Online-Mind2Web is designed to test capabilities required for practical computer use:
Intent understanding
Adaptation to unfamiliar layouts
Multi-step task execution
Robust decision-making under uncertainty
Reliable completion across changing environments
Strong performance on this benchmark suggests that autonomous systems are rapidly progressing from narrow automation toward general digital competence.