Advancing the limits of autonomous computer use.

State-of-the-art benchmark results across the industry’s most demanding evaluations for browser reasoning and multi-step execution.

WebVoyager is one of the most recognized benchmarks for autonomous web agents, measuring performance on more than 600 multi-step tasks across 15 live websites. Agents must reason, navigate changing interfaces, and complete open-ended objectives under real web conditions.


Pinetree Agent achieves 97% on WebVoyager, frontier-level performance on one of the field’s most demanding browser-use evaluations. At this level, only about 3 in 100 tasks fail, suggesting near-production reliability for complex web workflows.

Online-Mind2Web is a large-scale benchmark for measuring how effectively agents understand intent, interpret live web interfaces, and execute real online tasks across diverse environments. Unlike static evaluations, it tests generalization under changing layouts and open-ended objectives.


Pinetree Agent achieves 90% on Online-Mind2Web, leading competing systems and demonstrating robustness in real-world web interaction. Results at this level suggest strong transferability beyond narrow scripted tasks.