Benchspan
Spring 2026
Run agent benchmarks in minutes, not hours
Benchspan is a benchmarking platform for AI agents. If you're building an agent, you need to know if it's getting better. But running benchmarks is slow, expensive, and fragile. You spend days writing glue code every time you want to run a new benchmark, runs take forever on your laptop, and when they fail halfway through you burn hundreds of dollars in tokens with nothing to show for it.
Benchspan fixes all of it. Onboard your agent once, and it works with every benchmark on the platform. We onboarded Claude Code in 37 lines of code. Running a benchmark becomes a single command, executed in parallel in the cloud. Every result goes to one place your whole team can see, with full trajectories, token usage, latency, and custom metrics. When runs partially fail, rerun just the subset that errored instead of starting from scratch. Compare runs side by side to see exactly where your agent is improving and where it's regressing.
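The workflow described above (one adapter per agent, parallel execution, rerunning only the failed subset) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Benchspan's actual API: the names `Agent`, `Result`, `run_benchmark`, and `rerun_failed`, and the task format (a dict of task ID to prompt), are all hypothetical. Local threads stand in for the cloud execution the platform describes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical adapter type: an agent is any callable from a task prompt to an answer.
Agent = Callable[[str], str]


@dataclass
class Result:
    task_id: str
    output: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def run_one(agent: Agent, task_id: str, prompt: str) -> Result:
    """Run a single task, recording a failure instead of aborting the whole run."""
    try:
        return Result(task_id, output=agent(prompt))
    except Exception as exc:
        return Result(task_id, error=repr(exc))


def run_benchmark(agent: Agent, tasks: Dict[str, str], workers: int = 8) -> Dict[str, Result]:
    """Run every task in parallel and collect one Result per task ID."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {tid: pool.submit(run_one, agent, tid, prompt)
                   for tid, prompt in tasks.items()}
        return {tid: fut.result() for tid, fut in futures.items()}


def rerun_failed(agent: Agent, tasks: Dict[str, str],
                 results: Dict[str, Result], workers: int = 8) -> Dict[str, Result]:
    """Rerun only the tasks that errored, keeping earlier successes untouched."""
    failed = {tid: tasks[tid] for tid, res in results.items() if not res.ok}
    merged = dict(results)
    merged.update(run_benchmark(agent, failed, workers))
    return merged
```

The design choice worth noting is that `run_one` converts exceptions into data, so a partial failure leaves a complete results map in which the failed subset is easy to select and retry.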
AI Investor Summary
Benchspan is building a platform to cut the time and cost of benchmarking AI agents, a critical need in the rapidly expanding AI development market. The team, former Meta and Google engineers with academic backgrounds from CMU and IIT Bombay, is well-equipped to tackle this technical challenge. The company is positioned to capture a significant share of a large and growing market.
Key Highlights
- Exceptional founder pedigree from top tech companies and universities.
- Addresses a critical and rapidly growing pain point in AI agent development.
- Strong market timing with significant tailwinds.
Risk Factors
- Early stage with limited demonstrated traction (revenue, users).
- Need to clearly articulate and build a sustainable technical moat and defensibility.
- Competition could emerge quickly from established cloud providers or other specialized AI tooling companies.
Founders
Avi Arora is a co-founder of Benchspan, a Y Combinator startup building a benchmarking platform for AI agents. Prior to Benchspan, he held engineering leadership roles at prominent tech companies, where he built and scaled complex systems. His expertise spans software engineering, data infrastructure, and product development.
Ritesh Malpani is a co-founder of Benchspan, a Y Combinator startup building a benchmarking platform for AI agents. Prior to Benchspan, he held engineering leadership roles at prominent tech companies, contributing to product development and team scaling. His expertise lies in building and optimizing engineering processes and infrastructure.
Score Breakdown
Founders have exceptional pedigree from Meta and Google, with strong academic backgrounds from CMU and IIT Bombay. Their experience in engineering leadership and infrastructure is highly relevant to building a robust benchmarking platform. This is a strong, proven technical team. [Boost +1: Founder from Google; Founder from Google]
The market for AI agent development and evaluation is exploding. As AI agents become more prevalent, the need for reliable, efficient benchmarking will be critical. The TAM is vast and growing rapidly, with clear tailwinds from AI adoption. The timing is excellent.
The core value proposition of reducing benchmarking time from hours to minutes is compelling. The technical differentiation lies in the platform's ability to onboard agents once and run multiple benchmarks. However, the long-term defensibility and specific technical moat need further clarity. UX quality is assumed to be good but not yet proven at scale.
As a Spring 2026 cohort company, traction is expected to be very early. The listed news items are primarily announcements and PR, not indicative of significant revenue or user growth. Investor interest is implied by YC acceptance, but concrete metrics are missing.
News
Benchspan is listed as an active AIOps startup funded by Y Combinator in Spring 2026, focusing on AI agent benchmarking.
Benchspan provides a prompt injection firewall for AI agents, detecting and blocking attacks like indirect prompt injection, data exfiltration, and tool abuse.
Benchspan is a real-time classifier that blocks prompt injection attacks aimed at AI agents, integrating with frameworks like LangChain and OpenAI Agents.
Benchspan offers real-time security for AI agents in production, with a custom security model built for agents that detects attacks missed by generic guardrails.
Benchspan provides real-time security for AI agents in production by blocking threats inline and learning from traffic through integration with observability platforms.
Benchspan is a cloud-native platform for benchmarking and evaluating AI agents, offering standardized testing, validation, and performance tracking through massively parallel Dockerized execution and intelligent state recovery.
Benchspan provides real-time threat detection for AI agents in production, focusing on an accurate indirect prompt injection classifier trained by former Microsoft Prompt Shields team members.
Benchspan is an agent benchmarking platform that dramatically reduces the time required for evaluations by running instances in parallel Docker containers, transforming a 14-hour SWE-bench run into minutes.
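As a back-of-the-envelope illustration of why parallel execution has this effect: with evenly divided work and no scheduling overhead, wall-clock time is roughly the sequential time divided by the number of workers. The sketch below assumes that idealized model. Only the 14-hour figure comes from the item above; the worker counts and the function name are hypothetical.

```python
def ideal_wall_clock_minutes(sequential_hours: float, workers: int) -> float:
    """Idealized estimate: assumes evenly divisible work and zero overhead."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return sequential_hours * 60 / workers


# Hypothetical worker counts, chosen only to show the trend.
for workers in (1, 50, 200):
    print(workers, round(ideal_wall_clock_minutes(14, workers), 1))
```

In practice the floor is set by the slowest single instance plus container startup time, so real runs will be slower than this estimate.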
Quick Info
- Batch: Spring 2026
- Team Size: 2
- Location: San Francisco, CA, USA
- Founders: 2
- Scraped: 4/10/2026