Benchspan

Spring 2026

Run agent benchmarks in minutes, not hours

🌐 www.benchspan.com 📍 San Francisco, CA, USA 👥 2 people

Industry: B2B / Engineering, Product and Design

Benchspan is a benchmarking platform for AI agents. If you're building an agent, you need to know whether it's getting better. But running benchmarks is slow, expensive, and fragile: you spend days writing glue code every time you want to run a new benchmark, runs take hours on your laptop, and when a run fails halfway through you can burn hundreds of dollars in tokens with nothing to show for it.

Benchspan fixes all of this. Onboard your agent once, and it works with every benchmark on the platform; we onboarded Claude Code in 37 lines of code. Running a benchmark becomes a single command, executed in parallel in the cloud. Every result goes to one place your whole team can see, with full trajectories, token usage, latency, and custom metrics. When a run partially fails, rerun just the subset that errored instead of starting from scratch. Compare runs side by side to see exactly where your agent is improving and where it's regressing.
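The "run in parallel, then rerun only the subset that errored" workflow can be sketched in a few lines of Python. This is a minimal illustration, not Benchspan's API or implementation: `run_benchmark`, `rerun_failed`, and `run_instance` are hypothetical names, and local threads stand in for the cloud containers the platform uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(instance_ids, run_instance, max_workers=8):
    """Run every instance in parallel; return (results, failed_ids).

    run_instance is a hypothetical callable that runs the agent on one
    benchmark instance and raises an exception when that run errors.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {i: pool.submit(run_instance, i) for i in instance_ids}
        for i, fut in futures.items():
            try:
                results[i] = fut.result()
            except Exception:
                failed.append(i)  # record the error, keep the other results
    return results, failed


def rerun_failed(results, failed, run_instance, max_workers=8):
    """Retry only the errored subset and merge it into earlier results."""
    retried, still_failed = run_benchmark(failed, run_instance, max_workers)
    return {**results, **retried}, still_failed


# Simulated agent run: instances 3 and 7 error on their first attempt only.
attempts = {}

def flaky(i):
    attempts[i] = attempts.get(i, 0) + 1
    if i in (3, 7) and attempts[i] == 1:
        raise RuntimeError("container crashed")
    return f"pass:{i}"


results, failed = run_benchmark(range(10), flaky)
print(failed)  # [3, 7]
results, failed = rerun_failed(results, failed, flaky)
print(len(results), failed)  # 10 []
```

The design point is that completed results are kept, so the retry touches only the two failed instances rather than all ten.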

AI Investor Summary

Benchspan is building a platform to sharply reduce the time and cost of benchmarking AI agents, a growing need as more teams build agents. Led by former Meta and Google engineers with degrees from CMU, UC Berkeley, and IIT Bombay, the team is well equipped for this technical challenge. The company is positioned to capture a significant share of a large and fast-growing market.

Key Highlights

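● Exceptional founder pedigree: both founders previously worked at Meta and Google and studied at CMU, with further degrees from UC Berkeley and IIT Bombay.
● Addresses a critical and growing pain point in AI agent development: benchmark runs that are slow, costly, and fragile.
● Strong market timing, with tailwinds from the rapid adoption of AI agents.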

Risk Factors

● Early stage with limited demonstrated traction (revenue, users).
● Needs to clearly articulate and build a sustainable technical moat.
● Competition could emerge quickly from established cloud providers or other specialized AI tooling companies.

Founders

Avi Arora Founder

Avi Arora is the co-founder of Benchspan, a Y Combinator startup building a benchmarking platform for AI agents. Before Benchspan, he held engineering leadership roles at Meta and Google, with a track record of building and scaling complex systems. His expertise spans software engineering, data infrastructure, and product development.

Previous: Meta (Facebook), Google

Education: Carnegie Mellon University, University of California, Berkeley

Ritesh Malpani Founder

Ritesh Malpani is the co-founder of Benchspan, a Y Combinator startup building a benchmarking platform for AI agents. Before Benchspan, he held engineering leadership roles at Meta and Google, contributing to product development and team scaling. His expertise lies in building and optimizing engineering processes and infrastructure.

Previous: Meta (Facebook), Google

Education: Carnegie Mellon University, Indian Institute of Technology Bombay (IIT Bombay)

Score Breakdown

Team 9/10

The founders have exceptional pedigree from Meta and Google, with strong academic backgrounds from CMU, UC Berkeley, and IIT Bombay. Their experience in engineering leadership and infrastructure is highly relevant to building a robust benchmarking platform. This is a strong, proven technical team. [Boost +1: both founders previously at Google]

Market 9/10

The market for AI agent development and evaluation is exploding. As AI agents become more prevalent, reliable and efficient benchmarking will be critical. The total addressable market (TAM) is large and growing rapidly, with clear tailwinds from AI adoption. The timing is excellent.

Product 7/10

The core value proposition, cutting benchmark runs from hours to minutes, is compelling. The technical differentiation lies in onboarding an agent once and then running it against every benchmark on the platform. However, the long-term defensibility and specific technical moat need further clarity. UX quality is assumed to be good but not yet proven at scale.

Traction 4/10

As a Spring 2026 cohort company, traction is expected to be very early. The listed news items are primarily announcements and PR, not evidence of significant revenue or user growth. Investor interest is implied by YC acceptance, but concrete metrics such as revenue or user counts are missing.

Last analyzed 5/6/2026

News

AIOps Startups funded by Y Combinator (YC) 2026

Benchspan is listed as an active AIOps startup funded by Y Combinator in Spring 2026, focusing on AI agent benchmarking.

Source: ycombinator.com · Sentiment: positive · Impact: 7/10

Benchspan | Real-Time Security for AI Agents in Production

Benchspan provides a prompt injection firewall for AI agents, detecting and blocking attacks like indirect prompt injection, data exfiltration, and tool abuse.

Source: benchspan.com · Sentiment: positive · Impact: 9/10

Benchspan - Benchspan

Benchspan is a real-time classifier that blocks prompt injection attacks aimed at AI agents, integrating with frameworks like LangChain and OpenAI Agents.

Source: docs.benchspan.com · Sentiment: positive · Impact: 7/10

Benchspan: Real-time threat detection for AI agents in production | Y Combinator

Benchspan offers real-time security for AI agents in production, with a custom security model built for agents that detects attacks missed by generic guardrails.

Source: Y Combinator · Sentiment: positive · Impact: 9/10

Benchspan | Real-Time Security for AI Agents in Production

Benchspan provides real-time security for AI agents in production by blocking threats inline and learning from traffic through integration with observability platforms.

Source: Benchspan.com · Sentiment: positive · Impact: 10/10

Benchspan - Run agent benchmarks in minutes, not hours | ProductCool

Benchspan is a cloud-native platform for benchmarking and evaluating AI agents, offering standardized testing, validation, and performance tracking through massively parallel Dockerized execution and intelligent state recovery.

Source: ProductCool · Sentiment: positive · Impact: 7/10

Benchspan: Real-time threat detection for AI agents in production | Y Combinator

Benchspan provides real-time threat detection for AI agents in production, focusing on an accurate indirect prompt injection classifier trained by former Microsoft Prompt Shields team members.

Source: Y Combinator · Sentiment: positive · Impact: 9/10

Benchspan: Agent Benchmarks in Minutes, Not Hours

Benchspan is an agent benchmarking platform that dramatically reduces the time required for evaluations by running instances in parallel Docker containers, transforming a 14-hour SWE-bench run into minutes.

Source: Clauday · Sentiment: positive · Impact: 8/10

Overall Score: 7.4 out of 10


Team (35% weight): 9/10
Market (25% weight): 9/10
Product (25% weight): 7/10
Traction (15% weight): 4/10
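The overall score appears to combine the four category scores using the listed weights. A minimal Python check, assuming a plain weighted mean (the page does not state its formula): with the listed Team score of 9 the mean is 7.75, while the displayed 7.4 is reproduced exactly if Team is taken as 8, i.e. before the +1 boost. That reading is an inference from the arithmetic, not something the page confirms.

```python
# Category weights as listed on the page, in whole percent (they sum to 100).
WEIGHTS = {"team": 35, "market": 25, "product": 25, "traction": 15}


def overall_score(scores: dict[str, int]) -> float:
    """Weighted mean of 0-10 category scores, assuming a plain weighted average."""
    total = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
    return total / 100


listed = {"team": 9, "market": 9, "product": 7, "traction": 4}
pre_boost = {**listed, "team": 8}  # assumption: Team score before the +1 boost

print(overall_score(listed))     # 7.75
print(overall_score(pre_boost))  # 7.4
```

Integer percentages keep the sums exact (775/100 and 740/100), so the comparison with the displayed 7.4 is not muddied by floating-point rounding.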

Quick Info

Batch: Spring 2026
Team Size: 2
Location: San Francisco, CA, USA
Founders: 2
Scraped: 4/10/2026
