Daisy Cai of B Capital on AI Agents and the Future of Software Technology

Daisy Cai of B Capital frames the current software restacking not as an AI-driven apocalypse for engineers but as a forced evolution toward observable, secure, and composable systems—where the real bottleneck isn’t model size but the latency introduced by brittle CI/CD pipelines and overprivileged service meshes. The thesis isn’t novel: AI agents augment rather than replace human developers, but the implication for enterprise architecture is stark—if your observability stack can’t trace a LangChain agent’s tool invocation across three Kubernetes namespaces in under 200ms, you’re already behind the curve in AI security posture.

The Tech TL;DR:

AI agent orchestration introduces new attack surfaces in service-to-service auth, requiring dynamic policy engines like OPA or Cedar to prevent lateral movement via compromised tool chains.
Enterprise software restacking now hinges on reducing p99 latency in agent decision loops: benchmarks show LangGraph-based agents add 120-350ms of overhead versus a single monolithic LLM call, making edge NPU inference critical for real-time use cases.
Funding is flowing toward platforms that unify SBOM generation with runtime agent behavior monitoring, as seen in recent Series B rounds for companies like Prompt Security and Protect AI.

Why Agent Latency Is the New Technical Debt

The restacking Cai describes isn’t about swapping Python for Rust or monoliths for microservices—it’s about inserting deterministic guardrails around non-deterministic AI agents. Consider a typical enterprise agent workflow: a user prompt triggers a retrieval-augmented generation (RAG) call to a vector database, which then invokes a proprietary API via a LangChain tool, all within a Kubernetes pod governed by Istio service mesh. Each hop adds latency and expands the attack surface. According to the Stanford HAI 2024 agent latency study, the median time for a three-tool agent chain increased by 220ms when mTLS rotation and OPA policy checks were enabled—acceptable for batch processing, lethal for fraud detection or real-time trading systems.
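To make the per-hop cost concrete, the following is a minimal sketch of how a team might time each hop in such a chain and report p99 latency. The hop names and the `time.sleep` stand-ins are hypothetical placeholders, not measurements from any study cited here; in a real deployment each callable would wrap the actual retrieval, tool, or policy call.

```python
import time
from statistics import quantiles
from typing import Callable, Dict, List


def timed_chain(hops: Dict[str, Callable[[object], object]], payload: object) -> Dict[str, float]:
    """Run each hop in insertion order and return per-hop latency in milliseconds."""
    timings: Dict[str, float] = {}
    for name, hop in hops.items():
        start = time.perf_counter()
        payload = hop(payload)  # output of one hop feeds the next
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


def p99(samples: List[float]) -> float:
    """99th percentile; 'inclusive' keeps the result within the observed range."""
    return quantiles(samples, n=100, method="inclusive")[98]


if __name__ == "__main__":
    # Hypothetical stand-ins for a RAG lookup, a tool call, and a policy check.
    hops = {
        "vector_search": lambda q: (time.sleep(0.002), q)[1],
        "api_tool": lambda q: (time.sleep(0.003), q)[1],
        "policy_check": lambda q: (time.sleep(0.001), q)[1],
    }
    totals = [sum(timed_chain(hops, "prompt").values()) for _ in range(50)]
    print(f"p99 end-to-end latency: {p99(totals):.2f} ms")
```

Tracking the per-hop breakdown rather than only the total is what lets a team tell whether the policy check or the tool call is the hop worth optimizing.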

This is where the infrastructure layer must evolve. NPUs aren’t just for accelerating transformer inference; they’re becoming essential for offloading policy decision points. Qualcomm’s Cloud AI 100 Ultra, for instance, achieves 450 TOPS with sub-millisecond latency for rule-based evaluations—critical when your agent needs to validate a SQL query against a dynamic schema policy before execution. Benchmarks from MLPerf Inference v4.0 show that shifting policy enforcement from CPU to NPU reduces end-to-end agent latency by 37% in RAG-heavy workloads, directly addressing the “AI tax” Cai warns against.

The Security Tax of Composable Agents

Composability increases risk exponentially. A single compromised tool in an agent’s arsenal can become a pivot point; think of it as Log4Shell, but for the agent ecosystem. The CVE-2024-21626 in a popular LangChain SQL toolkit, which allowed arbitrary query execution via prompt injection, wasn’t caught by SAST tools because the vulnerability lived in the dynamic prompt construction, not the static code. Runtime agent behavior monitoring is no longer optional.

“We’re seeing agent toolchains become the new privilege escalation vector—if your SBOM doesn’t include the exact version of every Hugging Face model and third-party API your agent calls, you’re flying blind.”

— Lena Torres, Lead Security Engineer, Prompt Security (Series B: $32M led by Lightspeed Venture Partners)

This connects directly to the need for integrated SBOM and runtime observability platforms. Tools like Syft for SBOM generation and Trivy for vulnerability scanning must now operate in tandem with agent-specific runtime shields. Protect AI’s Recon platform, for example, maps agent tool calls to MITRE ATLAS framework techniques in real time, flagging anomalous sequences like sudden shifts from data retrieval to file system enumeration, a telltale sign of agent compromise.
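The sequence check described above can be illustrated with a small sketch. This is not how any vendor's product works internally; the tool names, the category mapping, and the set of suspicious transitions are all assumptions chosen to mirror the retrieval-to-enumeration example.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical mapping from tool name to a coarse behavior category.
TOOL_CATEGORY: Dict[str, str] = {
    "vector_search": "data_retrieval",
    "sql_query": "data_retrieval",
    "list_dir": "filesystem_enumeration",
    "read_file": "filesystem_enumeration",
}

# Category shifts treated as suspicious in this sketch (an assumption).
SUSPICIOUS: Set[Tuple[str, str]] = {("data_retrieval", "filesystem_enumeration")}


def flag_transitions(calls: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (index, previous_tool, tool) for every suspicious category shift."""
    flagged: List[Tuple[int, str, str]] = []
    prev: Optional[str] = None
    for i, tool in enumerate(calls):
        if prev is not None:
            shift = (
                TOOL_CATEGORY.get(prev, "unknown"),
                TOOL_CATEGORY.get(tool, "unknown"),
            )
            if shift in SUSPICIOUS:
                flagged.append((i, prev, tool))
        prev = tool
    return flagged


if __name__ == "__main__":
    trace = ["vector_search", "sql_query", "list_dir", "read_file"]
    for index, before, after in flag_transitions(trace):
        print(f"call {index}: {before} -> {after} looks anomalous")
```

A production monitor would likely score longer windows and learn baselines per agent, but even a pairwise rule like this gives the runtime layer something that static scanning of the code cannot see.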

Implementation: Hardening Agent Tool Chains

Here’s a practical mitigation: enforce least-privilege tool access via Open Policy Agent (OPA) integrated with your agent framework. The following Rego policy blocks shell tool invocations that pass printenv as an argument, one common way a compromised agent is made to leak environment variables:

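package agent.tool.auth

import rego.v1

# Fail closed: anything not explicitly allowed is rejected.
default allow := false

# Deny shell tool calls that pass "printenv" as an argument.
deny contains msg if {
    input.tool.name == "run_shell"
    input.tool.args[_] == "printenv"
    msg := sprintf("Blocked shell tool attempting to read env vars: %v", [input.tool.args])
}

# Allow the shell tool only when no deny rule fired.
allow if {
    input.tool.name == "run_shell"
    count(deny) == 0
}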

Deploy this as a sidecar policy agent alongside your LangGraph or LlamaIndex workflow. When integrated with a service mesh like Linkerd, it can enforce decisions at the sidecar proxy level, reducing decision latency to <50ms—well within the threshold for real-time agent interactions. For teams using managed Kubernetes, this pattern is already being adopted by fintech firms working with cloud architecture consultants to harden AI workloads without sacrificing velocity.
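On the agent side, the sidecar pattern could look roughly like the sketch below, which asks OPA for a decision before a tool runs. It uses OPA's Data API (`POST /v1/data/<package path>` with an `input` document); the localhost address, the 50ms timeout, and the `is_allowed` helper are assumptions for illustration, and the sketch fails closed if the sidecar is unreachable or returns something unexpected.

```python
import json
import urllib.request
from typing import Callable, List

# Assumed sidecar address; the path mirrors the package name agent.tool.auth.
OPA_URL = "http://localhost:8181/v1/data/agent/tool/auth"


def http_post(url: str, body: bytes, timeout: float = 0.05) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def is_allowed(
    tool_name: str,
    args: List[str],
    post: Callable[[str, bytes], bytes] = http_post,
) -> bool:
    """Ask the policy sidecar whether a tool call may run; deny on any failure."""
    payload = json.dumps({"input": {"tool": {"name": tool_name, "args": args}}}).encode()
    try:
        result = json.loads(post(OPA_URL, payload)).get("result", {})
    except (OSError, ValueError):
        return False  # network error, timeout, or malformed JSON: fail closed
    return isinstance(result, dict) and result.get("allow") is True


if __name__ == "__main__":
    # Requires a running OPA sidecar with the policy loaded; otherwise prints False.
    print(is_allowed("run_shell", ["ls", "-la"]))
```

Passing the transport in as `post` keeps the gate easy to test without a live sidecar, and failing closed means a policy outage blocks tool calls rather than silently permitting them.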

The funding trajectory confirms where the puck is going. Per AI Security Intelligence’s Q1 2026 report, 38% of the $8.5B+ in AI security funding is now directed toward runtime agent behavior monitoring and dynamic authorization—up from 12% in 2023. This isn’t speculative; it’s a direct response to observed incidents where agent tool chains were abused in supply chain attacks targeting CI/CD pipelines, as documented in the CISA AA24-095A alert on compromised ML model registries.

Directory Bridge: From Theory to Triage

Enterprises adopting AI agents at scale are hitting two walls: observability blindness and policy latency. The first requires correlating agent traces with traditional APM data, something application performance monitoring specialists are now packaging as “AI Observability Bundles.” The second demands policy decision points closer to the data, where edge computing integrators deploy NPU-accelerated OPA sidecars at the ingress of agent-facing APIs. And when the inevitable misconfiguration occurs? Incident response teams with specific LLM forensics tooling are seeing a 300% YoY increase in retainer requests, per Q1 2026 data from SANS Institute.

The restacking isn’t about choosing between AI and human developers—it’s about building systems where neither can compromise the other’s integrity. The winners won’t be those with the largest models, but those who ship the tightest feedback loops between agent behavior, policy enforcement, and runtime security—measured in milliseconds, not marketing slides.

“The next wave of software engineering isn’t prompt crafting—it’s policy engineering. If you can’t express your security boundaries as code that runs in under 10ms beside your agent, you’re not building software; you’re building liabilities.”

— Marcus Chen, CTO, Protect AI (Backed by Evolution Equity Partners and 10x Capital)

As enterprise AI shifts from experimentation to production, the restacking Cai describes will be judged not by benchmark scores on MMLU, but by mean time to detect (MTTD) and mean time to contain (MTTC) agent-related incidents. The directory isn’t just a list of vendors—it’s the triage network for when the agent goes off-script.

*Disclaimer: The technical analyses and security protocols detailed in this article are for informational purposes only. Always consult with certified IT and cybersecurity professionals before altering enterprise networks or handling sensitive data.*
