Agent Edge | May 26, 2026
🎙️ SAP launched 200+ AI agents with Claude in the Autonomous Enterprise
The Agent Report
https://the-agent-report.com/2026/05/sap-autonomous-enterprise-200-agents/
At SAP Sapphire 2026, SAP announced the Autonomous Enterprise, a full re-architecture around AI agents. Over 50 domain-specific Joule Assistants orchestrate 200+ specialized agents across Finance, Spend Management, Supply Chain, HR, and Customer Experience. The three-layer stack includes the SAP Business AI Platform as foundation, the SAP Autonomous Suite as the agent layer, and the SAP AI Agent Hub for governance. The SAP Knowledge Graph maps 7 million data fields to provide agent context across the entire enterprise. The Autonomous Close Assistant compresses financial close from weeks to days by having agents check each other’s work and surface only exceptions. SAP runs ERP for over 400,000 enterprises globally, making this the largest production agent deployment in enterprise software.
Why it matters: SAP is validating agents as the core architecture for enterprise software, not as an experimental add-on. The Knowledge Graph pattern solves the context problem that kills most agent deployments before they reach production. Governance as a built-in layer sets the standard every enterprise platform will need to match. The three-layer stack offers a reference architecture that any enterprise agent builder can use. This deployment proves that agents scale horizontally across every business function when the foundation is right.
🤖 Agent angle: The SAP playbook serves as the enterprise pitch template for agent builders. Every client conversation should reference the three-layer stack as the minimum viable architecture for enterprise agents. Build the governance wrapper before clients ask for it. The Knowledge Graph pattern is the differentiator that separates production agents from prototypes that cannot survive in the real enterprise environment.
📩 Paul Graham warns AI-generated startup emails go straight to the trash
@paulg | X/Twitter
https://x.com/paulg/status/2058844147092488401
Paul Graham publicly stated that AI-generated emails from founders now go straight into his mental trash folder. He identified a hard-hitting journalistic style as the telltale sign of AI writing. No founder ever wrote this way before language models became widely available. Once a reader realizes a message is AI-written, it becomes difficult to take it seriously. The pattern he described is a reversal: the tool that gave founders an edge in volume has become a liability in perception.
Why it matters: Investor pattern-matching against AI writing is a leading indicator for the broader outbound market. Anyone sending AI-generated messages to sophisticated recipients is actively harming their own credibility. The market is shifting from pure AI generation to AI-assisted personalization with human delivery. This dynamic will extend beyond investors to every inbox where discernment lives. Founders who miss this signal will find their messages filtered before they are read.
🤖 Agent angle: The right approach is a two-tier workflow that uses agents for research and personalization but leaves the actual writing to humans. Agent output belongs in the research toolchain, not the inbox. Build a filter that flags AI-written copy before it reaches the send button. The agents that survive this shift are the ones that know where to stop generating and hand off to a human operator.
🤖 Hermes Agent beats OpenAI’s Codex CLI on multi-turn benchmarks despite running in Python
r/hermesagent | Reddit
Hermes Agent now outperforms OpenAI’s Codex CLI on most multi-turn agent tasks despite running in Python rather than Rust. New performance tweaks landed that changed the competitive landscape of open-source agent frameworks. The results challenge the assumption that lower-level compiled languages have an inherent performance advantage in agent workloads. Developers can now focus on architecture and prompt design rather than rewriting in Rust for speed. The Python ecosystem’s library depth becomes an advantage rather than a tradeoff.
Why it matters: This lowers the barrier for open-source agent frameworks by demonstrating that language choice is not the binding constraint on agent performance. It debunks the compiled language tax argument that has discouraged builders from using Python for agent infrastructure. The community can now evaluate frameworks on their architecture and benchmark results rather than their implementation language. Python remains the most accessible language for the widest pool of agent builders.
🤖 Agent angle: Evaluate agent frameworks on multi-turn benchmarks, not on the language they are written in. Test your own workloads on Hermes Agent, especially for tasks that require five or more turns. The performance gap between Python and Rust agents is narrower than most builders assume. Run your own benchmarks before choosing a framework based on language hype. The framework that wins on your specific workload is the right choice regardless of its implementation language.
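For readers who want to act on the "run your own benchmarks" advice, here is a minimal sketch of a multi-turn harness. It assumes each framework can be wrapped as a plain function that takes the conversation so far and returns the next reply. The names `Agent`, `Task`, `run_benchmark`, and `counting_agent` are hypothetical and are not part of Hermes Agent or Codex CLI.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: an agent receives the conversation so far
# and returns its next reply. Wrap each real framework behind this.
Agent = Callable[[List[str]], str]


@dataclass
class Task:
    name: str
    turns: List[str]                      # user messages, sent one per turn
    passed: Callable[[List[str]], bool]   # judges the agent's replies


@dataclass
class Result:
    pass_rate: float
    mean_seconds: float


def run_benchmark(agent: Agent, tasks: List[Task]) -> Result:
    """Run every task turn by turn; report pass rate and mean wall time per task."""
    if not tasks:
        raise ValueError("at least one task is required")
    passes = 0
    total_seconds = 0.0
    for task in tasks:
        history: List[str] = []
        replies: List[str] = []
        start = time.perf_counter()
        for message in task.turns:
            history.append(message)
            reply = agent(history)
            history.append(reply)
            replies.append(reply)
        total_seconds += time.perf_counter() - start
        if task.passed(replies):
            passes += 1
    return Result(passes / len(tasks), total_seconds / len(tasks))


def counting_agent(history: List[str]) -> str:
    """Stand-in agent that only reports which turn it is on."""
    return f"turn {(len(history) + 1) // 2}"


tasks = [
    Task("five turns", ["go"] * 5, lambda replies: replies[-1] == "turn 5"),
    Task("impossible", ["go"], lambda replies: replies[-1] == "done"),
]
result = run_benchmark(counting_agent, tasks)
print(f"pass rate: {result.pass_rate:.0%}")  # prints "pass rate: 50%"
```

Swapping `counting_agent` for thin wrappers around each candidate framework, and the toy tasks for your real workloads, gives a like-for-like comparison that ignores implementation language.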
Best models for Hermes agents May 2026: Gemini 3.1 Flash Lite leads, frontier models embarrass themselves
r/hermesagent | Reddit
The May 2026 model benchmarks for Hermes agent workloads tested 19 models across 25 real tasks including tool calling, multi-step reasoning, and failure recovery. Gemini 3.1 Flash Lite won with a 100% pass rate, 3.6-second latency, and $0.02 per run. Gemini 2.5 Flash placed a close second. Frontier models performed surprisingly poorly: Claude Opus 4.7 ranked 13th and GPT-5.5 ranked 10th. DeepSeek V4 Flash achieved a 92% pass rate at just $0.006 per run, making it the cost leader. On agent workloads, frontier reasoning capability goes mostly unused and is better reserved for the hardest tasks.
Why it matters: Cheaper and faster models are consistently beating frontier models on agent-specific tasks. This creates a structural advantage for builders who optimize for the right metrics rather than model name recognition. The price-performance ratio of models like Gemini 3.1 Flash Lite and DeepSeek V4 Flash changes the economics of agent deployment entirely. Spending on frontier models for routine agent tasks is now a measurable waste of capital. The data says smaller, faster models are the right default for agent pipelines.
🤖 Agent angle: Test Gemini 3.1 Flash Lite and DeepSeek V4 Flash on your agent pipelines this week. You can cut costs by 50 to 100 times with better or equivalent performance on most tasks. Run the benchmark harness on your own workloads rather than trusting aggregate rankings. Reserve frontier models for the subset of tasks that genuinely need their reasoning depth. The builders who match the right model to each task will spend an order of magnitude less than their competitors while delivering better results.
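One way to compare the two cost leaders is cost per successful run rather than cost per attempt. The sketch below uses only the per-run prices and pass rates quoted in the benchmark summary. It assumes, as a simplification not stated in the source, that a failed run is simply retried and that attempts are independent, so the expected number of attempts is 1 / pass rate. The function name `cost_per_success` is hypothetical.

```python
def cost_per_success(cost_per_run: float, pass_rate: float) -> float:
    """Expected spend to get one passing run, assuming independent retries."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_run / pass_rate


# (cost per run in USD, pass rate) as quoted in the benchmark summary.
models = {
    "Gemini 3.1 Flash Lite": (0.02, 1.00),
    "DeepSeek V4 Flash": (0.006, 0.92),
}

for name, (cost, rate) in models.items():
    print(f"{name}: ${cost_per_success(cost, rate):.4f} per successful run")
# Gemini 3.1 Flash Lite: $0.0200 per successful run
# DeepSeek V4 Flash: $0.0065 per successful run
```

Under that retry assumption, DeepSeek V4 Flash stays roughly three times cheaper per success even after paying for its failed runs. Whether retries are acceptable depends on the task, since a failure that triggers side effects costs more than the price of a rerun.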
🧩 FigMirror open-source AI agent generates publication-quality plots in any paper’s figure style
VILA-Lab/FigMirror | GitHub
https://github.com/VILA-Lab/FigMirror
FigMirror is an open-source AI agent that generates publication-quality plots matching any paper’s visual style from your data and a reference figure. It uses an iterative Drawer and Reviewer loop where the Drawer generates matplotlib code and the Reviewer evaluates the output against the reference until the style matches. The project ships as a local web UI, a Codex skill, and a Claude Code skill. It has 306 GitHub stars and is licensed under Apache 2.0. For teams that produce regular visual reports, this tool pays for itself in the first week.
Why it matters: Figures are among the highest-leverage bottlenecks in academic publishing and client-facing data work, and FigMirror compresses their creation from hours to minutes. Automating figure styling eliminates the repetitive formatting work that consumes disproportionate time. Every consultancy and research group that produces visual deliverables should evaluate this tool immediately.
🤖 Agent angle: Add FigMirror as a skill in your agent stack for client-facing data work. The workflow is straightforward: the client shares a reference figure, the agent fetches the data, FigMirror generates the figures, and the agent delivers polished outputs on the same call. This delivers a fivefold speedup on a production bottleneck that currently consumes hours of manual work per deliverable. The iterative Drawer and Reviewer loop mirrors the quality-control pattern your team already uses manually.
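The Drawer and Reviewer loop described above is a general generate-and-critique pattern, and a rough sketch may make it concrete. This is not FigMirror's code or API. `Review`, `refine`, `toy_drawer`, and `toy_reviewer` are hypothetical names, and the toy reviewer checks a single style property where a real one would compare the rendered figure against the reference.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Review:
    matches: bool   # does the output match the reference style?
    feedback: str   # what the drawer should change next round


# Hypothetical roles: the drawer turns feedback into plotting code,
# the reviewer judges that code (or its rendered output).
Drawer = Callable[[Optional[str]], str]
Reviewer = Callable[[str], Review]


def refine(drawer: Drawer, reviewer: Reviewer, max_rounds: int = 5) -> Tuple[str, int, bool]:
    """Alternate drawing and reviewing until the style matches or rounds run out."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    code = ""
    for round_number in range(1, max_rounds + 1):
        code = drawer(feedback)
        review = reviewer(code)
        if review.matches:
            return code, round_number, True
        feedback = review.feedback
    return code, max_rounds, False


def toy_drawer(feedback: Optional[str]) -> str:
    """Stand-in drawer: switches font family once it has received feedback."""
    family = "serif" if feedback else "sans-serif"
    return f"plt.rcParams['font.family'] = '{family}'"


def toy_reviewer(code: str) -> Review:
    """Stand-in reviewer: the reference figure is assumed to use a serif font."""
    ok = "'serif'" in code
    return Review(ok, "" if ok else "use a serif font like the reference")


code, rounds, matched = refine(toy_drawer, toy_reviewer)
print(rounds, matched)  # prints "2 True"
```

The `max_rounds` cap is a design choice worth keeping in any version of this loop, since a reviewer that is never satisfied would otherwise run up cost without bound.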