Agent Edge | June 02, 2026

📦 Science Superpowers brings composable computational-science skills to AI research agents

K-Dense-AI/science-superpowers | GitHub

🔗 https://github.com/K-Dense-AI/science-superpowers

Science Superpowers is a framework of composable computational-science methodology skills for AI research agents, reimagining the Superpowers pattern for the science domain. The repository ships skills organized around pre-registration, hypothesis framing, methodology design, and reproducible experiment workflows. It includes plugins for Claude Code, Codex CLI, Cursor, and OpenCode, along with hooks and scripts that enforce structured research practices. At 170 GitHub stars in its first week, the project is already gaining traction among researchers who want their agents to follow rigorous scientific methodology rather than guessing at experimental design. The skill files are modular, letting you compose research-benchmarking, hypothesis-testing, and data-analysis capabilities into a single agent.

📌 Why it matters: Most agent frameworks optimize for coding speed, not research rigor. Science Superpowers fills that gap for anyone using agents for scientific work, where reproducibility and methodology matter as much as output. In its pre-registration-over-TDD approach, pre-registration takes the place that test-driven development holds in coding-focused skill sets. Your agent defines its hypothesis and experiment design before touching data, which limits the room for p-hacking and confirmation bias. For researchers and builders in computational science, this is infrastructure that can turn a fast-coder agent into a trustworthy research collaborator.

🤖 Agent angle: Clone the repo and load the skill files into your agent’s working directory. The pre-registration skills are the standout: they force your agent to document its hypothesis, method, and success criteria before executing. Start with the hypothesis-framing skill to structure your next experiment, then add methodology-design for the experimental protocol. The repo’s hooks/ directory includes pre-commit validators that check for reproducibility requirements.
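
The source doesn't show the skill or hook file formats, so what follows is not the repo's code. It is a minimal TypeScript sketch of the pre-registration rule itself. A hypothetical PreRegistration record and checkPreRegistration validator block an analysis unless a hypothesis, a method, and success criteria were recorded before the run started. The field names and the timestamp check are illustrative assumptions.

```typescript
// Hypothetical pre-registration record. The field names are illustrative,
// not the format used by science-superpowers.
interface PreRegistration {
  hypothesis: string;
  method: string;
  successCriteria: string[];
  registeredAt: Date;
}

// Returns the problems found; an empty array means the analysis may proceed.
function checkPreRegistration(
  prereg: PreRegistration | undefined,
  analysisStartsAt: Date,
): string[] {
  if (prereg === undefined) return ["no pre-registration found"];
  const problems: string[] = [];
  if (prereg.hypothesis.trim() === "") problems.push("hypothesis is empty");
  if (prereg.method.trim() === "") problems.push("method is empty");
  if (prereg.successCriteria.length === 0) problems.push("no success criteria");
  // The core rule: the plan must be fixed before any data is analyzed.
  if (prereg.registeredAt.getTime() >= analysisStartsAt.getTime()) {
    problems.push("pre-registration was not recorded before the analysis started");
  }
  return problems;
}

// Example: a plan registered an hour before the analysis run.
const plan: PreRegistration = {
  hypothesis: "Caching reduces median query latency by at least 20%",
  method: "A/B test, 10,000 queries per arm, Mann-Whitney U test",
  successCriteria: ["p < 0.05", "median latency reduction >= 20%"],
  registeredAt: new Date("2026-06-02T09:00:00Z"),
};
console.log(checkPreRegistration(plan, new Date("2026-06-02T10:00:00Z"))); // → []
```

In a pre-commit hook, a non-empty result would block the commit. That wiring is an assumption about how such a validator could be used, not a description of the repo's hooks/ validators.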

⚡ How automatelab.tech made itself agent-ready with WebMCP

AutomateLab Editorial | AutomateLab Blog

🔗 https://automatelab.tech/blog/al-products/how-we-made-automatelab-tech-agent-ready-webmcp/

AutomateLab published a detailed case study of wiring their entire site to AI agents using WebMCP, the browser-native API that lets pages register typed tools for in-browser agents. They took the imperative path with navigator.modelContext.registerTool(), exposing eight free tools plus blog and dataset search as callable functions. The integration was roughly 40 lines in a shared /tools/webmcp.js helper, because every tool already had a compute function. The post is candid about adoption: WebMCP ships only in Chrome 146 behind a feature flag, real-world usage is near zero, and most production agents still read the raw DOM. Google’s Lighthouse 13.3 now includes an Agentic Browsing audit that checks for WebMCP tools, llms.txt, accessibility-tree integrity, and layout stability. The WebMCP check is informational only, but the other three checks pay off today regardless of WebMCP adoption.

📌 Why it matters: This is one of the first real WebMCP case studies from a production site, complete with honest tradeoffs. The key insight is that WebMCP is cheap to add if your site already has working actions, and Google is signaling it will matter for agent-driven discovery. The post also ties WebMCP to the broader agent-readiness stack: accessibility-tree integrity, llms.txt, and layout stability (measured as Cumulative Layout Shift, or CLS). All three are worth doing regardless of WebMCP. For builders running agent-driven services, this is a playbook for making your own tools agent-callable.

🤖 Agent angle: If you run a web service or SaaS, follow the same pattern: expose your existing search, audit, and data tools through navigator.modelContext. The imperative path is cleaner for tools that already have compute functions. Start with your most-used tool, add the registration call, and verify it shows up in Lighthouse’s Agentic Browsing report. Even if no agent calls it today, the wiring costs little to keep, since it only wraps functions you already maintain. Expect some churn while the API sits behind a flag.
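
Here is a minimal TypeScript sketch of what a compute-function wrapper like AutomateLab’s /tools/webmcp.js might involve. It is not their code. The tool shape follows the WebMCP explainer as we understand it: name, description, inputSchema, and an async execute that returns MCP-style text content. The API is experimental and may change, and registerSiteTools, SiteTool, and the word-count tool are hypothetical. The context object is passed in rather than read from navigator, so the helper can be tested with a stub.

```typescript
// Shapes modeled on the WebMCP explainer at the time of writing. The API is
// experimental (Chrome 146, behind a flag), so treat these names as assumptions.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute: (input: Record<string, unknown>) => Promise<ToolResult>;
}

// Anything with registerTool(): navigator.modelContext in a WebMCP-enabled
// browser, or a stub in tests.
interface ModelContextLike {
  registerTool(tool: ToolDefinition): void;
}

// A tool the site already has, expressed as a plain compute function (hypothetical shape).
interface SiteTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  compute: (input: Record<string, unknown>) => unknown;
}

// Registers every site tool, returning each result to the agent as JSON text.
// Returns how many tools were registered (0 when WebMCP is unavailable).
function registerSiteTools(ctx: ModelContextLike | undefined, tools: SiteTool[]): number {
  if (ctx === undefined) return 0;
  for (const tool of tools) {
    ctx.registerTool({
      name: tool.name,
      description: tool.description,
      inputSchema: tool.inputSchema,
      execute: async (input) => {
        const result = await tool.compute(input);
        return { content: [{ type: "text" as const, text: JSON.stringify(result) }] };
      },
    });
  }
  return tools.length;
}

// Hypothetical example tool: a word counter backed by an existing compute function.
const wordCount: SiteTool = {
  name: "word-count",
  description: "Count the words in a piece of text.",
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  compute: (input) => ({
    words: String(input["text"] ?? "").trim().split(/\s+/).filter(Boolean).length,
  }),
};
```

In a page, you would call registerSiteTools(navigator.modelContext, [wordCount]) only after checking "modelContext" in navigator, so browsers without the flag are unaffected. Returning every result as JSON text keeps the wrapper to a few lines per tool. That is roughly why a site with existing compute functions can get by with a small shared helper.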

🟣 NVIDIA open-sources Cosmos 3, an 8B and 32B physical AI omnimodel

@NVIDIAAI | X/Twitter

🔗 https://x.com/NVIDIAAI/status/2061308434629132553

NVIDIA released Cosmos 3, a fully open physical AI omnimodel that combines vision reasoning, world simulation, and action generation within a single architecture. The release includes Super (32B) and Nano (8B) variants, along with open weights, code, and training datasets on HuggingFace and GitHub. Cosmos 3 is not just a vision-language model or a video generator: it handles language, images, video, audio, and actions in a unified framework, and can simulate future worlds, predict actions, and generate robot policies. NVIDIA positions it as the first fully open model for Physical AI that can perceive, simulate, and act. NVIDIA’s early benchmarks show it leading open-weight models across reasoning, text-to-image, image-to-video, and robot policy tasks. The project page, code, and model weights are all publicly available under an open license.

📌 Why it matters: A single open model that handles vision reasoning, world simulation, and action generation means agent builders can deploy one model instead of stitching together a vision model, a simulator, and a policy network. The 8B Nano variant lowers the hardware bar for physical AI compared with the 32B Super. For agents that need to operate in or reason about physical space, Cosmos 3 cuts the integration tax of combining separate models. The open training datasets also let you fine-tune for specific physical domains.

🤖 Agent angle: Download the Nano variant and point your agent at it for any task involving physical reasoning, from robotics simulation to video understanding. The key capability is the unified action generation: your agent can take a video input, reason about the physical scene, and output a robot policy or simulation state in one call. Start with the HuggingFace collection to explore the available model variants and check the GitHub repo for inference examples.

🧠 Anthropic expands Project Glasswing to 200 organizations

@rohanpaul_ai | X/Twitter

🔗 https://x.com/rohanpaul_ai/status/2061810522962776345

Anthropic is expanding Project Glasswing, its preview of the exploit-generating Claude Mythos model, to 200 vetted organizations. The program gives security teams access to an AI agent that autonomously discovers and generates exploits for vulnerabilities in their systems. A general release is expected in two weeks, according to reports. Claude Mythos, which Anthropic unveiled earlier this year, uses reinforcement learning from automated red-teaming feedback to generate novel attack patterns. The expanded preview suggests Anthropic is confident in the model’s safety controls and sees a commercial market for offensive-security-as-a-service agents. The model operates within defined scope constraints and generates proof-of-concept exploits rather than weaponized payloads.

📌 Why it matters: This appears to be the first time an offensive-security AI agent is moving toward general availability through a tiered safety release. For security-focused agent builders, Glasswing validates that there is a paying market for specialized agent services that autonomously find vulnerabilities. The two-week countdown to GA means the window for building complementary services around Mythos is short. The safety infrastructure Anthropic built to prevent misuse of an exploit-generating model will likely become a reference pattern for other high-risk agent deployments.

🤖 Agent angle: If you build security agents, study the Glasswing release model: begin with a vetted preview, enforce scope constraints in the agent’s tool permissions, and ship proof-of-concept outputs rather than full exploits. The model’s exploit-discovery workflow (automated recon, vulnerability hypothesis generation, proof-of-concept creation) is a pattern you can replicate for narrower security domains like web application testing or cloud configuration auditing. Sign up for the GA waitlist now to understand the API surface before competitors do.
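
The source doesn’t describe how Anthropic enforces scope, so this is a generic TypeScript sketch of “scope constraints in the agent’s tool permissions,” not Glasswing’s mechanism. The hypothetical checkScope function rejects a tool call if the tool isn’t allowlisted or if the target host falls outside the agreed engagement scope. Scope, ToolCall, and the "*." wildcard convention are illustrative assumptions.

```typescript
// Hypothetical engagement scope agreed with the system owner.
interface Scope {
  allowedHosts: string[]; // exact hosts, or "*.example.com" for any subdomain
  allowedTools: Set<string>; // tools the agent may call at all
}

interface ToolCall {
  tool: string;
  targetUrl: string;
}

// Extracts the hostname, or null if the URL cannot be parsed.
function hostnameOf(url: string): string | null {
  try {
    return new URL(url).hostname;
  } catch {
    return null;
  }
}

// "*.example.com" matches "a.example.com" but not "example.com" itself.
function hostInScope(host: string, allowedHosts: string[]): boolean {
  return allowedHosts.some((pattern) =>
    pattern.startsWith("*.") ? host.endsWith(pattern.slice(1)) : host === pattern,
  );
}

// Returns null when the call is allowed, otherwise the reason it was blocked.
function checkScope(call: ToolCall, scope: Scope): string | null {
  if (!scope.allowedTools.has(call.tool)) return `tool ${call.tool} is not permitted`;
  const host = hostnameOf(call.targetUrl);
  if (host === null) return `invalid target URL: ${call.targetUrl}`;
  if (!hostInScope(host, scope.allowedHosts)) return `host ${host} is out of scope`;
  return null;
}

const scope: Scope = {
  allowedHosts: ["staging.example.com", "*.test.example.com"],
  allowedTools: new Set(["http_probe"]),
};
console.log(checkScope({ tool: "http_probe", targetUrl: "https://prod.example.com/" }, scope));
// prints: host prod.example.com is out of scope
```

Putting the check in the tool layer rather than the prompt means the boundary holds even if the model talks itself past its instructions. An out-of-scope host is simply unreachable.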

🎯 What actually breaks when you ship AI agents for real service businesses (a year in)

r/AI_Agents | Reddit

🔗 https://www.reddit.com/r/AI_Agents/comments/1tu9aqt/what_actually_breaks_when_you_ship_ai_agents_for/

A practitioner with a year of real production agent deployments in service businesses posted a candid retrospective to r/AI_Agents on June 1, 2026. The post covers what actually breaks when AI agents run 24/7 for real clients: stale context windows that drift from the original task, tool permission boundaries that get silently overstepped as agents self-modify, feedback loops where agents reinforce their own errors without a human noticing, and billing surprises when agents enter unexpected retry loops. The thread attracted dozens of comments from other builders sharing similar failure patterns, making it a crowd-sourced catalog of production agent failure modes. Unlike benchmark-focused discussions, this thread focuses on the operational reality of keeping agents running reliably for paying customers over months, not hours.

📌 Why it matters: The failure patterns in this thread are the ones that cost real money and lose real clients. Context drift, permission creep, and silent error reinforcement are not problems that benchmarks measure, but they are the problems that determine whether an agent deployment survives its first quarter. Learning these lessons from the thread is faster than discovering them in production. Builders serving business clients should plan for several of these failure modes to surface in the first months of a deployment.

🤖 Agent angle: Read the full thread and audit your own agent deployments against each failure mode. Add guardrails for context window expiry (force a task re-scoping after N steps), review tool permissions weekly instead of at deploy time, and set a maximum retry budget per task. The most actionable insight from the thread is to add a human-in-the-loop approval gate when an agent proposes to modify its own tool permissions, because that is where most silent failures start.
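
The three guardrails above can be sketched as a per-task guard. This is a minimal TypeScript sketch, not code from the thread. TaskGuard, the Action shape, and the thresholds are hypothetical. A real runtime would also have to act on each decision: re-prompt with the original brief, halt the task, or page a human.

```typescript
// Hypothetical action types an agent runtime might report to the guard.
type Action =
  | { kind: "tool_call"; tool: string; isRetry: boolean }
  | { kind: "modify_permissions"; tool: string };

type Decision = "allow" | "rescope" | "stop" | "needs_approval";

// Per-task guard: a step budget that forces periodic re-scoping, a retry
// budget that stops runaway loops, and an approval gate for permission changes.
class TaskGuard {
  private steps = 0;
  private retries = 0;
  private readonly rescopeEvery: number;
  private readonly maxRetries: number;

  constructor(rescopeEvery: number, maxRetries: number) {
    this.rescopeEvery = rescopeEvery;
    this.maxRetries = maxRetries;
  }

  decide(action: Action): Decision {
    // Self-modifying permissions always go to a human, whatever the budgets say.
    if (action.kind === "modify_permissions") return "needs_approval";
    if (action.isRetry) {
      this.retries += 1;
      // Over budget: stop the task instead of paying for another retry.
      if (this.retries > this.maxRetries) return "stop";
    }
    this.steps += 1;
    // Every rescopeEvery steps, make the agent restate the task against the original brief.
    if (this.steps % this.rescopeEvery === 0) return "rescope";
    return "allow";
  }
}
```

Permission changes are checked first and never count against the step budget. A permission request therefore always reaches a human, even mid-loop.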
