R&D Scan Digest

Date: 2026-05-11
Analyst: R&D Analyst (Riley)
Coverage window: 2026-03-30 → 2026-05-11 (since prior scan)

Context

The last scan (2026-03-29) promoted Computer Use, Claude Dispatch, and MCP Channels to Evaluate. Since then, Anthropic held the Code with Claude developer event (May 6) and shipped a major release wave on April 16, including Opus 4.7 GA. The factory itself already runs on Opus 4.7 (1M context), so this scan focuses on shifts in the surrounding ecosystem.

Discoveries

Evaluate

#	Discovery	Type	Why It Matters
1	Claude Managed Agents (public beta)	Platform	Cloud-hosted agents at scale: sandboxing, long-running sessions, scoped permissions, tracing, webhooks, multi-agent orchestration. $0.08/runtime-hour + token costs. Could become the delivery substrate for digital talents we ship to SMBs — instead of installing Claude Code on each client's machine, we host the talent. Directly affects our deployment model. Source
2	MCP Tool Search (lazy loading)	Feature	Auto-activates when MCP tool defs exceed 10% of context. Cuts MCP context cost by roughly 89–95% in the quoted cases (12K→600 tokens on 3 servers; 77K→8.7K on 50 tools). Accuracy on Opus 4 jumped 49→74% on MCP evals. Factory uses Atlassian, Telegram, Figma, Context7, Gmail MCPs, so this is an immediate context budget win. Source
3	Dreaming (Managed Agents research preview)	Feature	Scheduled background process that reviews past agent sessions, extracts recurring patterns, curates memory stores. Harvey saw 6× task-completion improvement. Direct parallel to our `/method-improve` and `/ci` loops — Anthropic now provides this as infrastructure. Worth comparing to our `auto-memory` system. Source
4	Task Budgets (Opus 4.7)	Feature	Model receives a token-target budget for the full agentic loop and prioritizes work as the countdown runs. Relevant for `/role-factory`, `/auto-research`, and any long-horizon production-line stage where we currently lose runs to context exhaustion. Source
5	Plugin .zip + --plugin-url loading	Feature	`--plugin-dir` accepts `.zip`, `--plugin-url` fetches a plugin archive for the session. This is the distribution mechanic for shipping digital talents — package the talent as a zip, host it, client runs `claude --plugin-url ...`. Pairs with watch item #7 (official plugin marketplace). Source
6	Auto-mode hard deny rules	Feature	New settings layer that blocks specific actions unconditionally in auto mode. Critical guardrail for the "autonomous execution" pattern (memory: `autonomous-execution`). Also opens safer client-side deployment. Source
7	Claude Cowork GA + enterprise features	Platform	RBAC, group spend limits, expanded usage analytics, OpenTelemetry, Zoom MCP, per-tool connector controls. Relevant when factory or a delivered talent serves multiple seats at a client. Maps directly to STM's enterprise requirements. Source
8	Multi-agent orchestration in Managed Agents (public beta)	Feature	Promoted from Watch (Agent Teams, 2026-04-28). Re-eval triggered. Same primitive, but now positioned as a hosted product rather than just an experimental flag. Even though the user prefers subagents internally, this is the supply chain for delivering coordinated digital talents to clients. Source
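
A quick arithmetic check on the MCP Tool Search figures in row 2, plus a rough look at when the 10% auto-activation rule would fire. This is a sketch, not a measurement: the function names are illustrative, the threshold is assumed to apply to the full context window, and the 200K window is a hypothetical comparison point (only the 1M figure comes from this digest).

```python
def reduction_pct(before_tokens: float, after_tokens: float) -> float:
    """Percentage of tokens saved when going from before_tokens to after_tokens."""
    return 100.0 * (before_tokens - after_tokens) / before_tokens


def exceeds_activation_threshold(
    tool_def_tokens: int,
    context_window_tokens: int,
    threshold: float = 0.10,  # "exceed 10% of context", as quoted in row 2
) -> bool:
    """True if tool definitions take more than `threshold` of the window.

    Assumption: the 10% is measured against the full context window.
    """
    return tool_def_tokens > threshold * context_window_tokens


# Figures quoted in row 2 of the Evaluate table.
three_servers = reduction_pct(12_000, 600)   # 95.0
fifty_tools = reduction_pct(77_000, 8_700)   # about 88.7

# 77K tokens of tool definitions against two window sizes.
fires_at_1m = exceeds_activation_threshold(77_000, 1_000_000)   # False
fires_at_200k = exceeds_activation_threshold(77_000, 200_000)   # True (hypothetical window)

print(f"3 servers: {three_servers:.1f}% saved")
print(f"50 tools:  {fifty_tools:.1f}% saved")
print(f"auto-activates at 1M: {fires_at_1m}, at 200K: {fires_at_200k}")
```

If the threshold really is relative to the window, a 1M-context factory would need more than 100K tokens of tool definitions before lazy loading turns on by itself, which is worth confirming during evaluation.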

Watch

#	Discovery	Type	Re-eval Date	Note
1	High-resolution vision (Opus 4.7)	Feature	2026-06-10	2576px / 3.75MP images supported (up from 1568px / 1.15MP). Useful for DXF/floor-plan extraction (CON-0004) and screenshot-driven UX work. Test next time a vision task fails on resolution. Source
2	Filesystem memory improvements (Opus 4.7)	Feature	2026-06-10	Model is "better at writing and using file-system-based memory." Our `auto-memory` system should benefit passively; revisit if we want to evolve memory structure.
3	Hooks see effort level (`$CLAUDE_EFFORT`)	Feature	2026-06-10	Hooks can branch on effort level. Minor, but enables effort-aware guardrails (e.g., skip expensive checks in `effort low`).
4	`worktree.baseRef` config	Feature	2026-06-10	Choose remote-default vs. local HEAD for new worktrees. Quality-of-life for our production-line worktree pattern.
5	Plugin marketplace explosion (4,200+ skills, 770+ MCPs)	Market	2026-06-10	Several third-party catalogs now exist (claudemarketplaces.com, SkillsMP, tonsofskills.com). Volume signal — worth scanning for high-quality skills before building our own.
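
Watch item 3 could translate into an effort-aware guardrail along these lines. This is a minimal sketch under stated assumptions: the only detail taken from this digest is the `$CLAUDE_EFFORT` variable name; the literal value `low`, the function name, and the default when the variable is unset are all hypothetical.

```python
import os
from typing import Mapping, Optional


def should_run_expensive_checks(env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a hook should run its slow checks.

    Assumption: CLAUDE_EFFORT carries a plain string such as "low".
    Unset or unrecognized values fall back to running the checks,
    so the guardrail fails safe.
    """
    env = os.environ if env is None else env
    effort = env.get("CLAUDE_EFFORT", "").strip().lower()
    return effort != "low"


if __name__ == "__main__":
    if should_run_expensive_checks():
        print("running full checks")
    else:
        print("effort is low: skipping expensive checks")
```

Passing the environment in as a parameter keeps the decision testable without touching the real process environment.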

Skip

#	Discovery	Type	Reason Skipped
1	`/buddy` (April 1 release)	Feature	April Fools cosmetic.
2	Higgsfield MCP, Meta Ads MCP, Google Ads MCP, Klaviyo MCP, Shopify AI Toolkit	MCP	Vertical marketing/commerce — outside factory scope. Re-evaluate only if a client needs them.
3	Memory leak fixes, CJK history fix, iTerm2 `/copy`, MCP startup auto-retry, session-ID header	Bugfix	Stability/QoL. No action.
4	VCS exclusions for Jujutsu/Sapling	Bugfix	We use git only.
5	Claude Opus 4.7 model alias rename ("opus" → "default")	API	Affects ACP downstream apps only; no factory impact.

Watch List Re-evaluation (auto-promotion)

Today is 2026-05-11. Items 1–8 and 10 on the existing watch list had re-eval dates of 2026-04-25 or 2026-04-28, so all of them are past due. Resolution:

Old Watch #	Topic	Decision	Rationale
1	Voice Mode	Demote → drop	Convenience-only, no new evidence. Stop tracking.
2	Google Colab MCP	Hold (re-add)	No new evidence. Re-watch with date 2026-08-11 (only relevant once we have an ML production line).
3	Worktree Sparse Checkout	Drop	Repo growth has not become a constraint.
4	MCP Enterprise Readiness	Promote → Evaluate #7	Now embodied in Cowork GA (RBAC, OpenTelemetry). Folded into Evaluate row #7.
5	Context Engineering	Drop	Factory already practices this implicitly; no specific artifact to evaluate.
6	Agent Teams (experimental)	Promote → Evaluate #8	Now part of Managed Agents multi-agent orchestration. Folded into Evaluate row #8.
7	Official Plugin Marketplace	Promote → Evaluate #5	Combined with plugin `.zip` / `--plugin-url` distribution mechanic.
8	Effort Frontmatter for Skills	Drop	Available and trivial to use; if needed, set per-skill on demand — no evaluation required.
10	Hyper Agents (Meta AI)	Hold (re-add)	No production tooling yet. Re-watch with date 2026-08-11.

Item 9 (Computer Use, re-eval 2026-05-28) remains on hold — date not yet passed.
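
The past-due rule used above can be sketched as a simple date comparison. Only two watch items are included, because these are the only ones whose individual dates this digest states (Agent Teams, 2026-04-28; Computer Use, 2026-05-28). The function name and the choice to treat "due today" as past due are assumptions.

```python
from datetime import date
from typing import Dict, List


def past_due(watch_items: Dict[int, date], today: date) -> List[int]:
    """Return watch-list item numbers whose re-eval date is on or before today."""
    return sorted(num for num, reeval in watch_items.items() if reeval <= today)


watch = {
    6: date(2026, 4, 28),  # Agent Teams (experimental)
    9: date(2026, 5, 28),  # Computer Use
}

due = past_due(watch, today=date(2026, 5, 11))
print(due)  # [6]
```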

Radar Cross-Reference

Radar Item	Current Ring	Suggested Change	Rationale
Claude Code (Opus 4.6)	Adopt	Update label to Opus 4.7	We are already running 4.7 (1M ctx) per system prompt. Radar entry is stale.
Paperclip	Assess	No change	No new evidence.

Potential new radar entries (pending evaluation outcomes):

Claude Managed Agents → Assess (Platforms) — likely delivery substrate for hosted digital talents.
MCP Tool Search → Trial (Tools) — low-risk, immediate context budget win; can be turned on now.
Dreaming → Assess (Techniques) — compare against our existing memory + /method-improve loop.
Plugin .zip / --plugin-url distribution → Assess (Techniques) — pairs with delivery pattern.
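
For the plugin distribution entry, packaging a talent directory as a `.zip` could look roughly like this. It is a sketch using only the Python standard library; the function name is illustrative, and whether the CLI expects plugin files at the archive root (as done here) or under a top-level folder is an assumption to verify during evaluation.

```python
import zipfile
from pathlib import Path
from typing import List


def package_plugin(plugin_dir: Path, archive_path: Path) -> List[str]:
    """Zip every file under plugin_dir, storing paths relative to plugin_dir.

    Assumption: the archive root corresponds to the plugin root.
    Keep archive_path outside plugin_dir so the archive does not include itself.
    """
    plugin_dir = Path(plugin_dir)
    names: List[str] = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(plugin_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(plugin_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

The returned list of archive names gives a cheap manifest to log or diff before hosting the archive for `--plugin-url`.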

Recommended Next Steps

In priority order — each is a /rd-evaluate candidate:

/rd-evaluate "Claude Managed Agents (public beta)" — Highest priority. Could re-shape the delivery model: instead of installing Claude Code per-client, we host digital talents. Cost ($0.08/runtime-hour + tokens), security model, multi-tenancy, and how it interacts with our current OneDrive-handover pattern all need scoring.
/rd-evaluate "MCP Tool Search lazy loading" — Quick win. Likely Trial-by-default decision; mostly about confirming it auto-activates correctly with our MCP stack (Atlassian, Telegram, Figma, Context7, Gmail) and measuring real factory context savings.
/rd-evaluate "Dreaming for factory memory + continuous improvement" — Strategic. We already have /ci, /method-improve, and an auto-memory system. Question: does Dreaming subsume any of these, or stack on top? Worth a head-to-head.
/rd-evaluate "Plugin .zip / --plugin-url distribution model" — Folds in watch item #7. Maps directly to the digital-talent packaging decision (/deploy-package) and could replace bespoke deployment scripts.
/rd-evaluate "Task budgets in Opus 4.7" — Tactical. Test on /role-factory and /toolkit:auto-research to see whether explicit budgets change long-horizon behavior.
/rd-evaluate "Cowork GA enterprise features for STM/multi-seat clients" — Lower urgency but unlocks the enterprise sales motion. Folds in old Watch #4 (MCP Enterprise Readiness).
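
As a starting point for the Managed Agents cost scoring, the runtime component can be estimated from the quoted $0.08/runtime-hour rate. This is a rough sketch: the always-on usage profile is a hypothetical example, and token costs are passed in as a separate figure because this digest gives no token pricing.

```python
RUNTIME_RATE_USD_PER_HOUR = 0.08  # rate quoted in this digest


def estimate_cost_usd(runtime_hours: float, token_cost_usd: float = 0.0) -> float:
    """Runtime charge plus an externally supplied token cost.

    Assumption: runtime is billed linearly per hour with no minimums or tiers.
    """
    return RUNTIME_RATE_USD_PER_HOUR * runtime_hours + token_cost_usd


# Hypothetical profile: one hosted talent running 24 hours a day for 30 days.
always_on_hours = 24 * 30
always_on_runtime_usd = estimate_cost_usd(always_on_hours)

print(f"{always_on_hours} runtime-hours: ${always_on_runtime_usd:.2f} before tokens")
```

Under these assumptions the runtime charge for an always-on talent is small, so the evaluation will likely hinge on token spend, security model, and multi-tenancy rather than the hourly rate.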

Items 6 and 8 in the Evaluate table can be bundled into evaluation #1 (Managed Agents) — they are sub-capabilities of the same platform shift.