R&D Scan Digest
Source: departments/executive/rd-analyst-riley/scans/2026-05-11-scan.md
Date: 2026-05-11
Analyst: R&D Analyst (Riley)
Coverage window: 2026-03-30 → 2026-05-11 (since prior scan)
Context
The last scan was on 2026-03-29, when Computer Use, Claude Dispatch, and MCP Channels were promoted to Evaluate. Since then, Anthropic ran the Code w/ Claude developer event (May 6) and shipped a major April 16 wave that included Opus 4.7 GA. The factory itself is already running on Opus 4.7 (1M context), so this scan focuses on the surrounding ecosystem shifts.
Discoveries
Evaluate
| # | Discovery | Type | Why It Matters |
|---|---|---|---|
| 1 | Claude Managed Agents (public beta) | Platform | Cloud-hosted agents at scale: sandboxing, long-running sessions, scoped permissions, tracing, webhooks, multi-agent orchestration. $0.08/runtime-hour + token costs. Could become the delivery substrate for digital talents we ship to SMBs — instead of installing Claude Code on each client's machine, we host the talent. Directly affects our deployment model. Source |
| 2 | MCP Tool Search (lazy loading) | Feature | Auto-activates when MCP tool definitions exceed 10% of context. Cuts roughly 89–95% of MCP context cost in the cited examples (12K→600 tokens on 3 servers; 77K→8.7K on 50 tools). Accuracy on Opus 4 rose from 49% to 74% on MCP evals. The factory uses the Atlassian, Telegram, Figma, Context7, and Gmail MCPs, so this is an immediate context budget win. Source |
| 3 | Dreaming (Managed Agents research preview) | Feature | Scheduled background process that reviews past agent sessions, extracts recurring patterns, curates memory stores. Harvey saw 6× task-completion improvement. Direct parallel to our /method-improve and /ci loops — Anthropic now provides this as infrastructure. Worth comparing to our auto-memory system. Source |
| 4 | Task Budgets (Opus 4.7) | Feature | Model receives a token-target budget for the full agentic loop and prioritizes work as the countdown runs. Relevant for /role-factory, /auto-research, and any long-horizon production-line stage where we currently lose runs to context exhaustion. Source |
| 5 | Plugin .zip + --plugin-url loading | Feature | --plugin-dir accepts .zip, --plugin-url fetches a plugin archive for the session. This is the distribution mechanic for shipping digital talents — package the talent as a zip, host it, client runs claude --plugin-url .... Pairs with watch item #7 (official plugin marketplace). Source |
| 6 | Auto-mode hard deny rules | Feature | New settings layer that blocks specific actions unconditionally in auto mode. Critical guardrail for the "autonomous execution" pattern (memory: autonomous-execution). Also opens safer client-side deployment. Source |
| 7 | Claude Cowork GA + enterprise features | Platform | RBAC, group spend limits, expanded usage analytics, OpenTelemetry, Zoom MCP, per-tool connector controls. Relevant when factory or a delivered talent serves multiple seats at a client. Maps directly to STM's enterprise requirements. Source |
| 8 | Multi-agent orchestration in Managed Agents (public beta) | Feature | Promoted from Watch (Agent Teams, 2026-04-28). Re-eval triggered. Same primitive but now positioned as a hosted product, not just an experimental flag. Even though user prefers subagents internally, this is the supply chain for delivering coordinated digital talents to clients. Source |
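The context-savings figures in row 2 can be checked with a few lines of arithmetic. The sketch below recomputes the reduction for both quoted cases and adds a threshold check. The helper names are illustrative, and it assumes the 10% trigger is measured against the full context window, which the source does not spell out.

```python
def reduction_pct(before_tokens: float, after_tokens: float) -> float:
    """Percentage of tokens saved when tool definitions shrink from before to after."""
    return (before_tokens - after_tokens) / before_tokens * 100


def exceeds_threshold(tool_def_tokens: int, context_window: int,
                      threshold: float = 0.10) -> bool:
    """True when tool definitions take more than `threshold` of the context window."""
    return tool_def_tokens > threshold * context_window


# Before/after token counts quoted in row 2.
three_servers = reduction_pct(12_000, 600)
fifty_tools = reduction_pct(77_000, 8_700)
print(f"3 servers: {three_servers:.1f}% saved")  # 95.0% saved
print(f"50 tools:  {fifty_tools:.1f}% saved")    # 88.7% saved

# Assumption: the 10% trigger is relative to the full 1M-token window.
print(exceeds_threshold(77_000, 1_000_000))  # False: 77K is below 100K
```

If that assumption holds, a 1M-context session would need more than 100K tokens of tool definitions before lazy loading activates on its own, which is worth confirming during evaluation.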
Watch
| # | Discovery | Type | Re-eval Date | Note |
|---|---|---|---|---|
| 1 | High-resolution vision (Opus 4.7) | Feature | 2026-06-10 | 2576px / 3.75MP images supported (up from 1568px / 1.15MP). Useful for DXF/floor-plan extraction (CON-0004) and screenshot-driven UX work. Test next time a vision task fails on resolution. Source |
| 2 | Filesystem memory improvements (Opus 4.7) | Feature | 2026-06-10 | Model is "better at writing and using file-system-based memory." Our auto-memory system should benefit passively; revisit if we want to evolve memory structure. |
| 3 | Hooks see effort level (`$CLAUDE_EFFORT`) | Feature | 2026-06-10 | Hooks can branch on effort level. Minor, but enables effort-aware guardrails (e.g., skip expensive checks when effort is low). |
| 4 | `worktree.baseRef` config | Feature | 2026-06-10 | Choose remote-default vs. local HEAD for new worktrees. Quality-of-life for our production-line worktree pattern. |
| 5 | Plugin marketplace explosion (4,200+ skills, 770+ MCPs) | Market | 2026-06-10 | Several third-party catalogs now exist (claudemarketplaces.com, SkillsMP, tonsofskills.com). Volume signal — worth scanning for high-quality skills before building our own. |
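The effort-aware guardrail idea in row 3 could look roughly like the sketch below. It assumes the hook process can read the effort level from the `CLAUDE_EFFORT` environment variable and that `low` is one of its values. The check names are hypothetical placeholders, not real factory checks.

```python
import os
from typing import List, Optional


def checks_for_effort(effort: Optional[str]) -> List[str]:
    """Pick which (hypothetical) checks a hook should run for a given effort level."""
    cheap = ["lint"]
    expensive = ["full-test-suite", "security-scan"]
    # Only an explicit "low" skips the expensive checks; an unset or
    # unrecognized value falls back to running everything.
    if effort is not None and effort.strip().lower() == "low":
        return cheap
    return cheap + expensive


if __name__ == "__main__":
    # Assumption: the hook environment exposes the effort level here.
    effort = os.environ.get("CLAUDE_EFFORT")
    for name in checks_for_effort(effort):
        print(f"would run: {name}")
```

Defaulting to the full set when the variable is missing keeps the guardrail safe if the hook runs in an environment that does not set it.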
Skip
| # | Discovery | Type | Reason Skipped |
|---|---|---|---|
| 1 | `/buddy` (April 1 release) | Feature | April Fools cosmetic. |
| 2 | Higgsfield MCP, Meta Ads MCP, Google Ads MCP, Klaviyo MCP, Shopify AI Toolkit | MCP | Vertical marketing/commerce — outside factory scope. Re-evaluate only if a client needs them. |
| 3 | Memory leak fixes, CJK history fix, iTerm2 /copy, MCP startup auto-retry, session-ID header | Bugfix | Stability/QoL. No action. |
| 4 | VCS exclusions for Jujutsu/Sapling | Bugfix | We use git only. |
| 5 | Claude Opus 4.7 model alias rename ("opus" → "default") | API | Affects ACP downstream apps only; no factory impact. |
Watch List Re-evaluation (auto-promotion)
Today is 2026-05-11. Items 1–8 and 10 on the existing watch list had re-eval dates of 2026-04-25 or 2026-04-28, so all of them are past due. Resolution:
| Old Watch # | Topic | Decision | Rationale |
|---|---|---|---|
| 1 | Voice Mode | Demote → drop | Convenience-only, no new evidence. Stop tracking. |
| 2 | Google Colab MCP | Hold (re-add) | No new evidence. Re-watch with date 2026-08-11 (only relevant once we have an ML production line). |
| 3 | Worktree Sparse Checkout | Drop | Repo growth has not become a constraint. |
| 4 | MCP Enterprise Readiness | Promote → Evaluate #7 | Now embodied in Cowork GA (RBAC, OpenTelemetry). Folded into Evaluate row #7. |
| 5 | Context Engineering | Drop | Factory already practices this implicitly; no specific artifact to evaluate. |
| 6 | Agent Teams (experimental) | Promote → Evaluate #8 | Now part of Managed Agents multi-agent orchestration. Folded into Evaluate row #8. |
| 7 | Official Plugin Marketplace | Promote → Evaluate #5 | Combined with plugin .zip / --plugin-url distribution mechanic. |
| 8 | Effort Frontmatter for Skills | Drop | Available and trivial to use; if needed, set per-skill on demand — no evaluation required. |
| 10 | Hyper Agents (Meta AI) | Hold (re-add) | No production tooling yet. Re-watch with date 2026-08-11. |
Item 9 (Computer Use, re-eval 2026-05-28) remains on hold — date not yet passed.
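The past-due rule applied above can be sketched as a small date comparison. The entries below use only re-eval dates stated in this digest for items that remain on the watch list. Treating an item as due on its re-eval date (rather than the day after) is an assumption of this sketch.

```python
from datetime import date
from typing import Dict, List

# Re-eval dates stated in this digest for items that stay on the watch list.
WATCH_LIST: Dict[str, date] = {
    "Computer Use": date(2026, 5, 28),
    "High-resolution vision (Opus 4.7)": date(2026, 6, 10),
    "Google Colab MCP": date(2026, 8, 11),
    "Hyper Agents (Meta AI)": date(2026, 8, 11),
}


def due_for_reeval(watch_list: Dict[str, date], today: date) -> List[str]:
    """Return the names of items whose re-eval date is today or earlier, sorted."""
    return sorted(name for name, reeval in watch_list.items() if reeval <= today)


print(due_for_reeval(WATCH_LIST, date(2026, 5, 11)))  # []
print(due_for_reeval(WATCH_LIST, date(2026, 6, 10)))
# ['Computer Use', 'High-resolution vision (Opus 4.7)']
```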
Radar Cross-Reference
| Radar Item | Current Ring | Suggested Change | Rationale |
|---|---|---|---|
| Claude Code (Opus 4.6) | Adopt | Update label to Opus 4.7 | We are already running 4.7 (1M ctx) per system prompt. Radar entry is stale. |
| Paperclip | Assess | No change | No new evidence. |
Potential new radar entries (pending evaluation outcomes):
- Claude Managed Agents → Assess (Platforms): likely delivery substrate for hosted digital talents.
- MCP Tool Search → Trial (Tools): low-risk, immediate context budget win; can be turned on now.
- Dreaming → Assess (Techniques): compare against our existing memory + `/method-improve` loop.
- Plugin `.zip` / `--plugin-url` distribution → Assess (Techniques): pairs with delivery pattern.
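The "package the talent as a zip" step of the plugin distribution idea might be sketched as follows, using only the Python standard library. The directory and file names are hypothetical, and the layout a plugin archive must follow is not covered by this digest, so the sketch simply stores paths relative to the source directory.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def package_directory(src_dir: Path, out_zip: Path) -> List[str]:
    """Zip every file under src_dir, storing paths relative to src_dir."""
    names: List[str] = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            # Skip directories and the archive itself if it lives inside src_dir.
            if not path.is_file() or path.resolve() == out_zip.resolve():
                continue
            arcname = path.relative_to(src_dir).as_posix()
            zf.write(path, arcname)
            names.append(arcname)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        talent = Path(tmp) / "example-talent"  # hypothetical talent directory
        (talent / "skills").mkdir(parents=True)
        (talent / "README.md").write_text("demo\n")
        (talent / "skills" / "demo.md").write_text("demo skill\n")
        print(package_directory(talent, Path(tmp) / "example-talent.zip"))
```

Hosting the resulting archive and the exact `--plugin-url` invocation are left to the evaluation itself.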
Recommended Next Steps
In priority order — each is a /rd-evaluate candidate:
1. `/rd-evaluate "Claude Managed Agents (public beta)"`: Highest priority. Could reshape the delivery model: instead of installing Claude Code per client, we host digital talents. Cost ($0.08/runtime-hour + tokens), security model, multi-tenancy, and how it interacts with our current OneDrive-handover pattern all need scoring.
2. `/rd-evaluate "MCP Tool Search lazy loading"`: Quick win. Likely a Trial-by-default decision; mostly about confirming it auto-activates correctly with our MCP stack (Atlassian, Telegram, Figma, Context7, Gmail) and measuring real factory context savings.
3. `/rd-evaluate "Dreaming for factory memory + continuous improvement"`: Strategic. We already have `/ci`, `/method-improve`, and an auto-memory system. Question: does Dreaming subsume any of these, or stack on top? Worth a head-to-head.
4. `/rd-evaluate "Plugin .zip / --plugin-url distribution model"`: Folds in watch item #7. Maps directly to the digital-talent packaging decision (`/deploy-package`) and could replace bespoke deployment scripts.
5. `/rd-evaluate "Task budgets in Opus 4.7"`: Tactical. Test on `/role-factory` and `/toolkit:auto-research` to see whether explicit budgets change long-horizon behavior.
6. `/rd-evaluate "Cowork GA enterprise features for STM/multi-seat clients"`: Lower urgency but unlocks the enterprise sales motion. Folds in old Watch #4 (MCP Enterprise Readiness).
Items 6 and 8 in the Evaluate table can be bundled into evaluation #1 (Managed Agents) — they are sub-capabilities of the same platform shift.
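For the cost dimension of the Managed Agents evaluation, a back-of-the-envelope estimate based on the quoted $0.08/runtime-hour rate might look like this. The workload (one talent running around the clock for 30 days) is a hypothetical example, token costs are passed in as a separate figure, and how runtime hours are actually metered is an open question for the evaluation.

```python
RUNTIME_RATE_USD_PER_HOUR = 0.08  # rate quoted in Evaluate row 1


def estimate_cost_usd(runtime_hours: float, token_cost_usd: float = 0.0,
                      rate: float = RUNTIME_RATE_USD_PER_HOUR) -> float:
    """Runtime charge plus a separately estimated token charge, in USD."""
    if runtime_hours < 0 or token_cost_usd < 0:
        raise ValueError("hours and token cost must be non-negative")
    return runtime_hours * rate + token_cost_usd


# Hypothetical workload: one hosted talent, 24 hours a day for 30 days.
always_on = estimate_cost_usd(runtime_hours=24 * 30)
print(f"runtime only: ${always_on:.2f}")  # runtime only: $57.60
```

Even in this always-on case the runtime charge is small, so token spend is likely to be the figure that needs the closest scoring.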