Claude Managed Agents: Dreaming, Outcomes, Orchestration
May 10, 2026
TL;DR
On May 6, 2026, at the Code with Claude developer conference in San Francisco, Anthropic shipped three new capabilities for Claude Managed Agents: dreaming (research preview), a scheduled background process that reviews past sessions and curates memory between runs; outcomes (public beta), a rubric-driven grader loop that improved task success by up to 10 percentage points over standard prompting in internal benchmarks; and multiagent orchestration (public beta), which lets a lead agent delegate work to up to 20 unique specialist subagents (with up to 25 concurrent threads) running in parallel on a shared filesystem. Webhooks for agent completion shipped at the same time. Early customer results: legal-AI company Harvey saw roughly 6× higher task completion rates with dreaming enabled, and Wisedocs is reviewing medical documents 50% faster using outcomes.[1][2][3][4]
What You'll Learn
- What "dreaming" actually does between agent sessions and why it differs from agent memory
- How outcomes uses a separate grader to lift file-generation task success by 8.4% on .docx and 10.1% on .pptx
- How multiagent orchestration coordinates a lead agent and parallel specialists
- The real Harvey and Wisedocs results behind the announcement
- Where each feature stands today: research preview or public beta, and which beta header gates access
- The vendor lock-in trade-off enterprises have to weigh
A Quick Recap of Where We Were
Anthropic launched Claude Managed Agents in public beta on April 8, 2026, packaging the agent runtime — sandboxing, state, tool execution, and error recovery — into a hosted service so developers could ship production agents without rebuilding the infrastructure layer themselves.[5] On April 23, 2026, Anthropic added a memory feature in public beta, giving agents the ability to retain and apply learning across sessions through persistent memory stores.[6]
The May 6 update is the next step in that arc. Memory let an individual agent remember what it learned. Dreaming, outcomes, and multiagent orchestration target three distinct problems that show up once agents are actually deployed at scale: agents that don't get smarter over time, agents that produce work nobody can grade, and agents that buckle under tasks too big for a single context window.[1]
Dreaming: Memory That Curates Itself
Dreaming is a scheduled background process that reviews an agent's past sessions and existing memory stores, extracts patterns across them, and curates those memories so agents improve over time.[1][2]
The framing is deliberate. Memory captures what one agent learned during one task. Dreaming surfaces patterns that no single session could see on its own: recurring mistakes, workflows that multiple agents converge on independently, and preferences shared across a team of agents.[1]
A concrete example from the launch: at the Code with Claude demo, a dreaming agent reviewed every past simulation session overnight and wrote a detailed playbook — a set of heuristics distilled from the patterns across all those runs. When the team ran the next morning's simulation with that playbook in memory, results improved on scenarios that had previously been failing.[2]
How Control Works
Developers choose how much autonomy dreaming gets. It can update memory automatically, or you can review proposed changes before they land. That second mode matters for regulated environments where any change to what an agent "knows" needs a human signature.[1][2]
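As an illustration of those two modes, a configuration might look like the sketch below. Dreaming is a gated research preview and its configuration surface is not public, so every field name here is an assumption, not documented API:

```python
# Hypothetical sketch only: dreaming's config surface is not public.
# All field names below are assumptions for illustration.
agent_config = {
    "agent_id": "contract-review-agent",  # illustrative agent name
    "dreaming": {
        "schedule": "nightly",  # assumed: curation runs between sessions
        # "auto":   curated memories are applied automatically
        # "review": proposed memory changes wait for human sign-off
        "mode": "review",
    },
}
```

The review mode is the one regulated teams would reach for: proposed memory edits become an artifact someone can actually sign off on.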
Harvey's Result: ~6× More Completed Tasks
The headline customer number from the announcement: legal-AI company Harvey saw task completion rates rise approximately 6× in internal tests after enabling dreaming. The model wasn't swapped — only the memory system changed. The improvement came from agents accumulating institutional knowledge between sessions, including filetype workarounds and tool-specific patterns that had previously been re-learned every time.[2][7]
Availability
Dreaming is in research preview and is not on by default. Access is gated through an application form on Anthropic's site.[1][7]
Outcomes: A Grader That Holds the Agent to a Rubric
The second feature, outcomes, addresses a different problem. Most agent loops optimize for "the model thinks it's done." Outcomes optimizes for "the work passes a defined bar."[1]
You write a rubric describing what success looks like for the task. The agent does its work. A separate Claude instance — the grader — evaluates the output against your rubric in its own context window, so it isn't influenced by the chain of reasoning the agent used to produce the output. If the work fails, the grader pinpoints what needs to change and the agent takes another pass. The loop continues until the rubric is satisfied.[1][3]
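As a minimal sketch of that loop over HTTP: the request attaches a rubric, and the platform's grader keeps the agent iterating until the rubric passes. Only the beta header value below comes from Anthropic's documentation; the endpoint path, field names, and payload shape are assumptions for illustration:

```python
import requests

API_KEY = "sk-ant-..."  # placeholder key

# Success criteria the grader will hold the agent's output to.
rubric = """\
The memo must (1) cite at least three sources, (2) stay under 800 words,
and (3) end with a clearly labeled recommendation.
"""

# Assumed endpoint and request shape -- only the beta header value is
# documented. The grader evaluates the output against the rubric in a
# separate context, and the agent retries until the rubric is satisfied.
resp = requests.post(
    "https://api.anthropic.com/v1/managed-agents/sessions",  # assumed path
    headers={
        "x-api-key": API_KEY,
        "anthropic-beta": "managed-agents-2026-04-01",  # beta header from the docs
    },
    json={
        "task": "Draft a vendor-selection memo from the notes provided.",
        "outcome": {"rubric": rubric},  # assumed field name
    },
    timeout=60,
)
print(resp.json())
```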
Why a Separate Context Window Matters
This is the load-bearing design choice. An LLM grading its own work in the same context tends to confirm its own reasoning. By giving the grader a clean context that sees only the rubric and the output, outcomes recreates the dynamic of a reviewer who hasn't sat through the meeting where you talked yourself into a bad idea.
The Numbers
In Anthropic's internal benchmarks:[1]
| Task type | Result with outcomes |
|---|---|
| Overall task success vs. standard prompting loop | Up to +10 percentage points, largest gains on the hardest problems |
| .docx file generation | +8.4% task success |
| .pptx file generation | +10.1% task success |
Wisedocs's Result: Reviews 50% Faster
Wisedocs, which automates medical record review for insurance and legal claims, built a document-quality-check agent on Managed Agents and uses outcomes to grade each review against their internal guidelines. They report reviews now run 50% faster while staying aligned with their team's standards.[1]
Availability
Outcomes is in public beta and available to all developers via the Claude Platform API behind the managed-agents-2026-04-01 beta header. No separate access request is required.[1][4]
Multiagent Orchestration: A Lead Agent and Up to 20 Specialists
The third feature, multiagent orchestration, is for tasks that simply don't fit in one agent's head — long investigations, multi-source research, code work that spans several subsystems.[1]
A lead agent decomposes the job and delegates each piece to a specialist subagent with its own model, prompt, and tools. The specialists work in parallel on a shared filesystem and contribute back into the lead agent's overall context. The launch announcement gives the example of a lead agent running an incident investigation while subagents fan out through deploy history, error logs, metrics, and support tickets.[1]
A coordinator can be configured with up to 20 unique subagent definitions and runs up to 25 concurrent threads in the current beta. The configuration declares the coordinator model and lists the subagent IDs the coordinator is allowed to delegate to; the coordinator decides at runtime which specialist gets which task and can spawn multiple copies of the same subagent.[7][8]
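A hypothetical configuration following that shape, using the incident-investigation example from the announcement: a coordinator model plus an allow-list of subagent IDs. The 20-definition and 25-thread ceilings come from the docs; the field names and model ID string are assumptions:

```python
# Hypothetical orchestration config; field names and the model ID string
# are assumptions. The limits (20 unique subagent definitions, 25
# concurrent threads) are the documented beta ceilings.
orchestration_config = {
    "coordinator": {"model": "claude-opus-4-7"},  # assumed model ID format
    "subagents": [  # up to 20 unique definitions the coordinator may delegate to
        "deploy-history-reader",
        "error-log-analyst",
        "metrics-querier",
        "support-ticket-triager",
    ],
    "max_concurrent_threads": 25,  # coordinator may spawn copies of a subagent
}
```

Note that the config only declares what the coordinator is allowed to use; per the docs, the coordinator itself decides at runtime which specialist gets which task.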
Observability
Every step is traceable in the Claude Console — which agent did what, in what order, and why. That matters because a parallel multiagent run that stalls inside a single specialist is a debugging nightmare without timeline-level traces.[8]
Availability
Multiagent orchestration is in public beta alongside outcomes, behind the same managed-agents-2026-04-01 beta header. No waitlist.[1][4]
Webhooks: Quietly the Most Important Plumbing
The fourth thing Anthropic shipped on May 6 — overshadowed by the dreaming headlines but arguably the most operationally useful — is webhooks for agent completion. You define an outcome, kick off the agent, and get an HTTP callback when it finishes.[1][9]
This is the difference between an agent platform you have to babysit and one you can wire into a real production system. Long-horizon agent runs that previously required either polling or a held-open SSE stream now fit cleanly into event-driven architectures. Webhooks are configured in the Claude Console.[9]
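A minimal receiver sketch in Python with Flask. The announcement says only that you register a callback in the Claude Console and receive an HTTP POST on completion, so the payload fields below are assumptions:

```python
from flask import Flask, request

app = Flask(__name__)

# Minimal webhook receiver sketch. The payload shape (session_id, status)
# is an assumption -- only the existence of a completion callback is
# documented.
@app.route("/claude/agent-complete", methods=["POST"])
def agent_complete():
    event = request.get_json(force=True)
    session_id = event.get("session_id")  # assumed field
    status = event.get("status")          # assumed field, e.g. "completed"
    print(f"agent session {session_id} finished with status {status}")
    # Hand off to the next pipeline stage here (queue, DB write, alert).
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```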
Availability
Webhooks are in public beta alongside outcomes and multiagent orchestration.[1][9]
How the Pieces Fit Together
| Capability | Status (May 2026) | Beta header / access | What it solves |
|---|---|---|---|
| Memory (April 23) | Public beta | managed-agents-2026-04-01 | Agents retain context across sessions |
| Dreaming | Research preview | Application form required | Memory curates itself between sessions |
| Outcomes | Public beta | managed-agents-2026-04-01 | Rubric-graded loop until work meets a bar |
| Multiagent orchestration | Public beta | managed-agents-2026-04-01 | Lead agent + up to 20 parallel specialists |
| Webhooks | Public beta | managed-agents-2026-04-01 | Event-driven completion notifications |
Pricing is unchanged from the April launch: standard Claude API token rates plus $0.08 per active session-hour, with idle sessions not accruing runtime charges.[4][10]
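For a back-of-envelope feel, the sketch below applies that model to a hypothetical run. The $0.08 active session-hour rate is from the pricing docs; the token volumes and per-token rates are placeholders, not real prices:

```python
# Cost sketch: token charges at standard API rates plus $0.08 per ACTIVE
# session-hour; idle time accrues no runtime charge.
active_hours = 3.0
runtime_cost = active_hours * 0.08  # $0.24, from the documented session rate

input_tokens, output_tokens = 400_000, 120_000  # hypothetical workload
in_rate, out_rate = 3.00, 15.00  # assumed $/M-token placeholders
token_cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"runtime ${runtime_cost:.2f} + tokens ${token_cost:.2f} "
      f"= ${runtime_cost + token_cost:.2f}")  # $0.24 + $3.00 = $3.24
```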
The Vendor Lock-In Trade-off
Coverage of the May 6 update has converged on one critique. VentureBeat's analysis is the sharpest version: Anthropic wants to own your agent's memory, evals, and orchestration — and the platform's hosted runtime puts that memory and orchestration on infrastructure the enterprise does not own, which can become a compliance issue for organizations that need to prove data residency. The May 6 webhooks shipment further extends that surface area into the event-handling layer.[11]
That's a real trade-off. For teams already in the Anthropic ecosystem, dreaming + outcomes + multiagent + webhooks is a coherent stack that meaningfully removes operational toil. For teams that need to keep memory or orchestration on their own infrastructure — or want the flexibility to swap models — it's an argument for an open framework like LangGraph or CrewAI, even at the cost of building more of the runtime themselves.
The May 6 announcement makes the bet sharper, not softer. Anthropic is consolidating capabilities that used to be cobbled together from separate vendors. Whether that consolidation is a feature or a hazard depends on which side of the buy-vs-build line your team sits on.
What This Means for Builders
If you are already building on Claude Managed Agents or Claude Code, three of the four updates are flag-flips behind the existing beta header.
The honest order of operations for most teams:
- Turn on outcomes first. A rubric loop is the cheapest way to lift quality on agents already in production, and the +8.4% / +10.1% file-generation numbers translate directly to fewer human review cycles.
- Add webhooks if you are still polling for completion. Long-running sessions and event-driven pipelines stop fighting each other.
- Multiagent orchestration is the right tool when one agent's context window is the actual bottleneck — not a default architecture for every task.
- Dreaming is worth applying for in research preview if you have agents that run frequently against the same domain and currently re-learn the same lessons. If the workload is bursty or one-off, the curation pass has nothing to chew on.
For teams comparing AI agent platforms, this update changes the surface area. Managed Agents is no longer just "hosted runtime"; it's now also the eval layer, the orchestration layer, and the memory-consolidation layer. That makes framework comparisons a different conversation than they were a month ago.
Bottom Line
The May 6 update tells you what Anthropic thinks the bottleneck is. It isn't the model — it's everything around the model. Dreaming attacks the "agents don't get smarter over time" problem. Outcomes attacks the "the work isn't actually good" problem. Multiagent orchestration attacks the "this task is bigger than one context window" problem. Webhooks make the whole thing fit into event-driven production systems. Together they turn Claude Managed Agents from a hosted runtime into something closer to an operating system for agentic work — at the cost of a deeper bet on a single vendor. That trade-off is now the central question for any team standing up agents in production.
Footnotes
1. Anthropic, "New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration," claude.com/blog/new-in-claude-managed-agents, May 6, 2026.
2. VentureBeat, "Anthropic introduces 'dreaming,' a system that lets AI agents learn from their own mistakes," venturebeat.com, May 6, 2026.
3. SD Times, "New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration," sdtimes.com, May 7, 2026.
4. Anthropic, "Claude Managed Agents overview," platform.claude.com/docs/en/managed-agents/overview, May 2026.
5. Anthropic, "Claude Managed Agents: get to production 10x faster," claude.com/blog/claude-managed-agents, April 8, 2026.
6. EdTech Innovation Hub, "Anthropic adds persistent memory to Claude Managed Agents in public beta," edtechinnovationhub.com, April 23, 2026.
7. Build Fast With AI, "Claude Managed Agents Dreaming Explained (2026)," buildfastwithai.com, May 2026.
8. Anthropic, "Multiagent sessions," platform.claude.com/docs/en/managed-agents/multi-agent, May 2026.
9. Hookdeck, "Anthropic shipped webhooks for Claude Managed Agents. Here's what they unlock," hookdeck.com/blog/anthropic-managed-agent-webhooks, May 2026.
10. Anthropic, "Claude pricing," platform.claude.com/docs/en/about-claude/pricing, accessed May 2026.
11. VentureBeat, "Anthropic wants to own your agent's memory, evals, and orchestration — and that should make enterprises nervous," venturebeat.com, May 2026.
12. Anthropic, "Introducing Claude Opus 4.7," anthropic.com/news/claude-opus-4-7, April 16, 2026; GitHub Changelog, "Claude Opus 4.7 is generally available," April 16, 2026.