Claude Managed Agents: Dreaming, Outcomes, Orchestration
May 10, 2026
TL;DR
On May 6, 2026, at the Code with Claude developer conference in San Francisco, Anthropic shipped three new capabilities for Claude Managed Agents: dreaming (research preview), a scheduled background process that reviews past sessions and curates memory between runs; outcomes (public beta), a rubric-driven grader loop that improved task success by up to 10 percentage points over standard prompting in internal benchmarks; and multiagent orchestration (public beta), which lets a lead agent delegate work to up to 20 unique specialist subagents (with up to 25 concurrent threads) running in parallel on a shared filesystem. Webhooks for agent completion shipped at the same time. Early customer results: legal-AI company Harvey saw roughly 6× higher task completion rates with dreaming enabled, and Wisedocs is reviewing medical documents 50% faster using outcomes.[1][2][3][4]
What You'll Learn
- What "dreaming" actually does between agent sessions and why it differs from agent memory
- How outcomes uses a separate grader to lift file-generation task success by 8.4% on .docx and 10.1% on .pptx
- How multiagent orchestration coordinates a lead agent and parallel specialists
- The real Harvey and Wisedocs results behind the announcement
- Where each feature stands today: research preview or public beta, and which beta header gates access
- The vendor lock-in trade-off enterprises have to weigh
A Quick Recap of Where We Were
Anthropic launched Claude Managed Agents in public beta on April 8, 2026, packaging the agent runtime — sandboxing, state, tool execution, and error recovery — into a hosted service so developers could ship production agents without rebuilding the infrastructure layer themselves.[5] On April 23, 2026, Anthropic added a memory feature in public beta, giving agents the ability to retain and apply learning across sessions through persistent memory stores.[6]
The May 6 update is the next step in that arc. Memory let an individual agent remember what it learned. Dreaming, outcomes, and multiagent orchestration target three distinct problems that show up once agents are actually deployed at scale: agents that don't get smarter over time, agents that produce work nobody can grade, and agents that buckle under tasks too big for a single context window.[1]
Dreaming: Memory That Curates Itself
Dreaming is a scheduled background process that reviews an agent's past sessions and existing memory stores, extracts patterns across them, and curates those memories so agents improve over time.[1][2]
The framing is deliberate. Memory captures what one agent learned during one task. Dreaming surfaces patterns that no single session could see on its own: recurring mistakes, workflows that multiple agents converge on independently, and preferences shared across a team of agents.[1]
A concrete example from the launch: at the Code with Claude demo, a dreaming agent reviewed every past simulation session overnight and wrote a detailed playbook — a set of heuristics distilled from the patterns across all those runs. When the team ran the next morning's simulation with that playbook in memory, results improved on scenarios that had previously been failing.[2]
How Control Works
Developers choose how much autonomy dreaming gets. It can update memory automatically, or you can review proposed changes before they land. That second mode matters for regulated environments where any change to what an agent "knows" needs a human signature.[1][2]
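As an illustration of those two modes, a configuration might look like the sketch below. Dreaming is a gated research preview and its configuration surface is not public, so every field name here is an assumption, not documented API:

```python
# Hypothetical sketch only: dreaming's config surface is not public.
# All field names below are assumptions for illustration.
agent_config = {
    "agent_id": "contract-review-agent",  # illustrative agent name
    "dreaming": {
        "schedule": "nightly",  # assumed: curation runs between sessions
        # "auto":   curated memories are applied automatically
        # "review": proposed memory changes wait for human sign-off
        "mode": "review",
    },
}
```

The review mode is the one regulated teams would reach for: proposed memory edits become an artifact someone can actually sign off on.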
Harvey's Result: ~6× More Completed Tasks
The headline customer number from the announcement: legal-AI company Harvey saw task completion rates rise approximately 6× in internal tests after enabling dreaming. The model wasn't swapped — only the memory system changed. The improvement came from agents accumulating institutional knowledge between sessions, including filetype workarounds and tool-specific patterns that had previously been re-learned every time.[2][7]
Availability
Dreaming is in research preview and is not on by default. Access is gated through an application form on Anthropic's site.[1][7]
Outcomes: A Grader That Holds the Agent to a Rubric
The second feature, outcomes, addresses a different problem. Most agent loops optimize for "the model thinks it's done." Outcomes optimizes for "the work passes a defined bar."[1]
You write a rubric describing what success looks like for the task. The agent does its work. A separate Claude instance — the grader — evaluates the output against your rubric in its own context window, so it isn't influenced by the chain of reasoning the agent used to produce the output. If the work fails, the grader pinpoints what needs to change and the agent takes another pass. The loop continues until the rubric is satisfied.[1][3]
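As a minimal sketch of that loop over HTTP: the request attaches a rubric, and the platform's grader keeps the agent iterating until the rubric passes. Only the beta header value below comes from Anthropic's documentation; the endpoint path, field names, and payload shape are assumptions for illustration:

```python
import requests

API_KEY = "sk-ant-..."  # placeholder key

# Success criteria the grader will hold the agent's output to.
rubric = """\
The memo must (1) cite at least three sources, (2) stay under 800 words,
and (3) end with a clearly labeled recommendation.
"""

# Assumed endpoint and request shape -- only the beta header value is
# documented. The grader evaluates the output against the rubric in a
# separate context, and the agent retries until the rubric is satisfied.
resp = requests.post(
    "https://api.anthropic.com/v1/managed-agents/sessions",  # assumed path
    headers={
        "x-api-key": API_KEY,
        "anthropic-beta": "managed-agents-2026-04-01",  # beta header from the docs
    },
    json={
        "task": "Draft a vendor-selection memo from the notes provided.",
        "outcome": {"rubric": rubric},  # assumed field name
    },
    timeout=60,
)
print(resp.json())
```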
Why a Separate Context Window Matters
This is the load-bearing design choice. An LLM grading its own work in the same context tends to confirm its own reasoning. By giving the grader a clean context that sees only the rubric and the output, outcomes recreates the dynamic of a reviewer who hasn't sat through the meeting where you talked yourself into a bad idea.
The Numbers
In Anthropic's internal benchmarks:[1]
| Task type | Result with outcomes |
|---|---|
| Overall task success vs. standard prompting loop | Up to +10 percentage points, largest gains on the hardest problems |
| .docx file generation | +8.4% task success |
| .pptx file generation | +10.1% task success |
Wisedocs's Result: Reviews 50% Faster
Wisedocs, which automates medical record review for insurance and legal claims, built a document-quality-check agent on Managed Agents and uses outcomes to grade each review against their internal guidelines. They report reviews now run 50% faster while staying aligned with their team's standards.[1]
Availability
Outcomes is in public beta and available to all developers via the Claude Platform API behind the managed-agents-2026-04-01 beta header. No separate access request is required.[1][4]
Multiagent Orchestration: A Lead Agent and Up to 20 Specialists
The third feature, multiagent orchestration, is for tasks that simply don't fit in one agent's head — long investigations, multi-source research, code work that spans several subsystems.[1]
A lead agent decomposes the job and delegates each piece to a specialist subagent with its own model, prompt, and tools. The specialists work in parallel on a shared filesystem and contribute back into the lead agent's overall context. The launch announcement gives the example of a lead agent running an incident investigation while subagents fan out through deploy history, error logs, metrics, and support tickets.[1]
A coordinator can be configured with up to 20 unique subagent definitions and runs up to 25 concurrent threads in the current beta. The configuration declares the coordinator model and lists the subagent IDs the coordinator is allowed to delegate to; the coordinator decides at runtime which specialist gets which task and can spawn multiple copies of the same subagent.[7][8]
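A hypothetical configuration following that shape, using the incident-investigation example from the announcement: a coordinator model plus an allow-list of subagent IDs. The 20-definition and 25-thread ceilings come from the docs; the field names and model ID string are assumptions:

```python
# Hypothetical orchestration config; field names and the model ID string
# are assumptions. The limits (20 unique subagent definitions, 25
# concurrent threads) are the documented beta ceilings.
orchestration_config = {
    "coordinator": {"model": "claude-opus-4-7"},  # assumed model ID format
    "subagents": [  # up to 20 unique definitions the coordinator may delegate to
        "deploy-history-reader",
        "error-log-analyst",
        "metrics-querier",
        "support-ticket-triager",
    ],
    "max_concurrent_threads": 25,  # coordinator may spawn copies of a subagent
}
```

Note that the config only declares what the coordinator is allowed to use; per the docs, the coordinator itself decides at runtime which specialist gets which task.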
Observability
Every step is traceable in the Claude Console — which agent did what, in what order, and why. That matters because a parallel multiagent run that stalls inside a single specialist is a debugging nightmare without timeline-level traces.[8]
Availability
Multiagent orchestration is in public beta alongside outcomes, behind the same managed-agents-2026-04-01 beta header. No waitlist.[1][4]
Webhooks: Quietly the Most Important Plumbing
The fourth thing Anthropic shipped on May 6 — overshadowed by the dreaming headlines but arguably the most operationally useful — is webhooks for agent completion. You define an outcome, kick off the agent, and get an HTTP callback when it finishes.[1][9]
This is the difference between an agent platform you have to babysit and one you can wire into a real production system. Long-horizon agent runs that previously required either polling or a held-open SSE stream now fit cleanly into event-driven architectures. Webhooks are configured in the Claude Console.[9]
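A minimal receiver sketch in Python with Flask. The announcement says only that you register a callback in the Claude Console and receive an HTTP POST on completion, so the payload fields below are assumptions:

```python
from flask import Flask, request

app = Flask(__name__)

# Minimal webhook receiver sketch. The payload shape (session_id, status)
# is an assumption -- only the existence of a completion callback is
# documented.
@app.route("/claude/agent-complete", methods=["POST"])
def agent_complete():
    event = request.get_json(force=True)
    session_id = event.get("session_id")  # assumed field
    status = event.get("status")          # assumed field, e.g. "completed"
    print(f"agent session {session_id} finished with status {status}")
    # Hand off to the next pipeline stage here (queue, DB write, alert).
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)
```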
Availability
Webhooks are in public beta alongside outcomes and multiagent orchestration.[1][9]
How the Pieces Fit Together
| Capability | Status (May 2026) | Beta header / access | What it solves |
|---|---|---|---|
| Memory (April 23) | Public beta | managed-agents-2026-04-01 | Agents retain context across sessions |
| Dreaming | Research preview | Application form required | Memory curates itself between sessions |
| Outcomes | Public beta | managed-agents-2026-04-01 | Rubric-graded loop until work meets a bar |
| Multiagent orchestration | Public beta | managed-agents-2026-04-01 | Lead agent + up to 20 parallel specialists |
| Webhooks | Public beta | managed-agents-2026-04-01 | Event-driven completion notifications |
Pricing is unchanged from the April launch: standard Claude API token rates plus $0.08 per active session-hour, with idle sessions not accruing runtime charges.[4][10]
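For a back-of-envelope feel, the sketch below applies that model to a hypothetical run. The $0.08 active session-hour rate is from the pricing docs; the token volumes and per-token rates are placeholders, not real prices:

```python
# Cost sketch: token charges at standard API rates plus $0.08 per ACTIVE
# session-hour; idle time accrues no runtime charge.
active_hours = 3.0
runtime_cost = active_hours * 0.08  # $0.24, from the documented session rate

input_tokens, output_tokens = 400_000, 120_000  # hypothetical workload
in_rate, out_rate = 3.00, 15.00  # assumed $/M-token placeholders
token_cost = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(f"runtime ${runtime_cost:.2f} + tokens ${token_cost:.2f} "
      f"= ${runtime_cost + token_cost:.2f}")  # $0.24 + $3.00 = $3.24
```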
The Vendor Lock-In Trade-off
Coverage of the May 6 update has converged on one critique. VentureBeat's analysis is the sharpest version: Anthropic wants to own your agent's memory, evals, and orchestration — and the platform's hosted runtime puts that memory and orchestration on infrastructure the enterprise does not own, which can become a compliance issue for organizations that need to prove data residency. The May 6 webhooks shipment further extends that surface area into the event-handling layer.[11]
That's a real trade-off. For teams already in the Anthropic ecosystem, dreaming + outcomes + multiagent + webhooks is a coherent stack that meaningfully removes operational toil. For teams that need to keep memory or orchestration on their own infrastructure — or want the flexibility to swap models — it's an argument for an open framework like LangGraph or CrewAI, even at the cost of building more of the runtime themselves.
The May 6 announcement makes the bet sharper, not softer. Anthropic is consolidating capabilities that used to be cobbled together from separate vendors. Whether that consolidation is a feature or a hazard depends on which side of the buy-vs-build line your team sits on.
What This Means for Builders
If you are already building on Claude Managed Agents or Claude Code, three of the four updates are flag-flips behind the existing beta header.
The honest order of operations for most teams:
- Turn on outcomes first. A rubric loop is the cheapest way to lift quality on agents already in production, and the +8.4% / +10.1% file-generation numbers translate directly to fewer human review cycles.
- Add webhooks if you are still polling for completion. Long-running sessions and event-driven pipelines stop fighting each other.
- Multiagent orchestration is the right tool when one agent's context window is the actual bottleneck — not a default architecture for every task.
- Dreaming is worth applying for in research preview if you have agents that run frequently against the same domain and currently re-learn the same lessons. If the workload is bursty or one-off, the curation pass has nothing to chew on.
For teams comparing AI agent platforms, this update changes the surface area. Managed Agents is no longer just "hosted runtime"; it's now also the eval layer, the orchestration layer, and the memory-consolidation layer. That makes framework comparisons a different conversation than they were a month ago.
Bottom Line
The May 6 update tells you what Anthropic thinks the bottleneck is. It isn't the model — it's everything around the model. Dreaming attacks the "agents don't get smarter over time" problem. Outcomes attacks the "the work isn't actually good" problem. Multiagent orchestration attacks the "this task is bigger than one context window" problem. Webhooks make the whole thing fit into event-driven production systems. Together they turn Claude Managed Agents from a hosted runtime into something closer to an operating system for agentic work — at the cost of a deeper bet on a single vendor. That trade-off is now the central question for any team standing up agents in production.
Footnotes
1. Anthropic, "New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration," claude.com/blog/new-in-claude-managed-agents, May 6, 2026.
2. VentureBeat, "Anthropic introduces 'dreaming,' a system that lets AI agents learn from their own mistakes," venturebeat.com, May 6, 2026.
3. SD Times, "New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration," sdtimes.com, May 7, 2026.
4. Anthropic, "Claude Managed Agents overview," platform.claude.com/docs/en/managed-agents/overview, May 2026.
5. Anthropic, "Claude Managed Agents: get to production 10x faster," claude.com/blog/claude-managed-agents, April 8, 2026.
6. EdTech Innovation Hub, "Anthropic adds persistent memory to Claude Managed Agents in public beta," edtechinnovationhub.com, April 23, 2026.
7. Build Fast With AI, "Claude Managed Agents Dreaming Explained (2026)," buildfastwithai.com, May 2026.
8. Anthropic, "Multiagent sessions," platform.claude.com/docs/en/managed-agents/multi-agent, May 2026.
9. Hookdeck, "Anthropic shipped webhooks for Claude Managed Agents. Here's what they unlock," hookdeck.com/blog/anthropic-managed-agent-webhooks, May 2026.
10. Anthropic, "Claude pricing," platform.claude.com/docs/en/about-claude/pricing, accessed May 2026.
11. VentureBeat, "Anthropic wants to own your agent's memory, evals, and orchestration — and that should make enterprises nervous," venturebeat.com, May 2026.
12. Anthropic, "Introducing Claude Opus 4.7," anthropic.com/news/claude-opus-4-7, April 16, 2026; GitHub Changelog, "Claude Opus 4.7 is generally available," April 16, 2026.