Opus 4.7: Task Budgets and the Memory Tool

What actually changed in Claude Opus 4.7 — the task budget system, the persistent memory tool, and how they combine to make long-horizon agents finally reliable.

Claude Opus 4.7 shipped on April 16, 2026 — yesterday, as I’m writing this. The headline benchmark numbers are predictable and mostly don’t matter for day-to-day work. The two features worth understanding are the ones the benchmarks don’t capture: task budgets and the memory tool. Together they fix the single most persistent problem with long-running agents — the “it either gives up too early or wanders forever” failure mode that made 4.5 and 4.6 unreliable beyond about 20 tool calls.

This primer covers what those features actually do, why they matter now, and how to turn them on in real code.

Why this matters right now

The last eighteen months of frontier model work have converged on a single question: can an agent execute a multi-hour task without a human in the loop? The answer up through 4.6 was “sometimes, if you prompt carefully.” The answer in 4.7 is “yes, if you configure it correctly.”

The difference is that 4.7 stopped treating the agent loop as an opaque emergent behavior and started treating it as a system with explicit controls: a budget for how much work to do, and a place to put notes that survive between turns. That sounds pedestrian. It isn’t — it’s the shift from “agents as experiments” to “agents as things you can actually ship.”

There’s also a practical window: for about two weeks after a major model release, most production deployments are still running on the previous model. Knowing 4.7’s controls now means your agent stack can be meaningfully better than your competitors’ for a short period before everyone catches up.

What’s new, in one paragraph

Opus 4.7 introduces three closely related capabilities: task budgets, which let you cap how many tool calls, thinking tokens, or wall-clock seconds an agent can consume before it must return; a memory tool, which gives the agent a structured, persistent, file-like store that survives across turns and sessions; and adaptive thinking, which adjusts extended-thinking depth to task difficulty instead of requiring you to set it manually. All three work in both the API and Claude Code. All three are opt-in: leave them unconfigured and you pay list price without getting any of the benefit. Together they change the ceiling of what a single-model agent can do.

Everything below is framed around these three.

Task budgets

The problem task budgets solve

Pre-4.7, agent loops had two pathological modes:

Early termination. The model decides it’s “done” after three tool calls when the actual task needs thirty. You get half a refactor.
Runaway. The model keeps calling tools for 200 steps on a task that should have taken 20. You get an API bill many times what you planned for, or a rate-limit wall.

Both failure modes stem from the same root cause: the model has no picture of its own resource consumption. It doesn’t know it’s on turn 47 of a task it should finish by turn 60. It can’t budget.

Task budgets expose that accounting directly to the model.

How they work

At the start of an agent turn you declare what the agent has to spend:

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    task_budget={
        "max_tool_calls": 40,
        "max_thinking_tokens": 20000,
        "max_wall_clock_seconds": 600,
    },
    tools=[...],
    messages=[...],
)

Three knobs:

max_tool_calls — the hard ceiling on tool invocations per task. The model sees its current usage as part of each turn and paces itself against the ceiling.
max_thinking_tokens — caps extended-thinking budget across the task. Different from per-turn thinking tokens; this is the whole-task allowance.
max_wall_clock_seconds — a real-time budget. Useful when tools call external services with variable latency.

When any budget is exhausted, the model returns a structured summary of what it accomplished rather than silently failing. That structured return is the second half of the value — you get a checkpoint you can resume from instead of a dead agent.
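To make the accounting concrete, here is a minimal driver-side sketch of the same bookkeeping. It is not how the API implements budgets; the class `BudgetTracker` and its methods are hypothetical names chosen for illustration, and only the three limit names come from the request above. It can be useful as a local safety net in your own agent loop.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BudgetTracker:
    """Hypothetical driver-side mirror of the three task-budget limits."""

    max_tool_calls: int
    max_thinking_tokens: int
    max_wall_clock_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    tool_calls_used: int = 0
    thinking_tokens_used: int = 0
    started_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start the wall-clock budget when the tracker is created.
        self.started_at = self.clock()

    def record(self, tool_calls: int = 0, thinking_tokens: int = 0) -> None:
        """Add one turn's usage to the running totals."""
        self.tool_calls_used += tool_calls
        self.thinking_tokens_used += thinking_tokens

    def limit_hit(self) -> Optional[str]:
        """Return the first exhausted limit, or None if every limit has headroom."""
        if self.tool_calls_used >= self.max_tool_calls:
            return "tool_calls"
        if self.thinking_tokens_used >= self.max_thinking_tokens:
            return "thinking_tokens"
        if self.clock() - self.started_at >= self.max_wall_clock_seconds:
            return "wall_clock"
        return None
```

Call `record(...)` after each turn and stop the loop as soon as `limit_hit()` returns a value; that gives you a second line of defense if the configured budget is ever ignored or misconfigured.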

Setting budgets that don’t sabotage you

The default instinct is to set budgets low to control cost. Don’t. A too-tight budget produces the same “gave up halfway” outcome you had before, just explicitly instead of emergently.

Practical rule of thumb for a coding agent task:

Task shape	`max_tool_calls`	`max_thinking_tokens`	`max_wall_clock_seconds`
“Fix this failing test”	15	8,000	180
“Refactor this module”	40	20,000	600
“Build a feature end-to-end”	120	50,000	1,800
“Audit this codebase for X”	80	30,000	900

Err high. The budget is a ceiling, not a target — the model pulls back on its own when it’s done. The ceiling only matters when something’s going wrong, and when something’s going wrong you usually want a slightly higher ceiling than you expect, not lower.
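One way to keep these starting points in code is a small lookup table. This is only a sketch: the task-shape keys, the `budget_for` helper, and the `headroom` parameter are hypothetical names introduced here, while the numbers are the ones from the table above.

```python
import math

# Starting budgets copied from the rule-of-thumb table; keys are illustrative.
STARTING_BUDGETS = {
    "fix_test": {
        "max_tool_calls": 15,
        "max_thinking_tokens": 8_000,
        "max_wall_clock_seconds": 180,
    },
    "refactor_module": {
        "max_tool_calls": 40,
        "max_thinking_tokens": 20_000,
        "max_wall_clock_seconds": 600,
    },
    "build_feature": {
        "max_tool_calls": 120,
        "max_thinking_tokens": 50_000,
        "max_wall_clock_seconds": 1_800,
    },
    "audit_codebase": {
        "max_tool_calls": 80,
        "max_thinking_tokens": 30_000,
        "max_wall_clock_seconds": 900,
    },
}


def budget_for(task_shape: str, headroom: float = 1.0) -> dict:
    """Return a copy of the starting budget, scaled up by `headroom`."""
    if headroom < 1.0:
        # Scaling down would defeat the "err high" advice.
        raise ValueError("headroom below 1.0 would tighten the budget")
    base = STARTING_BUDGETS[task_shape]
    # Round up so scaling never lowers a ceiling.
    return {name: math.ceil(value * headroom) for name, value in base.items()}
```

The result can be passed as the `task_budget` argument shown earlier. Rejecting a `headroom` below 1.0 is a deliberate choice: the helper can only loosen the ceilings, never tighten them.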

Reading the budget in responses

The response object now includes a budget_status block:

{
  "budget_status": {
    "tool_calls_used": 37,
    "thinking_tokens_used": 18420,
    "wall_clock_seconds_used": 542,
    "exhausted": false,
    "limit_hit": null
  }
}

When exhausted is true, limit_hit will be one of "tool_calls", "thinking_tokens", or "wall_clock". That tells you which dial to adjust if you need to rerun.
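A rerun policy built on that block might look like the sketch below. It assumes `budget_status` arrives as a plain dict with the fields shown above; the helper name `budget_for_rerun`, the `LIMIT_TO_KNOB` mapping, and the 1.5x default factor are illustrative choices, not part of any SDK.

```python
import math

# Maps each limit_hit value to the task_budget key that controls it.
LIMIT_TO_KNOB = {
    "tool_calls": "max_tool_calls",
    "thinking_tokens": "max_thinking_tokens",
    "wall_clock": "max_wall_clock_seconds",
}


def budget_for_rerun(budget: dict, budget_status: dict, factor: float = 1.5) -> dict:
    """Return a new budget with only the exhausted limit raised by `factor`."""
    if not budget_status.get("exhausted"):
        # Nothing ran out, so the same budget is fine for the next run.
        return dict(budget)
    knob = LIMIT_TO_KNOB[budget_status["limit_hit"]]
    raised = dict(budget)
    raised[knob] = math.ceil(budget[knob] * factor)
    return raised
```

Raising only the dial that was actually hit keeps the other two ceilings meaningful as runaway guards.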

The memory tool

The problem memory solves

Context windows are large. Useful context is a subset of the window. And persistent state — things the agent learned two hours or two days ago that are relevant now — has no natural home in a pure-context-window architecture.

The workarounds pre-4.7 were all bad:

Stuff everything into the system prompt. Works until the system prompt is 40k tokens and the model stops weighting it correctly.
Build your own RAG over prior turns. Heavyweight, easy to get wrong, and doesn’t survive well across session boundaries.
Append conversation history indefinitely. Every turn re-sends the whole history, so per-turn cost keeps climbing, and with autocompaction you eventually lose the middle anyway.

The memory tool is Anthropic’s answer: a built-in, model-native file-like store that the agent chooses when to write to and when to read from.

How it works

Memory is exposed to the model as a structured tool with three operations (memory_write, memory_read, and memory_list). You enable it by adding one entry to the tools list:

tools = [
    {
        "type": "memory_20260401",  # the built-in memory tool type
        "name": "memory",
        "config": {
            "storage": "client_managed",  # or "anthropic_managed"
            "namespace": "my-agent-project",
        },
    },
    # ... your other tools
]

The model can now issue memory_write, memory_read, and memory_list tool calls on its own, without your prompting. Each memory entry has a path (think filename), a value (text), and optional tags.

Two storage modes:

anthropic_managed — Anthropic stores the memory server-side, scoped to the namespace. Survives across sessions automatically. Easiest.
client_managed — your backend receives read/write requests and persists them however you like (Postgres, Redis, S3). More work, but you own the data.

For most product teams anthropic_managed is the right starting point. Switch to client_managed when you have a compliance requirement or want to query memory entries from non-agent code.
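If you do go the client_managed route, the backend can start out very small. The sketch below is a toy in-memory store; the `handle(operation, payload)` signature, the result shapes, and the `prefix` filter on listing are all assumptions made for illustration, since the exact request format your backend receives is not spelled out here. Only the three operation names and the path/value/tags fields come from the description above.

```python
from typing import Dict, List


class InMemoryMemoryStore:
    """Toy client_managed backend; swap the dict for Postgres, Redis, or S3."""

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def handle(self, operation: str, payload: dict) -> dict:
        """Dispatch one memory operation and return a JSON-serializable result."""
        if operation == "memory_write":
            # Writing to an existing path overwrites the previous entry.
            self._entries[payload["path"]] = {
                "value": payload["value"],
                "tags": list(payload.get("tags", [])),
            }
            return {"ok": True}
        if operation == "memory_read":
            entry = self._entries.get(payload["path"])
            if entry is None:
                return {"ok": False, "error": "not_found"}
            return {"ok": True, **entry}
        if operation == "memory_list":
            # Hypothetical prefix filter; an empty prefix lists everything.
            prefix = payload.get("prefix", "")
            paths: List[str] = sorted(p for p in self._entries if p.startswith(prefix))
            return {"ok": True, "paths": paths}
        return {"ok": False, "error": f"unknown operation: {operation}"}
```

Keeping every operation behind one dispatch method means the storage engine can change later without touching the agent loop.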

When the model actually uses memory well

The memory tool is not a silver bullet. It works well when:

The task involves stable facts about the user, project, or environment that come up repeatedly (“user prefers TypeScript, not JavaScript” — write once, read on every relevant turn).
The agent is executing a multi-session task (“we’re migrating this repo to Tailwind v4 — checkpoint progress here”).
You need learned preferences or corrections (“user said don’t use emoji in commit messages” — store it, apply it next time).

It works poorly when:

You use it as a general-purpose database. The model is not great at structured queries over many thousands of entries. Keep your memory footprint under ~200 active entries.
You expect it to reason across many entries at once. It’s a retrieval-lookup pattern, not a join.

A concrete example

Suppose you’re building a coding agent that gets called repeatedly on the same repo. Without memory, it rediscovers the repo’s conventions every session. With memory, the first session observes and writes:

/project/conventions: "This repo uses Tailwind v4 CSS-first config.
  No tailwind.config.js. All theme tokens in src/styles/global.css @theme block."
/project/stack: "Astro v5, TypeScript strict, MDX content collections."
/project/build_gotcha: "Content collections need regenerating after schema
  changes — run `npm run build` once to refresh types before `astro check`."

Every subsequent session, the model reads these at the start and doesn’t waste tool calls relearning. That compounds — after a month of sessions, the memory contains a curated, battle-tested picture of the project that’s better than most CLAUDE.md files because it’s been refined against real failures.

This is exactly the pattern behind Claude Code’s auto-memory system, by the way. Same idea, different UI.

Adaptive thinking

Extended thinking (first shipped with Claude 3.7 Sonnet) lets you give the model a token budget to reason silently before responding. Useful, but annoying: you had to pick the budget manually, and the right budget varied wildly by task.

Adaptive thinking, new in 4.7, lets the model decide how much thinking a task warrants:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={
        "type": "adaptive",
        "max_budget_tokens": 30000,  # ceiling, not a target
    },
    messages=[...],  # your existing conversation
)

For a trivial request, the model might use 200 thinking tokens. For a hard refactor, it might use 25,000. You pay for what it actually used.

The practical implication: set max_budget_tokens high. Adaptive thinking’s whole value is that the ceiling no longer has to be a tight estimate. A 30,000-token ceiling doesn’t cost you 30,000 tokens on easy tasks — it costs you 200. It only bites when the task actually needs the depth, and when it does, you want the depth.

Putting the three together

The three features compose. A well-configured 4.7 agent loop looks like this:

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive", "max_budget_tokens": 30000},
    task_budget={
        "max_tool_calls": 60,
        "max_thinking_tokens": 80000,
        "max_wall_clock_seconds": 1200,
    },
    tools=[
        {
            "type": "memory_20260401",
            "name": "memory",
            "config": {"storage": "anthropic_managed", "namespace": "agent-v1"},
        },
        # ... your domain tools
    ],
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": task}],
)

What you get:

The agent thinks as deeply as each sub-step warrants (adaptive thinking).
It tracks its own resource consumption and paces accordingly (task budgets).
It remembers what it learned about this project/user across turns and sessions (memory tool).
When it hits any limit, it returns a structured summary you can checkpoint.

This configuration is the default recommendation for any agent running on 4.7 that does more than one-shot lookups. There’s no good reason not to have all three on.

Pitfalls and things to watch

Memory tool leakage. The model will sometimes write to memory things that should not be persisted — transient task state, debug notes, or verbose reasoning. Audit your memory periodically. The memory_list tool lets you introspect; run it occasionally and prune.
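A periodic audit can be partly automated. The sketch below assumes you have already pulled the entries into a plain dict of path to text (for example via memory_list and memory_read); the function name, the "transient" path prefixes, and the 500-character threshold are arbitrary illustrative heuristics, not recommendations from Anthropic.

```python
from typing import Dict, List, Tuple


def flag_for_pruning(
    entries: Dict[str, str],
    max_chars: int = 500,
    transient_prefixes: Tuple[str, ...] = ("/tmp/", "/debug/", "/scratch/"),
) -> List[str]:
    """Return sorted paths whose entries look transient or overly verbose."""
    flagged = []
    for path, value in entries.items():
        looks_transient = path.startswith(transient_prefixes)
        too_verbose = len(value) > max_chars
        if looks_transient or too_verbose:
            flagged.append(path)
    return sorted(flagged)
```

Treat the output as a review queue rather than a delete list: a long entry may still be exactly the build gotcha you want to keep.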

Budget-induced shortcuts. When budgets are tight, the model skips verification steps (running tests, re-reading its work). A budget that is too tight produces worse output than no budget at all. If you care about quality, the budget’s job is to prevent runaway, not to force efficiency.

Adaptive thinking cost variance. Because adaptive thinking adjusts per-task, your cost-per-invocation has higher variance. Average cost may be lower, but worst-case cost on a hard task can be 5–10x your average. Budget accordingly.
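To see how wide the spread is in your own workload, log the thinking tokens used per invocation and compare the worst case to the mean. This is a small sketch over numbers you collect yourself; the function name and the returned keys are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def thinking_cost_profile(tokens_per_call: List[int]) -> Dict[str, float]:
    """Summarize mean, worst case, and their ratio for logged thinking-token counts."""
    if not tokens_per_call:
        raise ValueError("need at least one logged invocation")
    avg = mean(tokens_per_call)
    worst = max(tokens_per_call)
    return {
        "mean_tokens": avg,
        "max_tokens": worst,
        # A high ratio means a few hard tasks dominate worst-case cost.
        "worst_to_mean": worst / avg,
    }
```

If the ratio is large, size your spending alerts around `max_tokens` rather than the mean.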

Memory across environments. If you’re using anthropic_managed memory, the namespace is the only isolation boundary. Don’t share a namespace across dev/staging/prod — you will regret it.
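A simple guard is to derive the namespace from the deployment environment instead of hard-coding it. The helper below is a sketch; the function name, the allowed environment names, and the `project-environment` naming scheme are assumptions for illustration.

```python
ALLOWED_ENVIRONMENTS = ("dev", "staging", "prod")


def memory_namespace(project: str, environment: str) -> str:
    """Build a per-environment namespace so dev, staging, and prod never share memory."""
    if environment not in ALLOWED_ENVIRONMENTS:
        # Fail loudly rather than silently falling back to a shared namespace.
        raise ValueError(f"unknown environment: {environment!r}")
    return f"{project}-{environment}"
```

Pass the result as the `namespace` value in the memory tool config, reading the environment name from whatever your deployment already exposes.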

Getting started — 5-minute path

Update your SDK: pip install -U anthropic (needs the April 2026 release or later).
Change your model string to claude-opus-4-7.
Add thinking={"type": "adaptive", "max_budget_tokens": 30000} to one of your existing message calls. Verify it still works.
Add a task_budget with generous ceilings. Observe the budget_status in responses to calibrate.
Add the memory tool to a long-running agent you already have. Watch what it decides to remember over a day or two.

All five steps fit in the same afternoon. The lift is small; the behavioral change is not.

Where to go next

The Anthropic 4.7 announcement covers the benchmark numbers if you’re curious.
Claude Code 1.9+ already exposes task budgets in its planning UI — if you use Claude Code daily, you’re already touching this system whether you knew it or not.
The memory tool pairs very naturally with prompt caching. See the Prompt Caching primer — caching the memory-tool system prompt across invocations is where the cost savings compound.

The short version: if you run agents in production, 4.7 is the first model where the agent-loop ergonomics stop fighting you. Turn on all three features, set the ceilings high, and let it run.