Claude Opus 4.7 lands in the UK: the benchmark, the cost trap, and the playbook

When we first published this piece in late November 2025, Claude Opus 4.5 was the headline. Five months later that framing is obsolete. Anthropic has shipped two further Opus releases — 4.6 in February 2026 and 4.7 on 16 April 2026 — and the gap between teams running the current model and teams still quoting 4.5 benchmarks in sales decks is now the real competitive edge. If you are a UK founder, agency owner or in-house marketer trying to work out what to actually do in the week after the 4.7 launch, this is the version that matters.

From Opus 4.5 to Opus 4.7: what actually shipped between November and April

The original post framed Opus 4.5 as a step-change over Gemini 3 Pro. That was fair in November. Since then:

  • Opus 4.6 (5 February 2026) added a 1M token context window in beta, sharper multi-file code review, longer sustained agentic loops and — crucially in Claude Code — the ability to assemble agent teams that work on a task in parallel.
  • Opus 4.7 (16 April 2026) shipped with stronger software engineering, high-resolution vision (up to 2,576 px / 3.75 MP), sharper instruction following, and what Anthropic calls task budgets: a beta feature that lets developers hand Claude a rough token target for a full agentic run so it prioritises work across long horizons.

Two other 4.7-era shipments matter for UK teams. First, the xhigh effort level for hardest-tier reasoning tasks. Second, the /ultrareview command inside Claude Code, which replaces the kind of manual review loop most eng leads were bolting on themselves. Pricing is unchanged on paper — $5 per million input, $25 per million output — and that 'unchanged' headline is where the first serious UK procurement mistake is being made. More on that below.

Opus 4.7 by the numbers: 87.6% SWE-bench Verified and a 3x production uplift

Benchmarks moved materially in one quarter. The original piece cited 25% review-time and 18% sprint-velocity gains from pilot data. With 4.7 on the table, the numbers to put in front of a UK CTO are these:

  • 87.6% on SWE-bench Verified, up from 80.8% on Opus 4.6
  • SWE-bench Pro: 64.3%, up from 53.4% — an 11-point jump on the harder, enterprise-flavoured set
  • CursorBench: 70%, up from 58%
  • Rakuten's internal benchmark: 3x more production task resolutions versus Opus 4.6
  • Agentic work: a 14% capability gain while using fewer tokens, with roughly one-third the tool errors of 4.6

In plain English, Opus 4.7 is the first Claude that reliably keeps executing through tool failures that used to halt a run. For anyone deploying agents against real UK systems — HMRC filing flows, Companies House lookups, GA4 APIs, Shopify back-ends — that single property is the difference between an 'AI pilot' and something you can leave running overnight.
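"Keeps executing through tool failures" is also a property you can enforce at the orchestration layer, whatever model sits underneath. A minimal supervisor-side sketch — all names are hypothetical illustrations, not an Anthropic API:

```python
def run_step_with_retries(tool, payload, max_attempts=3):
    """Retry a flaky tool call instead of letting one failure halt the run.

    tool: any callable hitting a real system (a GA4 query, a Companies
    House lookup). Failures are surfaced as data so the agent can route
    around the dead step rather than abort the whole overnight job.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "result": tool(payload)}
        except Exception as exc:
            last_error = exc
    return {"ok": False, "error": str(last_error), "payload": payload}


def run_overnight(tool, payloads):
    """Execute every step; failed steps are recorded, not fatal."""
    results = [run_step_with_retries(tool, p) for p in payloads]
    completed = [r for r in results if r["ok"]]
    return results, completed
```

The design choice is the return shape: a failed tool call becomes a structured result the loop can inspect, which is exactly the behaviour the benchmark numbers above are measuring.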

The tokenizer trap: why the 'same price' can cost up to 35% more

Here is the detail buried under the launch-day headlines. Opus 4.7 ships with a new tokenizer that produces up to 35% more tokens from the same input text compared with Opus 4.6. The multiplier varies between 1.0x and 1.35x depending on content type, with code, structured data and non-English strings sitting at the heavier end.

What that means for a UK buyer: the per-token list price is flat, but the actual bill on an identical workload can rise by a third before anyone notices. Finance teams running a March-to-April comparison will see cost-per-request spike and (wrongly) blame usage growth. The correct response is to re-baseline unit economics for 4.7 before you roll it out, not after. If you are quoting clients on AI work, your old cost-per-deliverable model for Opus 4.5 is simply wrong — rebuild it.
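Re-baselining is a five-line calculation, not a project. A sketch using the list prices quoted above and the 1.0x–1.35x multiplier range from the launch notes — the workload figures are illustrative, not anyone's real bill:

```python
def rebaseline_monthly_cost(input_tokens, output_tokens,
                            token_multiplier=1.35,
                            input_price_per_m=5.0, output_price_per_m=25.0):
    """Project a 4.6-era workload's bill onto the 4.7 tokenizer.

    token_multiplier: 1.0 for prose-light workloads, up to ~1.35 for
    code-, structured-data- and non-English-heavy content.
    Returns (old_cost_usd, new_cost_usd, uplift_fraction).
    """
    def cost(inp, out):
        return inp / 1e6 * input_price_per_m + out / 1e6 * output_price_per_m

    old = cost(input_tokens, output_tokens)
    new = cost(input_tokens * token_multiplier,
               output_tokens * token_multiplier)
    return old, new, (new - old) / old


# Illustrative code-heavy workload: 200M input / 40M output tokens a month.
old, new, uplift = rebaseline_monthly_cost(200e6, 40e6)
```

On those assumed numbers the same workload moves from $2,000 to $2,700 a month with zero change in usage — the spike a finance team would otherwise misread as growth.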

Task budgets, 1M context and agent teams: the features UK buyers should weaponise

Three 2026-era capabilities are where practical advantage lives:

  1. 1M-token context (from 4.6 onwards). Entire UK regulatory documents — the DSIT AI assurance framework, the ICO data protection code, an HMRC R&D tax guidance bundle — fit in a single prompt without retrieval gymnastics. This is the feature most UK agencies have not yet re-architected around.
  2. Task budgets (4.7 beta). Instead of hoping your agent does not spiral, you hand it a token allowance for a full job. This is what turns 'agentic demos' into a line item a finance director will sign off. Opus 4.7 is also available on Amazon Bedrock from launch day, which removes the last procurement excuse for UK enterprises already standardised on AWS.
  3. Agent teams in Claude Code. Parallel sub-agents working on one task is the pattern that collapses a two-week sprint into three days — when the supervising human knows what to ship to which agent.

What this means for UK marketing and operations teams

The original piece was aimed at engineering leaders. That was too narrow. By April 2026 the sharper question for UK marketing and operations teams is: what is the smallest team, running the newest model, that can deliver what used to take a 12-person squad?

  • Your media plan, creative, analytics and reporting can now share one 1M-token context. No more copy-pasting between Looker, Notion and a brief doc.
  • Task budgets mean you can give an agent a fixed-cost brief ('produce this month's campaign analysis for no more than £X of compute') and it will actually hold the line.
  • The 3x production-task uplift from Rakuten's data is the number to benchmark against internally — if your AI workflows are not delivering at least a 2x throughput lift after moving from 4.5-era pipelines to 4.7-era pipelines, your prompts and tooling are the bottleneck, not the model.
  • The tokenizer shift is a pricing-model problem before it is a technical problem. UK agencies still quoting fixed-fee AI retainers built on November 2025 economics are underbidding themselves by up to a third.
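Holding a '£X of compute' line starts with turning pounds into a token allowance before the run, not after the invoice. A sketch under stated assumptions — the exchange rate and the 80/20 input/output split are illustrative placeholders, not quoted figures; substitute your own rate and observed workload mix:

```python
def gbp_budget_to_tokens(gbp, usd_per_gbp=1.27,
                         input_price_per_m=5.0, output_price_per_m=25.0,
                         input_share=0.8):
    """Convert a fixed £ compute budget into a total token allowance.

    usd_per_gbp and input_share are assumptions for illustration.
    Prices are the $5/$25 per-million list prices quoted above.
    """
    usd = gbp * usd_per_gbp
    # Blended $ per million tokens for the assumed input/output mix.
    blended = (input_share * input_price_per_m
               + (1 - input_share) * output_price_per_m)
    return int(usd / blended * 1e6)
```

Feed the result into whatever budget enforcement your agent runner uses, and remember to re-run the conversion after any tokenizer change — the allowance buys the same tokens but less work.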

This is the shift that makes an Anjin Marketing Operating System — not another point tool — the right response.

Anjin: the Marketing Operating System built for the Opus 4.7 era

Anjin is the Marketing Operating System designed for the reality we now live in: a frontier model that ships a new version every 8–12 weeks, a tokenizer that can silently re-price your entire cost base, and agentic loops that need supervision, budgets and a shared context across brief, creative and analytics.

Anjin wraps the current generation of Claude — Opus 4.7 today, whatever ships next in July — into a single Marketing OS: one context for strategy, creative, media, analytics and reporting; budgeted agent runs so finance can sign off ahead of time; a prompt and workflow library already tuned for the 4.7 tokenizer. You stop rebuilding your stack every quarter. You stop quoting work on last quarter's economics. You compound.

Agencies are our launch audience because they feel the model-upgrade whiplash first, but the OS is built for any UK team running marketing at AI speed — founder-led SaaS, lean in-house teams, operators spun out of the big networks.

The £888 Lifetime License — Offer Closing Soon

Lifetime access to Anjin for a one-time payment of £888. Not a subscription. Not a seat. Not a trial. One payment, unlimited use, for as long as Anjin exists.

The average marketing team spends £888 in about three working days on tooling, freelancers and coordination software. You're buying the platform that replaces most of it — once.

This price will not be offered again once we close our early-access cohort.

Claim your £888 Anjin lifetime license →

Founders, agency owners and in-house marketers — this is how you run marketing at AI speed without the team, the burn, or another year of waiting.

Sources: Anthropic — Opus 4.7, Anthropic — Opus 4.6, AWS Bedrock launch, GitHub Changelog, Anthropic API docs, Vellum benchmarks, BuildFastWithAI review, The Next Web, Finout pricing analysis, CloudZero pricing, TechCrunch — agent teams, SQ Magazine UK
