Anthropic just dropped Claude Opus 4.7, and honestly, this one feels like a real step forward—not just another incremental bump. It’s generally available now across all Claude products, the API, Bedrock, Vertex AI, and Microsoft Foundry. Pricing stays the same as Opus 4.6: $5 per million input tokens, $25 per million output tokens.
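If you want a feel for what that pricing means in practice, here's a minimal sketch of a call plus a back-of-the-envelope cost check using the Anthropic Python SDK. The model ID string is a placeholder I made up for illustration (check the current model list for the real one), and the rates are just the list prices above.

```python
import anthropic

MODEL = "claude-opus-4-7"  # placeholder model ID for illustration only

# List prices quoted above, in dollars per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model=MODEL,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the trade-offs of optimistic locking."}],
)

# The usage block reports token counts, which is all you need for a rough cost estimate.
cost = (
    message.usage.input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    + message.usage.output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
)
print(f"~${cost:.4f} for this request")
```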
What’s Actually Different
The headline improvement is in software engineering, especially for the nasty, multi-step problems that previously required you to babysit the model. Users are reporting they can now hand off their hardest coding work—the kind that used to need close supervision—and actually trust Opus 4.7 to get it done. It pays closer attention to instructions, verifies its own outputs before reporting back, and just generally doesn’t flake out on long-running tasks.
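The self-verification happens inside the model, but if you want to make the check explicit in your own hand-off flow, here's a rough two-pass pattern: run the task, then ask the model to audit its own draft before you accept it. This is purely my illustration, not Anthropic's recipe, and the model ID is a placeholder.

```python
import anthropic

MODEL = "claude-opus-4-7"  # placeholder model ID for illustration only
client = anthropic.Anthropic()

task = "Refactor the retry logic in worker.py to use exponential backoff with jitter."

# First pass: hand off the task.
first = client.messages.create(
    model=MODEL,
    max_tokens=4096,
    messages=[{"role": "user", "content": task}],
)
draft = first.content[0].text  # assumes the first content block is text

# Second pass: ask the model to re-check its own work before you merge anything.
review = client.messages.create(
    model=MODEL,
    max_tokens=2048,
    messages=[
        {"role": "user", "content": task},
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "Before I merge this: re-check your changes for edge cases "
                                     "you missed, and list anything you are not confident about."},
    ],
)
print(review.content[0].text)
```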
Vision got a real upgrade too. Higher resolution support means it can actually read chemical structures and complex technical diagrams now, not just guess. And it’s noticeably more tasteful and creative with professional outputs—interfaces, slides, docs. That matters more than benchmarks suggest.
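If you want to throw your own diagrams at it, images go into the Messages API as base64-encoded content blocks alongside the text prompt. Here's a minimal sketch; the model ID and the file path are placeholders I made up.

```python
import base64
import anthropic

MODEL = "claude-opus-4-7"          # placeholder model ID for illustration only
IMAGE_PATH = "reaction_scheme.png" # assumed local file with a chemical structure diagram

client = anthropic.Anthropic()

# Encode the image so it can be sent as an inline content block.
with open(IMAGE_PATH, "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model=MODEL,
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": "image/png", "data": image_b64},
            },
            {"type": "text", "text": "Name the functional groups in this structure and flag anything ambiguous."},
        ],
    }],
)
print(message.content[0].text)
```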
The Cyber Safeguard Angle
This is where it gets interesting. Last week Anthropic announced Project Glasswing, a project focused on the cybersecurity risks and benefits of AI. Opus 4.7 is the first model shipping with new safeguards that automatically detect and block prohibited or high-risk cybersecurity requests. Its cyber capabilities are intentionally less advanced than Claude Mythos Preview's; Anthropic actually trained the model to reduce those capabilities.
If you're a legitimate security professional doing vulnerability research or penetration testing, there's a new Cyber Verification Program you can apply to for access. It's a reasonable approach: test the guardrails on a less capable model before rolling them out more widely.
What Early Testers Are Saying
I’ve been digging through the early access feedback, and it’s unusually positive for a point release:
- Hex says it’s the strongest model they’ve evaluated. It correctly reports missing data instead of making up plausible-but-wrong answers, and resists those dissonant-data traps that even Opus 4.6 falls for. Their take: “low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.” That’s a meaningful efficiency gain.
- Cognition (Devin) reports it works coherently for hours, pushes through hard problems instead of giving up, and unlocks “a class of deep investigation work we couldn’t reliably run before.” That’s the kind of long-horizon autonomy people have been chasing.
- Replit called it an easy upgrade decision. On their 93-task coding benchmark, the task resolution rate went up 13% over Opus 4.6, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve. Combined with faster median latency and strict instruction following, that's a real workflow improvement.
- Sourcegraph (Cody) noted it catches its own logical faults during planning and accelerates execution. For a code intelligence platform like theirs, that combination of speed and precision is potentially game-changing.
- Vercel (v0) says it’s the state-of-the-art coding model on the market now, especially for real-world async workflows—automations, CI/CD, long-running tasks. They also note it thinks more deeply and brings a more opinionated perspective rather than just agreeing with you.
The Benchmark Picture
On their internal research-agent benchmark, Opus 4.7 scored 0.715 overall, tying for the top score. On General Finance it scored 0.813 versus Opus 4.6's 0.767. Deductive logic, where Opus 4.6 struggled, is now solid. And it shows the best disclosure and data discipline in the group.
My Take
Look, I've been around long enough to be skeptical of model-release hype. But the pattern in these tester quotes is consistent: it's not just faster or more accurate; it's more reliable on the stuff that actually matters. That self-verification behavior, catching its own logical faults during planning, is the kind of meta-cognition that separates useful tools from frustrating ones.
The cyber safeguards are a smart move, even if they’ll annoy some power users. Testing guardrails on Opus 4.7 before rolling them out to Mythos-class models is exactly the right approach. And keeping pricing flat while delivering real improvements? That’s rare these days.
Is it going to replace Claude Mythos Preview? No, and it’s not supposed to. But for day-to-day coding work, long-running agents, and anything that requires sustained reasoning over hours, Opus 4.7 looks like the model to beat right now.