ReasoningBank: Giving Agents a Memory That Actually Learns from Mistakes

Agents are getting good at navigating web pages, fixing code, and handling complex workflows. But here’s the dirty secret: most of them don’t learn from their mistakes after they’re deployed. They treat every new task like a blank slate, repeating the same errors and discarding hard-won insights.

Google Cloud’s research team just dropped a paper at ICLR that tackles this head-on. ReasoningBank is a memory framework that doesn’t just store what happened — it distills why things worked or failed into reusable reasoning patterns. And yes, it actually uses failures as teaching moments, which is more than most existing approaches bother to do.

The problem with current agent memory

Most agent memory systems today fall into two camps. One camp saves exhaustive logs of every action taken — Synapse’s trajectory memory is a good example. The other camp documents workflows only from successful attempts, like Agent Workflow Memory.

Both have serious blind spots. Recording every click and keystroke gives you a firehose of data but no strategic insight. You end up with a detailed history of what happened, but no understanding of why that approach worked or what general principle to apply next time. And by only learning from wins, you’re ignoring a goldmine of information — your own failures. Real learning, whether for humans or AI, comes from screwing up and figuring out why.

How ReasoningBank works

The core idea is refreshingly practical. Instead of storing raw action logs, ReasoningBank creates structured memory items with three components: a title (what’s this about?), a description (brief summary), and content (the actual reasoning steps or decision rationales).
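To make that concrete, here is roughly what one memory item looks like as a data structure. The three fields come straight from the paper's description; the dataclass itself and the field types are just a sketch, and the released code may use a different schema.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    """One distilled reasoning memory, following the paper's three-part format."""
    title: str        # one-line handle: what is this strategy about?
    description: str  # brief summary of when it applies
    content: str      # the actual reasoning steps or decision rationales

# At its simplest, the ReasoningBank is just a growing collection of these items.
bank: list[MemoryItem] = []
```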

Here’s the workflow loop:

Before an agent acts, it queries ReasoningBank for relevant memories and folds them into its context. It then interacts with the environment. Once the attempt finishes, an LLM acts as a judge, self-assessing the trajectory as a success or a failure so the right kind of lesson can be extracted. Notably, the paper found that this self-judgment doesn't need to be perfectly accurate: the system is robust to noisy labels, which is good because LLMs aren't exactly known for flawless self-evaluation.
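The retrieval step is the familiar pattern: score the incoming task against stored items and hand the top matches to the agent. Here is a minimal sketch assuming a generic embedding function and cosine similarity; the paper and released code may retrieve memories differently.

```python
import numpy as np

def retrieve(task: str, bank: list[MemoryItem], embed, k: int = 3) -> list[MemoryItem]:
    """Return the k stored memories most similar to the new task (cosine similarity)."""
    if not bank:
        return []
    q = np.asarray(embed(task))

    def score(m: MemoryItem) -> float:
        # Score against the title and description, which summarize when the memory applies.
        v = np.asarray(embed(f"{m.title}: {m.description}"))
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-8))

    return sorted(bank, key=score, reverse=True)[:k]
```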

From each trajectory, the agent distills workflows and generalizable insights into new memories. For now, these get appended directly to the bank. The authors acknowledge more sophisticated consolidation strategies could come later, but the simple approach already works well.
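Putting the loop together, one iteration might look like the sketch below. The names `run_agent`, `llm_judge`, and `extract_memories` stand in for prompts and agent machinery described in the paper; they are assumptions for illustration, not functions from the released code.

```python
def reasoningbank_step(task, bank, run_agent, llm_judge, extract_memories, embed):
    """One self-evolving iteration: retrieve -> act -> judge -> distill -> append."""
    # 1. Retrieve relevant memories and inject them into the agent's context.
    memories = retrieve(task, bank, embed, k=3)
    trajectory = run_agent(task, context=[m.content for m in memories])

    # 2. LLM-as-judge self-assesses the outcome; the paper reports this label
    #    can be noisy without breaking the overall system.
    verdict = llm_judge(task, trajectory)  # e.g. "success" or "failure"

    # 3. Distill generalizable insights (strategies from successes, pitfalls from
    #    failures) and append them directly to the bank; no consolidation yet.
    bank.extend(extract_memories(task, trajectory, verdict))
    return trajectory
```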

Learning from failure is the real differentiator

This is where ReasoningBank stands apart. Most workflow memory systems only process successful runs. ReasoningBank actively analyzes failures to extract counterfactual signals and pitfalls.

Consider the difference. A naive system might learn: “Click the ‘Load More’ button.” A ReasoningBank-powered agent, after failing to load more content, might learn: “Before attempting to load more results, verify the current page identifier to avoid infinite-scroll traps.”

That’s a strategic guardrail, not just a procedural rule. It’s the kind of insight that prevents future failures across different but related scenarios.
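One plausible way to implement that split is to route successes and failures to different distillation prompts. The prompt wording and the pipe-delimited output format below are illustrative assumptions, not taken from the paper.

```python
SUCCESS_PROMPT = (
    "The agent completed this task. Extract the generalizable strategies that made it "
    "work, one per line, formatted as: title | description | content"
)
FAILURE_PROMPT = (
    "The agent failed this task. Identify the root cause, what should have been checked "
    "first, and the guardrail that would prevent this pitfall, one per line, "
    "formatted as: title | description | content"
)

def extract_memories(task: str, trajectory: str, verdict: str, llm) -> list[MemoryItem]:
    """Distill a trajectory into memory items, learning from failures as well as successes."""
    prompt = SUCCESS_PROMPT if verdict == "success" else FAILURE_PROMPT
    raw = llm(f"{prompt}\n\nTask: {task}\n\nTrajectory: {trajectory}")
    # Simplistic parsing: keep any line that splits cleanly into the three fields.
    items = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            items.append(MemoryItem(*parts))
    return items
```

To plug this into the loop sketched earlier, you would bind the `llm` argument first (for example with `functools.partial`) before passing it in as `extract_memories`.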

Results that back up the hype

On web browsing and software engineering benchmarks, ReasoningBank outperformed baseline approaches on two fronts: higher success rates and fewer steps per task. That’s the sweet spot — better outcomes with less wasted effort.

The efficiency gain makes sense. When an agent has distilled reasoning patterns rather than raw logs, it spends less time re-exploring dead ends and more time executing proven strategies.

What this means for the agent ecosystem

This approach signals a shift in how we think about agent memory. We’re moving from “memory as storage” to “memory as learning system.” The distinction matters because deployed agents don’t get the luxury of retraining every time they encounter a novel situation. They need to adapt on the fly.

ReasoningBank isn’t perfect. The current implementation doesn’t have sophisticated memory consolidation, and the reliance on LLM-as-judge introduces some overhead. But the core insight — that agents should learn from both success and failure, and that learning should be strategic rather than tactical — feels like the right direction.

I’d love to see this integrated into production agent frameworks. The code is already on GitHub, so there’s no excuse for the major agent platforms to ignore this. If you’re building agents that run for more than a few minutes, give ReasoningBank a look. Your agents might finally start learning from their mistakes.


Paper: “ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory” at ICLR 2026. Code available on GitHub.
