Goodfire, a San Francisco startup, just released Silico, a tool that lets you crack open an LLM while it’s still training and fiddle with its parameters. That’s a bigger deal than it sounds. Most interpretability work so far has been post-mortem — you train a model, then try to figure out why it does what it does. Silico promises to let you intervene mid-process.
The company claims this is the first off-the-shelf tool that works across the entire development pipeline, from dataset construction to final training. CEO Eric Ho told MIT Technology Review that the dominant mindset at frontier labs is “more scale, more compute, more data, and then you get AGI and nothing else matters.” Goodfire’s bet is that there’s a better way.
I’ve been watching the mechanistic interpretability space for a while, and the gap between ambition and practical tooling has been frustrating. Anthropic, OpenAI, and Google DeepMind have all published impressive papers, but their techniques rarely leave the lab. Goodfire wants to change that by shipping actual software.
What Silico actually does
The tool lets you zoom in on individual neurons or groups of neurons inside a model, assuming you have access to its internals (so no, you can’t use this to debug ChatGPT, but it works with open-weight models like Qwen 3). You can check which inputs fire specific neurons, trace pathways upstream and downstream, and run experiments to see how changes ripple through the network.
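To make the upstream and downstream views concrete, here is a toy, pure-Python sketch. It is not Silico’s API: the network, its weights, and the function names (`hidden_activations`, `top_activating`, `downstream_contributions`) are all hypothetical. Upstream, it ranks inputs by how strongly they fire one hidden neuron; downstream, it multiplies that neuron’s activation by its outgoing weights to see which outputs it pushes, and in which direction.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: 3 input features -> 2 ReLU hidden neurons -> 2 outputs.
# The weights are made up for illustration; a real LLM has billions of them.
W_IN: List[List[float]] = [
    [1.0, -0.5, 0.0],  # hidden neuron 0 mostly reads feature 0
    [0.0, 0.8, 0.6],   # hidden neuron 1 reads features 1 and 2
]
W_OUT: List[List[float]] = [
    [0.9, -0.2],  # output 0 <- (hidden 0, hidden 1)
    [-0.3, 1.1],  # output 1 <- (hidden 0, hidden 1)
]


def hidden_activations(x: List[float]) -> List[float]:
    """Forward pass up to the hidden layer: the 'neurons' we want to inspect."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_IN]


def top_activating(
    inputs: Dict[str, List[float]], neuron: int, k: int = 2
) -> List[Tuple[str, float]]:
    """Upstream view: rank named inputs by how strongly they fire one hidden neuron."""
    scored = [(name, hidden_activations(x)[neuron]) for name, x in inputs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def downstream_contributions(x: List[float], neuron: int) -> List[float]:
    """Downstream view: the neuron's activation times each of its outgoing weights."""
    act = hidden_activations(x)[neuron]
    return [act * row[neuron] for row in W_OUT]


if __name__ == "__main__":
    probes = {"a": [1.0, 0.0, 0.0], "b": [0.2, 1.0, 0.0], "c": [0.5, 0.0, 1.0]}
    print(top_activating(probes, neuron=0))          # [('a', 1.0), ('c', 0.5)]
    print(downstream_contributions(probes["a"], 0))  # [0.9, -0.3]
```

In a real transformer you would collect these activations with framework hooks over a large corpus rather than three hand-written vectors, but the bookkeeping is the same.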
Goodfire found one neuron in Qwen 3 that was tied to the trolley problem. Activating it made the model frame everything as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” Ho said. That kind of pinpointing is becoming standard in interpretability research, but Goodfire goes further: you can actually adjust the parameters connected to those neurons to boost or suppress specific behaviors.
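The trolley-problem example involves two kinds of intervention: activating a neuron, and adjusting the parameters connected to it. A minimal sketch of both, assuming a single hypothetical linear readout layer (`W_OUT` and every function name here are made up for illustration): `steer_activation` rescales one neuron’s activation at inference time, while `edit_outgoing_weights` bakes the same change into the weights.

```python
from typing import List

# Hypothetical toy readout: 2 hidden neurons feed 2 outputs through W_OUT.
W_OUT: List[List[float]] = [
    [0.9, -0.2],
    [-0.3, 1.1],
]


def readout(h: List[float], w_out: List[List[float]]) -> List[float]:
    """Linear layer applied to hidden activations h."""
    return [sum(w * hj for w, hj in zip(row, h)) for row in w_out]


def steer_activation(h: List[float], neuron: int, scale: float) -> List[float]:
    """Inference-time steering: multiply one neuron's activation.

    scale > 1 boosts the neuron, 0 < scale < 1 suppresses it, scale == 0 ablates it.
    Returns a new list; the input activations are left untouched.
    """
    steered = list(h)
    steered[neuron] *= scale
    return steered


def edit_outgoing_weights(
    w_out: List[List[float]], neuron: int, scale: float
) -> List[List[float]]:
    """Parameter edit: scale every weight leaving one neuron (a persistent change)."""
    return [[w * scale if j == neuron else w for j, w in enumerate(row)] for row in w_out]


if __name__ == "__main__":
    h = [1.0, 0.5]
    print(readout(h, W_OUT))                                            # baseline
    print(readout(steer_activation(h, neuron=0, scale=3.0), W_OUT))     # steered
    print(readout(h, edit_outgoing_weights(W_OUT, neuron=0, scale=3.0)))  # edited
```

For a neuron that only feeds this one layer, the last two lines print the same numbers: scaling every weight leaving a neuron is equivalent to scaling its activation. The weight edit just makes the change permanent, which matters if the goal is to shape the model rather than patch it at inference time.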
One demo they showed me involved asking a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing negative business impact. By boosting neurons associated with transparency and disclosure, they flipped the answer from no to yes nine out of ten times. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” Ho explained.
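Ho’s explanation, that the ethical reasoning was present but outweighed by the commercial risk assessment, maps onto a very simple picture: two features push a yes/no decision in opposite directions, and steering scales one of them. The sketch below is a toy with made-up weights (`W_ETHICS`, `W_RISK`) and hypothetical function names; it is not how Goodfire ran the demo and won’t reproduce the nine-out-of-ten figure. It also shows one way to measure a flip rate over repeated samples.

```python
import math
import random

# Made-up weights for a toy two-feature readout; they do NOT come from Qwen 3 or Goodfire.
W_ETHICS = 1.5  # pushes toward "yes, disclose"
W_RISK = 2.0    # pushes toward "no" (commercial risk)


def p_yes(ethics: float, risk: float, boost: float = 1.0) -> float:
    """Probability of answering 'yes' given two feature activations.

    boost multiplies the ethics/transparency feature, mimicking a steering intervention.
    """
    logit = W_ETHICS * ethics * boost - W_RISK * risk
    return 1.0 / (1.0 + math.exp(-logit))


def yes_rate(boost: float, trials: int = 10, seed: int = 0) -> float:
    """Fraction of sampled answers that are 'yes' over repeated trials.

    Reusing the same seed for every boost value (common random numbers) means a
    larger p_yes can only turn 'no' draws into 'yes' draws, never the reverse.
    """
    rng = random.Random(seed)
    p = p_yes(ethics=1.0, risk=1.0, boost=boost)
    hits = sum(1 for _ in range(trials) if rng.random() < p)
    return hits / trials


if __name__ == "__main__":
    print(p_yes(1.0, 1.0))             # baseline: ethics term present but outweighed
    print(p_yes(1.0, 1.0, boost=3.0))  # boosted: ethics term now dominates
    print(yes_rate(1.0), yes_rate(3.0))
```

Reusing the seed for both conditions is a deliberate choice: baseline and steered runs see identical random draws, so any difference in the yes-rate comes from the intervention rather than sampling noise.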
The alchemy vs. engineering debate
Goodfire’s pitch is that they’re turning AI development from alchemy into precision engineering. Leonard Bereska, a researcher at the University of Amsterdam who works on mechanistic interpretability, isn’t buying it. “In reality, they are adding precision to the alchemy,” he told MIT Tech Review. “Calling it engineering makes it sound more principled than it is.”
I think Bereska has a point, but I also think Goodfire’s approach is still a meaningful step forward. The field has been drowning in theory without much practical tooling. Even imperfect tools that let developers intervene during training are better than the current black-box approach.
The agent twist
What makes Silico practical is that it uses AI agents to automate much of the interpretability work that previously required human researchers. “Agents are now strong enough to do a lot of the interpretability work that we were doing using humans,” Ho said. That’s the key enabler — without automation, this kind of analysis is too slow and expensive for real-world use.
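Goodfire hasn’t said how its agents work, so here is a sketch of one well-known pattern from automated interpretability research: an explainer proposes a description of a neuron, a simulator predicts the neuron’s activations from that description alone, and the description is scored by how well the predictions correlate with the real activations. The LLM simulator is replaced by a hypothetical keyword stub (`keyword_simulator`), and the candidate explanations are hand-written rather than agent-generated.

```python
from statistics import mean
from typing import Callable, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def keyword_simulator(keywords: List[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for an LLM 'simulator' agent.

    Given a candidate explanation (here just a keyword list), predict the neuron's
    activation on a text: 1.0 if any keyword appears, else 0.0.
    """
    def simulate(text: str) -> float:
        lowered = text.lower()
        return 1.0 if any(k in lowered for k in keywords) else 0.0
    return simulate


def score_explanation(texts: List[str], true_acts: List[float], keywords: List[str]) -> float:
    """How well the simulated activations track the real ones."""
    simulate = keyword_simulator(keywords)
    return pearson([simulate(t) for t in texts], true_acts)


def best_explanation(
    texts: List[str], true_acts: List[float], candidates: List[List[str]]
) -> List[str]:
    """Keep the candidate explanation whose simulation correlates best with reality."""
    return max(candidates, key=lambda kws: score_explanation(texts, true_acts, kws))
```

If trolley- and lever-related texts fire the neuron strongly while unrelated texts barely do, `best_explanation` picks `["trolley", "lever"]` over unrelated candidates. Swapping the stub for real model calls is where agents come in: generating candidates and simulating activations at a volume human researchers can’t match.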
Fixing the data, and what’s still missing
Silico can also help steer training by filtering out data that would set unwanted parameter values in the first place. For example, they showed how models often think 9.11 is greater than 9.9 because of neurons associated with biblical verse numbering or code repository versioning. You could filter that data out during training to avoid the problem entirely.
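The 9.11 vs. 9.9 confusion is easy to see once you write out the two orderings. As decimals, 9.9 is larger; read as chapter and verse or as a major.minor version, 9.11 comes after 9.9. The filter in the sketch is a deliberately crude stand-in: the regexes (`VERSE_RE`, `VERSION_RE`) and function names are hypothetical, not Goodfire’s method, and the article doesn’t say how Silico identifies such data.

```python
import re
from typing import List


def decimal_greater(a: str, b: str) -> bool:
    """Read both strings as decimal numbers: 9.9 > 9.11."""
    return float(a) > float(b)


def version_greater(a: str, b: str) -> bool:
    """Compare dot-separated parts as integers, e.g. (9, 11) > (9, 9)."""
    return tuple(int(p) for p in a.split(".")) > tuple(int(p) for p in b.split("."))


# Hypothetical, deliberately crude patterns; they will both miss cases and misfire.
VERSE_RE = re.compile(r"\b(?:[1-3] )?[A-Z][a-z]+ \d+[:.]\d+\b")  # "John 3:16", "Genesis 9.11"
VERSION_RE = re.compile(r"\bv\d+(?:\.\d+)+\b|\b\d+\.\d+\.\d+\b")  # "v2.10", "1.2.3"


def filter_docs(docs: List[str]) -> List[str]:
    """Drop documents that look like verse references or software version strings."""
    return [d for d in docs if not (VERSE_RE.search(d) or VERSION_RE.search(d))]


if __name__ == "__main__":
    print(decimal_greater("9.11", "9.9"))  # False
    print(version_greater("9.11", "9.9"))  # True
    corpus = [
        "And God spoke to Noah, as recorded in Genesis 9.11.",
        "Upgrade to v2.10.1 to get the fix.",
        "The price moved from 9.9 to 9.11 dollars.",
    ]
    print(filter_docs(corpus))  # keeps only the price sentence
```

Surface rules like these throw out plenty of useful text (every Bible citation and changelog, for a start), which is why tying the filter to the neurons actually implicated, as Goodfire describes, is more appealing than keyword matching.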
But there are real limits. Silico only works on models where you have full access to weights and activations. Most commercial models are locked down. And the technique is still early — tweaking a few neurons can have unpredictable side effects. Goodfire is selling a scalpel, but the anatomy is still poorly understood.
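One cheap way to catch the unpredictable side effects mentioned above is a drift check: run a fixed set of probe inputs through the model before and after an edit and see how far the outputs move, especially on inputs unrelated to the target behavior. The toy below assumes a hypothetical polysemantic neuron that responds to two unrelated features, so boosting it for one behavior also shifts the other. Nothing here is from Silico; the weights and names are illustrative.

```python
from typing import Dict, List

# Hypothetical toy model. Hidden neuron 0 is "polysemantic": it reads both
# feature 0 (say, moral framing) and feature 1 (say, finance talk).
W_IN: List[List[float]] = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
W_OUT: List[List[float]] = [
    [1.0, 0.5],
    [0.2, 1.0],
]


def forward(x: List[float], w_out: List[List[float]]) -> List[float]:
    """ReLU hidden layer followed by a linear readout."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_IN]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def scale_neuron(w_out: List[List[float]], neuron: int, scale: float) -> List[List[float]]:
    """Edit: multiply every weight leaving one hidden neuron."""
    return [[w * scale if j == neuron else w for j, w in enumerate(row)] for row in w_out]


def drift_report(
    probes: Dict[str, List[float]], before: List[List[float]], after: List[List[float]]
) -> Dict[str, float]:
    """Largest absolute output change per probe input, original vs. edited model."""
    return {
        name: max(abs(a - b) for a, b in zip(forward(x, after), forward(x, before)))
        for name, x in probes.items()
    }


if __name__ == "__main__":
    probes = {
        "target": [1.0, 0.0, 0.0],     # the behavior we meant to boost
        "unrelated": [0.0, 1.0, 0.0],  # shares neuron 0, so it shifts too
        "untouched": [0.0, 0.0, 1.0],  # only uses neuron 1
    }
    edited = scale_neuron(W_OUT, neuron=0, scale=3.0)
    print(drift_report(probes, W_OUT, edited))
```

Here the `untouched` probe shows zero drift while `unrelated` moves as much as `target`. In a real model you would want many probes drawn from everyday tasks, since the side effects that matter most are the ones nobody thought to look for.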
Still, I’d rather have a scalpel than a sledgehammer. If Goodfire can keep shipping tools that make interpretability practical, they might actually narrow the gap between how well models are understood and how widely they’re deployed.