Deep Dives
AI evals are now the real compute hog, and it’s getting worse
Evaluating AI models has quietly become more expensive than training them. A single agent benchmark...
Granite 4.1 LLMs: A Deep Dive Into How IBM Actually Built Them
IBM's Granite 4.1 family (3B, 8B, 30B) is trained on ~15T tokens with a multi-stage...
NVIDIA and Siemens Healthineers Want Ultrasound to Actually Hear You
NVIDIA and Siemens Healthineers release NV-Raw2Insights-US, an AI model that learns from raw ultrasound data...
Training mRNA Language Models Across 25 Species for $165: What Actually Worked
OpenMed built an end-to-end protein-to-mRNA pipeline, training codon language models across 25 species for just...
VAKRA: A Brutally Honest Look at How AI Agents Fail at Real-World Tasks
IBM's VAKRA benchmark tests AI agents on multi-step enterprise tasks with 8,000+ APIs. The results...
QIMMA: The Arabic LLM Leaderboard That Actually Checks Its Homework
QIMMA is a quality-first Arabic LLM leaderboard that validates benchmarks before evaluating models, revealing systematic...
Google’s TurboQuant Shrinks LLM Memory by 6x Without Killing Quality
Google Research's TurboQuant compression algorithm slashes LLM key-value cache memory by 6x and boosts speed...
Testing LLMs on Superconductivity Research Questions
A new study from Google Research and Cornell University tests six LLMs on expert-level questions...
TurboQuant: Google’s New Compression Trick That Actually Works
Google Research's new TurboQuant algorithm achieves extreme compression for large language models and vector search...
Google Research Tries to Figure Out If LLMs Actually Behave Like Humans
Google Research built a framework to test whether LLMs' behavioral tendencies match human consensus. They...
Forest vs. Tree: The Real Problem with AI Benchmarks and Human Raters
Google Research drops a new framework for figuring out the right balance between how many...
MoGen: Using AI-Generated Fake Neurons to Make Brain Mapping Less Painful
Google Research's MoGen model generates synthetic neuron shapes to train AI reconstruction models, cutting errors...