Deep Dives

AI evals are now the real compute hog, and it’s getting worse

Evaluating AI models has quietly become more expensive than training them. A single agent benchmark...

Granite 4.1 LLMs: A Deep Dive Into How IBM Actually Built Them

IBM's Granite 4.1 family (3B, 8B, 30B) is trained on ~15T tokens with a multi-stage...

NVIDIA and Siemens Healthineers Want Ultrasound to Actually Hear You

NVIDIA and Siemens Healthineers release NV-Raw2Insights-US, an AI model that learns from raw ultrasound data...

Training mRNA Language Models Across 25 Species for $165: What Actually Worked

OpenMed built an end-to-end protein-to-mRNA pipeline, training codon language models across 25 species for just...

VAKRA: A Brutally Honest Look at How AI Agents Fail at Real-World Tasks

IBM's VAKRA benchmark tests AI agents on multi-step enterprise tasks with 8,000+ APIs. The results...

QIMMA: The Arabic LLM Leaderboard That Actually Checks Its Homework

QIMMA is a quality-first Arabic LLM leaderboard that validates benchmarks before evaluating models, revealing systematic...

Google’s TurboQuant Shrinks LLM Memory by 6x Without Killing Quality

Google Research's TurboQuant compression algorithm slashes LLM key-value cache memory by 6x and boosts speed...

Testing LLMs on Superconductivity Research Questions

A new study from Google Research and Cornell University tests six LLMs on expert-level questions...

TurboQuant: Google’s New Compression Trick That Actually Works

Google Research's new TurboQuant algorithm achieves extreme compression for large language models and vector search...

Google Research Tries to Figure Out If LLMs Actually Behave Like Humans

Google Research built a framework to test whether LLMs' behavioral tendencies match human consensus. They...

Forest vs. Tree: The Real Problem with AI Benchmarks and Human Raters

Google Research drops a new framework for figuring out the right balance between how many...

MoGen: Using AI-Generated Fake Neurons to Make Brain Mapping Less Painful

Google Research's MoGen model generates synthetic neuron shapes to train AI reconstruction models, cutting errors...
