Google’s been pushing Gemini hard, and the results are solid. But Gemini is Google’s playground, not yours. If you want to actually run something on your own hardware, you’ve been stuck with the Gemma series. Gemma 3 launched over a year ago, which in AI time is practically ancient history. Today, Google is finally dropping Gemma 4, and they’re doing something smart: they’re switching to the Apache 2.0 license.
That custom Gemma license always felt like a half-measure. It was open-ish, but with enough strings attached to make legal teams twitchy. Google admits they’ve heard the developer complaints, and they’re dumping it. Apache 2.0 is the real deal. No weird restrictions, no ambiguity. Just use it.
Four models are coming: two larger ones and two smaller ones optimized for local hardware. The big boys are a 26B Mixture of Experts and a 31B Dense model. Google claims the 26B MOE can run unquantized in bfloat16 on a single 80GB H100 GPU. That’s a $20,000 card, so “local” is relative. But if you’re willing to quantize down to lower precision, these should fit on consumer GPUs. That’s where the interesting work happens.
The MOE design is clever. It only activates 3.8 billion of its 26 billion parameters during inference. That means much higher tokens-per-second than a comparable dense model. The 31B Dense is the opposite: more focused on output quality than raw speed. Google expects developers to fine-tune that one for specific use cases. Makes sense. You pick your poison.
I’m more interested in the smaller variants. Google hasn’t detailed those yet, but if they follow the pattern of Gemma 3, we’ll get 2B and 7B models that actually run on a laptop. That’s where the real local AI revolution happens. Not everyone has an H100 lying around.
The latency improvements are worth noting too. Google specifically says they focused on reducing latency to take advantage of local processing. That’s a subtle jab at cloud-dependent models where every inference round-trips to a server. If Gemma 4 can deliver near-instant responses on device, that’s a genuine win for privacy and responsiveness.
One thing I don’t love: the timing. Gemma 3 launched over a year ago, and the open-source community has been running circles around Google with Llama, Mistral, and Qwen. Google’s playing catch-up here. But Apache 2.0 is a strong move. It removes the last excuse developers had to ignore Gemma.
No word yet on when the actual weights drop. Google says “starting today,” but that usually means a phased rollout. Keep an eye on their model hub. If they follow through quickly, this could shake up the open model landscape. If not, well, the community will keep building on what’s already out there.
Either way, credit where it’s due: Google finally did the right thing on licensing. Took them long enough.
Comments (0)
Login Log in to comment.
Be the first to comment!