
Gemma 4 Is Here — Google's Newest Open-Weight Models

Google DeepMind released Gemma 4 on April 2, 2026.

I’ve been scrolling through tech news and forums waiting for something new, and it finally dropped. I track the open-weight LLM space closely, and this is one of the releases I’ve been waiting for and can’t wait to talk about. Four new models arrived on April 2, all under Apache 2.0:

  • E2B (effective ~2B) — tiny, efficient, runs on almost anything
  • E4B (effective ~4.5B) — the sweet spot for laptops and phones
  • 26B MoE (4B active) — high quality, fast inference
  • 31B dense — the big one, maximum capability

And yes, they’re all multimodal now. Images, video, audio — built in from the start, not bolted on later.

Meet Qwen 3.5. Your Local AI Just Got Serious.

The open-weight LLM scene has been moving fast lately — but most of the noise is just bigger parameter counts chasing diminishing returns. What’s actually interesting right now isn’t about how massive a model can get, but how much capability we’re packing into something that runs on consumer hardware.

Enter Qwen 3.5, which Alibaba released in February with two variants designed for exactly this moment: the 27B dense model and 35B-A3B MoE. These aren’t trying to be GPT-5 replacements. They’re asking a different question entirely — what if you could run frontier-level reasoning locally without needing an API key or worrying about token costs?
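To make "runs on consumer hardware" a bit more concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone might need. It assumes the parameter totals implied by the model names (27B and 35B), and the bit widths are illustrative choices of mine, not official quantization formats. It also ignores the KV cache, activations, and quantization overhead, so real usage will be higher. Note that an MoE model still has to hold all of its weights in memory, even though only about 3B are active per token.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    params_billions: total parameter count in billions (not active parameters).
    bits_per_weight: storage precision, e.g. 16 for FP16/BF16, 4 for 4-bit.
    """
    # billions of params * bits / 8 bits-per-byte = billions of bytes = GB
    return params_billions * bits_per_weight / 8


# Hypothetical inputs: totals are read off the model names in the text above.
models = {
    "27B dense": 27,
    "35B-A3B MoE (all experts must be loaded)": 35,
}

for name, total_params in models.items():
    for bits in (16, 8, 4):
        gb = weight_memory_gb(total_params, bits)
        print(f"{name} at {bits}-bit: ~{gb:.1f} GB for weights")
```

Treat these numbers as a lower bound when deciding whether a model fits on a given GPU, since context length and runtime overhead add to the total.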

Running Open-Weight Models On A Single Consumer-Grade GPU

Why Open Models?

For years, the biggest language and vision systems were locked behind corporate APIs from OpenAI, Anthropic, Google, and others.

Then DeepSeek came in. A relatively unknown AI research lab from China at the time, it released an open-source model that quickly became the talk of the industry. DeepSeek is one of the pioneers in the open-model space: on many metrics that matter (capability, cost, openness), it has opened the way for open-weight models across the industry.