AGEofLLMs.com
Search

Llama 4 Is Here and It's Not Messing Around

Calculating... Comments
  • Meta released Llama 4 Scout and Maverick open-weight models
  • Scout runs up to 10 million tokens Maverick hits 1 million
  • Behemoth with 2 trillion parameters is still in training
  • Mixture of Experts is now standard
  • You’ll need a monster GPU to run them locally
  • License limits usage by massive companies
Llama 4 website screenshot
Llama 4 website screenshot

Llama 4 is out and yeah it’s a lot. Meta just dropped two wild open-weight models with two more on the way. And the specs are kinda bonkers.

Llama 4 Scout. This thing eats context for breakfast.

  • 17B active parameters but behind the scenes it's rocking 110B total
  • 10 million token context
  • Multimodal—reads text images even full video
  • Runs on a single GPU... if you’ve got an H100 with enough juice
  • Crazy good at digging up info from massive chunks of data

Not great for coding though based on early benchmarks.

Llama 4 Maverick. This is the beast for regular tasks.

  • Same 17B active but 400B total parameters
  • 1 million token context
  • Also multimodal
  • Ranks #2 in Chatbot Arena right behind Gemini 2.5 Pro
  • Outperforms GPT-4, Gemini Flash 2, Grok 3 in user votes

And it’s efficient too. Costs less to run than other big names and it’s still a top performer.

Llama 4 is Meta’s first real attempt at what’s called a mixture of experts model—basically the AI is made up of different parts that handle different tasks depending on what you’re asking it to do. At first Meta held the release back because the models weren't doing so hot in math and logic problems but switching to this new mixture setup made a big difference. They’re training these models on a monster setup of over 100,000 H100 GPUs which might be the biggest publicly known group of AI machines anyone’s using right now. As for what the models can handle—Scout can work with inputs up to 10 million words or tokens while Maverick handles up to 1 million. That means they can read and understand a ton of info at once. But there’s a catch. Running these models isn’t something you can just do on a regular laptop. You’ll need at least one high-end GPU like an Nvidia H100 and if you want full performance you might need more than one. The newer B200 GPUs are even faster—they can run Scout nearly three and a half times quicker than an H100 pushing out around 40,000 words per second.

What’s Coming

Llama 4 Reasoning. More logic and problem-solving skills—soon.

Llama 4 Behemoth.

  • 2 trillion total parameters
  • 288B active
  • Not done training yet
  • Already beating GPT-4.5 and Claude Sonnet on STEM tasks

This could end up being the biggest base model out there.

    So can you use them?

    Yep—unless your app has more than 700 million monthly users. Then you gotta ask Meta for permission. And you have to put “Built with Meta” somewhere.

    For the rest of us?

    • Try them on Meta.ai in Messenger WhatsApp or Instagram or on Replicate
    • Use Together.ai or Grok playground
    • Download weights from Hugging Face if you've got GPU power to spare

    What’s it all mean

    • Mixture of Experts is the new default
    • Long context is the new flex
    • Open weight doesn’t mean totally free—but it’s still way more open than closed labs
    • Behemoth might shake the whole leaderboard when it lands

    Feels like Meta’s tired of playing second fiddle. They're going big and weird with Llama 4 and honestly it’s working.

    Last modified 07 April 2025 at 15:26

    Related Posts

    Visitor Comments

    Please prove you are human by selecting the star.