---
title: Running Google’s Gemma 4 Locally on macOS with Ollama
date: 2026-04-07T06:00:00-04:00
author: John Morton
canonical_url: "https://supergeekery.com/blog/running-googles-gemma-4-locally-on-macos-with-ollama"
section: Blog
---
# Running Google’s Gemma 4 Locally on macOS with Ollama

*April 7, 2026* by John Morton

![Me coding remotely](https://static.supergeekery.com/site-assets/me-coding-remotely.jpg)

*Audio narration available for this post.*

Large language models don’t have to live in the cloud. With Ollama and a recent Mac, you can run Google’s Gemma 4 models entirely on your own machine. No API keys. No usage limits. No data leaving your laptop.

Here’s how I set it up.

## **Installing Ollama**

Ollama is the easiest way to run open-weight LLMs locally. On macOS, it’s a single Homebrew command:

```plaintext
brew install ollama
```

Once installed, start the Ollama service:

```plaintext
brew services start ollama
```

That’s it. Ollama is now running in the background and ready to pull and serve models.
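If you want to confirm the service really is up, Ollama answers HTTP requests on `localhost:11434` by default. Here's a quick standard-library check — a sketch that assumes you haven't changed Ollama's default port:

```python
import urllib.request
import urllib.error

def ollama_is_up(url="http://localhost:11434", timeout=2):
    """Return True if the local Ollama server responds at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the service isn't reachable.
        return False

print("Ollama running:", ollama_is_up())
```

A plain `curl http://localhost:11434` works just as well if you'd rather stay in the terminal.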

Any guesses on how to stop Ollama? Easy:

```plaintext
brew services stop ollama
```

## **Pulling Gemma 4**

Google’s Gemma 4 family comes in four sizes. The right choice depends on your available RAM and what you need the model for:

| **Model** | **Disk Space** | **Best For** |
| --- | --- | --- |
| gemma4:e2b | ~7 GB | Fast responses, lightweight tasks |
| gemma4:e4b | ~10 GB | Good balance for everyday use |
| gemma4:26b | ~17 GB | Best quality/speed tradeoff |
| gemma4:31b | ~20 GB | Highest quality, slower |

A good rule of thumb: you’ll want at least as much free RAM as the model’s disk size, plus a few extra gigabytes for the system. The e2b and e4b variants run well on 16 GB machines. The 26b and 31b variants are more comfortable with 32 GB or more.
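That rule of thumb is easy to encode. Here's a tiny sketch — the 4 GB overhead is my own stand-in for "a few extra gigabytes," so adjust to taste:

```python
# Approximate download sizes from the table above.
MODEL_SIZES_GB = {
    "gemma4:e2b": 7,
    "gemma4:e4b": 10,
    "gemma4:26b": 17,
    "gemma4:31b": 20,
}

def fits_in_ram(model, total_ram_gb, overhead_gb=4):
    """Rough check: model disk size plus system overhead vs. total RAM."""
    return MODEL_SIZES_GB[model] + overhead_gb <= total_ram_gb

# e.g. on a 16 GB machine:
for model in MODEL_SIZES_GB:
    print(model, "fits" if fits_in_ram(model, 16) else "is a squeeze")
```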

I have a MacBook Pro with 64 GB of RAM, so I went with the 26b variant — best quality/speed tradeoff for my setup. If you’re not sure, starting with `gemma4:e2b` is a quick way to test things out before committing to a bigger download.

```plaintext
ollama pull gemma4:26b
```

This downloads roughly 17 GB, so give it a few minutes depending on your connection. Once the download finishes, you can start chatting immediately:

```plaintext
ollama run gemma4:26b
```

You’re now talking to a 26-billion-parameter model running entirely on your hardware.
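The CLI isn't the only way in, either. The Ollama service exposes an HTTP API on `localhost:11434`, which is handy for scripting. Here's a minimal standard-library sketch against the `/api/generate` endpoint — `gemma4:26b` assumes you pulled that variant:

```python
import json
import urllib.request

def build_payload(prompt, model="gemma4:26b"):
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_gemma(prompt, host="http://localhost:11434"):
    """Send a one-off prompt to the local Ollama server and return its reply."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the service running, a one-off prompt looks like:
#   print(ask_gemma("Explain concurrency vs. parallelism in one sentence."))
```

Setting `"stream": False` returns the whole reply in one JSON object instead of a stream of chunks, which keeps the sketch simple.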

## **Making It a One-Word Command**

I got tired of typing `ollama run gemma4:26b` every time, so I added a small function to my `.zshrc`:

```plaintext
gemma() {
    ollama run gemma4:26b "$@"
}
```

After running `source ~/.zshrc`, I can just type:

```plaintext
gemma
```

Tada! I’m instantly in a conversation.

If I want to send a quick one-off prompt without entering interactive mode:

```plaintext
gemma "Explain the difference between concurrency and parallelism"
```

## **Why a Local LLM?**

Cloud-hosted models are great... *until they’re not*. 😬 Here are a few situations where having a local LLM pays off.

### **Sensitive documents stay sensitive**

Some things shouldn’t leave your machine. Say you need to review an NDA or a contract. With a local model, you can pipe it straight in without worrying about where your data ends up.

```plaintext
gemma "Summarize this contract in a markdown table with columns: Clause, What It Means, and Watch Out For" < nda.txt
```

By the way, this works well with **plain-text** documents. In my experience, binary formats like PDFs don't work — convert them to text first.

What about proprietary code, internal docs, client data? Using a local LLM means none of it ever hits an external server. There’s no terms-of-service fine print to parse, no trust required. It just stays on your laptop.

### **No internet? No problem.**

Once you’ve pulled a model, it’s on your disk. Airplane mode, spotty coffee shop Wi-Fi, working from a cabin with no signal — doesn’t matter. Your local LLM works the same whether you’re online or off. Where I live, we sometimes lose power and internet, and having a local LLM comes in handy.

### **No credits, no quotas, no surprises**

Cloud APIs charge per token. Usage caps reset monthly. Rate limits kick in at the worst possible time. A local model has none of that. Run as many prompts as you want, as often as you want. It’s your hardware and there’s no meter running.

## **What It’s Like**

Response quality from the 26b model is genuinely impressive for a local setup. It handles coding questions, writing tasks, and general knowledge well. Responses start streaming within a few seconds on Apple Silicon. It isn't as fast as a cloud-hosted model, but it's still impressive.

## **Wrapping Up**

The whole setup takes about five minutes:

1. `brew install ollama`
2. `brew services start ollama`
3. `ollama pull gemma4:26b`
4. Add the `gemma` shortcut to your shell config

No accounts to create, no tokens to manage, no monthly bills. Just a local LLM ready whenever you are.

In [Part 2](https://supergeekery.com/blog/upgrading-your-local-gemma-setup), I’ll show how I upgraded this basic setup with auto-starting Ollama, live streaming output, markdown rendering, and full conversation history — all within the same shell function.

## Related Posts

- [Upgrading Your Local Gemma Setup](https://supergeekery.com/blog/upgrading-your-local-gemma-setup)
- [What happens when the &quot;Good, Fast, Cheap&quot; rule breaks?](https://supergeekery.com/blog/good-fast-cheap-and-what-happens-when-the-rule-breaks)
- [Craft CMS and chat-oriented programming, CHOP](https://supergeekery.com/blog/craft-cms-and-chat-oriented-programming-chop)
