---
title: Running Google’s Gemma 4 Locally on macOS with Ollama
date: 2026-04-07T06:00:00-04:00
author: John Morton
canonical_url: "https://supergeekery.com/blog/running-googles-gemma-4-locally-on-macos-with-ollama"
section: Blog
---
# Running Google’s Gemma 4 Locally on macOS with Ollama

*April 7, 2026* by John Morton

![Me coding remotely](https://static.supergeekery.com/site-assets/me-coding-remotely.jpg)

*Audio narration available for this post.*

Large language models don’t have to live in the cloud. With Ollama and a recent Mac, you can run Google’s Gemma 4 models entirely on your own machine. No API keys. No usage limits. No data leaving your laptop.

Here’s how I set it up.

## **Installing Ollama**

Ollama is the easiest way to run open-weight LLMs locally. On macOS, it’s a single Homebrew command:

```plaintext
brew install ollama
```

Once installed, start the Ollama service:

```plaintext
brew services start ollama
```

That’s it. Ollama is now running in the background and ready to pull and serve models.
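If you want to confirm the service really is up, Ollama answers HTTP requests on `localhost:11434` by default. Here's a quick standard-library check — a sketch that assumes you haven't changed Ollama's default port:

```python
import urllib.request
import urllib.error

def ollama_is_up(url="http://localhost:11434", timeout=2):
    """Return True if the local Ollama server responds at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the service isn't reachable.
        return False

print("Ollama running:", ollama_is_up())
```

A plain `curl http://localhost:11434` works just as well if you'd rather stay in the terminal.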

Any guesses on how to stop Ollama? Easy:

```plaintext
brew services stop ollama
```

## **Pulling Gemma 4**

Google’s Gemma 4 family comes in four sizes. The right choice depends on your available RAM and what you need the model for:

| **Model** | **Disk Space** | **Best For** |
| --- | --- | --- |
| gemma4:e2b | ~7 GB | Fast responses, lightweight tasks |
| gemma4:e4b | ~10 GB | Good balance for everyday use |
| gemma4:26b | ~17 GB | Best quality/speed tradeoff |
| gemma4:31b | ~20 GB | Highest quality, slower |

A good rule of thumb: you’ll want at least as much free RAM as the model’s disk size, plus a few extra gigabytes for the system. The e2b and e4b variants run well on 16 GB machines. The 26b and 31b variants are more comfortable with 32 GB or more.
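That rule of thumb is easy to encode. Here's a tiny sketch — the 4 GB overhead is my own stand-in for "a few extra gigabytes," so adjust to taste:

```python
# Approximate download sizes from the table above.
MODEL_SIZES_GB = {
    "gemma4:e2b": 7,
    "gemma4:e4b": 10,
    "gemma4:26b": 17,
    "gemma4:31b": 20,
}

def fits_in_ram(model, total_ram_gb, overhead_gb=4):
    """Rough check: model disk size plus system overhead vs. total RAM."""
    return MODEL_SIZES_GB[model] + overhead_gb <= total_ram_gb

# e.g. on a 16 GB machine:
for model in MODEL_SIZES_GB:
    print(model, "fits" if fits_in_ram(model, 16) else "is a squeeze")
```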

I have a MacBook Pro with 64 GB of RAM, so I went with the 26b variant — best quality/speed tradeoff for my setup. If you’re not sure, starting with `gemma4:e2b` is a quick way to test things out before committing to a bigger download.

```plaintext
ollama pull gemma4:26b
```

This downloads roughly 17 GB, so give it a few minutes depending on your connection. Once the download finishes, you can start chatting immediately:

```plaintext
ollama run gemma4:26b
```

You’re now talking to a 26-billion-parameter model running entirely on your hardware.
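The CLI isn't the only way in, either. The Ollama service exposes an HTTP API on `localhost:11434`, which is handy for scripting. Here's a minimal standard-library sketch against the `/api/generate` endpoint — `gemma4:26b` assumes you pulled that variant:

```python
import json
import urllib.request

def build_payload(prompt, model="gemma4:26b"):
    """Request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_gemma(prompt, host="http://localhost:11434"):
    """Send a one-off prompt to the local Ollama server and return its reply."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the service running, a one-off prompt looks like:
#   print(ask_gemma("Explain concurrency vs. parallelism in one sentence."))
```

Setting `"stream": False` returns the whole reply in one JSON object instead of a stream of chunks, which keeps the sketch simple.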

## **Making It a One-Word Command**

I got tired of typing `ollama run gemma4:26b` every time, so I added a small function to my `.zshrc`:

```plaintext
gemma() {
    ollama run gemma4:26b "$@"
}
```

After running `source ~/.zshrc`, I can just type:

```plaintext
gemma
```

Tada! I’m instantly in a conversation.

If I want to send a quick one-off prompt without entering interactive mode:

```plaintext
gemma "Explain the difference between concurrency and parallelism"
```

## **Why a Local LLM?**

Cloud-hosted models are great... *until they’re not*. 😬 Here are a few situations where having a local LLM pays off.

### **Sensitive documents stay sensitive**

Some things shouldn’t leave your machine. Say you need to review an NDA or a contract. With a local model, you can pipe it straight in without worrying about where your data ends up.

```plaintext
gemma "Summarize this contract in a markdown table with columns: Clause, What It Means, and Watch Out For" < nda.txt
```

By the way, this works well with **plain-text** documents. In my experience, binary formats like PDFs don't work — convert them to text first.

What about proprietary code, internal docs, client data? Using a local LLM means none of it ever hits an external server. There’s no terms-of-service fine print to parse, no trust required. It just stays on your laptop.

### **No internet? No problem.**

Once you’ve pulled a model, it’s on your disk. Airplane mode, spotty coffee shop Wi-Fi, working from a cabin with no signal — doesn’t matter. Your local LLM works the same whether you’re online or off. Where I live, we sometimes lose power and internet, and having a local LLM comes in handy.

### **No credits, no quotas, no surprises**

Cloud APIs charge per token. Usage caps reset monthly. Rate limits kick in at the worst possible time. A local model has none of that. Run as many prompts as you want, as often as you want. It’s your hardware and there’s no meter running.

## **What It’s Like**

Response quality from the 26b model is genuinely impressive for a local setup. It handles coding questions, writing tasks, and general knowledge well. Responses start streaming within a few seconds on Apple Silicon. It isn't as fast as a cloud-hosted model, but it's still impressive.

## **Wrapping Up**

The whole setup takes about five minutes:

1. `brew install ollama`
2. `brew services start ollama`
3. `ollama pull gemma4:26b`
4. Add the `gemma` shortcut to your shell config

No accounts to create, no tokens to manage, no monthly bills. Just a local LLM ready whenever you are.

In [Part 2](https://supergeekery.com/blog/upgrading-your-local-gemma-setup), I’ll show how I upgraded this basic setup with auto-starting Ollama, live streaming output, markdown rendering, and full conversation history — all within the same shell function.

## Related Posts

- [Upgrading Your Local Gemma Setup](https://supergeekery.com/blog/upgrading-your-local-gemma-setup)
- [What happens when the &quot;Good, Fast, Cheap&quot; rule breaks?](https://supergeekery.com/blog/good-fast-cheap-and-what-happens-when-the-rule-breaks)
- [Craft CMS and chat-oriented programming, CHOP](https://supergeekery.com/blog/craft-cms-and-chat-oriented-programming-chop)
