---
title: Upgrading Your Local Gemma Setup
date: 2026-04-09T06:00:00-04:00
author: John Morton
canonical_url: "https://supergeekery.com/blog/upgrading-your-local-gemma-setup"
section: Blog
---
# Upgrading Your Local Gemma Setup

*April 9, 2026* by John Morton

![Coding remotely with glitter](https://static.supergeekery.com/site-assets/coding-remotely-with-glitter.jpg)

*Audio narration available for this post.*

Note that the audio version of this post omits many long blocks of code. Reference the written blog post to access the code blocks.

In [Part 1](https://supergeekery.com/blog/running-googles-gemma-4-locally-on-macos-with-ollama), I set up Google's Gemma 4 running locally on macOS with Ollama and created a simple `gemma` shell function. It worked, but it was bare-bones — no auto-start, no markdown rendering, and the interactive mode was just a pass-through to `ollama run`.

Here's how I turned that basic shortcut into something that feels like a polished chat interface, all within a single shell function.

## **The Problem with Plain Text**

Gemma's responses are markdown — headers, code blocks, tables, lists. But in a raw terminal, a markdown table looks like this:

```plaintext
| Column A | Column B |
|---|---|
| value | value |
```

Not exactly easy to scan. I wanted rendered tables with proper borders, highlighted code blocks, and headers with visual weight. Enter Glow.

## **Pretty-Printing with Glow**

[Glow](https://github.com/charmbracelet/glow) is a terminal markdown renderer from Charm. Install it with Homebrew:

```plaintext
brew install glow
```

The simplest upgrade is piping one-off prompts through Glow:

```plaintext
gemma() {
    # ... existing function ...

    if [[ $# -gt 0 ]] && command -v glow > /dev/null 2>&1; then
        ollama run gemma4:26b "$@" | sed $'s/\033\[[0-9;]*[a-zA-Z]//g' | glow
    else
        ollama run gemma4:26b "$@"
    fi
}
```

Now `gemma "Explain the difference between concurrency and parallelism"` gives a nicely formatted response with rendered headers, highlighted code blocks, and properly drawn tables — right in the terminal.

The `command -v glow` check means if Glow isn't installed, everything still works normally. It's a pure upgrade with no downside.

One gotcha: `ollama run` outputs ANSI escape codes for terminal cursor control. These are harmless in a normal terminal, but Glow treats them as literal text — you'll see artifacts like `[2D[K` scattered through the output. The `sed` command strips those escape sequences before Glow sees them.
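You can see the stripping in action with a quick test (the escape sequence here is just a stand-in for the cursor-control codes `ollama run` emits):

```plaintext
# Simulate ollama-style cursor-control codes and strip them before Glow sees them
printf 'Hello\033[2D\033[K world\n' | sed $'s/\033\[[0-9;]*[a-zA-Z]//g'
# prints: Hello world
```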

## **Auto-Starting Ollama**

One annoyance with the basic setup: if you forget to start the Ollama service, the command just fails. Easy fix — check whether the server is reachable and start it automatically:

```plaintext
if ! curl -sf http://localhost:11434/ > /dev/null 2>&1; then
    echo "Starting Ollama service..."
    brew services start ollama
    while ! curl -sf http://localhost:11434/ > /dev/null 2>&1; do
        sleep 1
    done
fi
```

This pings the Ollama API endpoint on `localhost:11434`. If there's no response, it runs `brew services start ollama` and waits until the server is ready before proceeding. If Ollama is already running, it skips straight to the model with no delay.
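If you'd rather not risk the loop spinning forever when the service fails to start, a capped wait is a small tweak (this timeout is optional and not part of the final function below):

```plaintext
if ! curl -sf http://localhost:11434/ > /dev/null 2>&1; then
    echo "Starting Ollama service..."
    brew services start ollama
    _waited=0
    while ! curl -sf http://localhost:11434/ > /dev/null 2>&1; do
        sleep 1
        # give up after 30 seconds instead of waiting forever
        if (( ++_waited >= 30 )); then
            echo "Ollama didn't come up after 30s; check 'brew services list'"
            return 1
        fi
    done
fi
```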

## **Interactive Mode: The Hard Part**

The one-off piping trick doesn't work for interactive conversations — piping stdout through Glow breaks the interactive input. So I took it a step further and built a custom interactive mode using the Ollama API directly.

The goals:

1. **Conversation history within a session** — once I start a chat, the model should remember everything I've said in that chat until I exit. (No persistence across sessions — each new `gemma` invocation starts fresh.)
2. **Visible progress feedback** — a spinner so I know the model is working rather than hung, since harder requests can take a while
3. **Markdown rendering** — render the final response through Glow
4. **Graceful interrupts** — Ctrl-C should exit cleanly, not leave orphaned processes

### **Conversation History with the Chat API**

Instead of shelling out to `ollama run`, the function calls the `/api/chat` endpoint with `curl`. This endpoint accepts a `messages` array, which means we can maintain the full conversation history for the duration of the chat. I store the messages as JSON in a temp file created with `mktemp` when the session starts, append each new exchange with `jq`, and the cleanup function deletes that file when you exit — so history lives exactly as long as the interactive session does:

```plaintext
# Add user message to history
jq --arg p "$prompt" \
    '. + [{"role": "user", "content": $p}]' "$tmpfile" > "$tmpfile.tmp" \
    && mv "$tmpfile.tmp" "$tmpfile"

# Send full history to the API with stream:false
curl -sS http://localhost:11434/api/chat -d "$(jq -n \
    --argjson msgs "$(cat "$tmpfile")" \
    '{"model": "gemma4:26b", "messages": $msgs, "stream": false}')"
```

You'll need `jq` for the JSON handling — it comes pre-installed on recent macOS versions, or you can grab it with `brew install jq`.
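For reference, with `"stream": false` the `/api/chat` endpoint returns a single JSON object roughly like this (abridged; timing and token stats omitted), which is why the snippets below pull the reply out with `.message.content`:

```plaintext
{
  "model": "gemma4:26b",
  "created_at": "2026-04-09T10:00:00.000Z",
  "message": {
    "role": "assistant",
    "content": "Concurrency is about dealing with many things at once..."
  },
  "done": true
}
```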

### **A "Thinking..." Spinner While You Wait**

A 26B-parameter model on a laptop takes several seconds per response, and during that time the terminal needs to look alive. (In other words, it's slower than you may expect if you've used Claude, Gemini, or ChatGPT.) The trick is to send `"stream": false`, run `curl` in the background, and animate a spinner in the foreground until the response comes back — then render the whole thing through Glow exactly once. The same pattern is reused in one-shot mode, so both code paths feel identical.

```plaintext
response_file=$(mktemp)
local _payload=$(jq -n --argjson msgs "$(cat "$tmpfile")" \
    '{"model": "gemma4:26b", "messages": $msgs, "stream": false}')
curl -sS http://localhost:11434/api/chat -d "$_payload" > "$response_file" 2>&1 &
local _curl_pid=$!
local _spinner='|/-\' _i=0
printf '\033[?25l'
while kill -0 "$_curl_pid" 2>/dev/null; do
    printf '\r\033[2;90mThinking... %s\033[0m' "${_spinner:_i++%${#_spinner}:1}"
    sleep 0.1
done
wait "$_curl_pid"
printf '\r\033[K\033[?25h'
```

Once the spinner clears, the response gets pulled out of the JSON with `jq` and rendered with Glow:

```plaintext
content=$(jq -r '.message.content // empty' "$response_file" 2>/dev/null)
if $has_glow; then
    printf '%s' "$content" | glow
else
    printf '%s\n' "$content"
fi
```

Why bother with the HTTP API instead of just `ollama run gemma4:26b "$prompt" | glow`? `ollama run` buffers everything until the model is done (no room for a spinner) and emits TUI escape codes that show up as garbage in Glow even when redirected. The HTTP API gives you clean JSON and decoupled control over the UI.

### **Graceful Interrupt Handling**

The trickiest part was getting Ctrl-C to work properly. A naive `trap ... INT TERM` set inside a shell function doesn't stay scoped to the function: it lingers in the shell session after the function returns. The fix is a cleanup function that resets the trap after running:

```plaintext
_gemma_cleanup() {
    printf '\033[0m'
    rm -f "$tmpfile" "$tmpfile.tmp" "$response_file"
    trap - INT TERM
}
trap '_gemma_cleanup; return' INT TERM
```

This ensures Ctrl-C kills the in-flight request, resets the terminal color, cleans up temp files, and returns you to a working prompt.
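One thing the cleanup above leaves implicit is killing the in-flight `curl`. If you want that spelled out, a small tweak works because zsh's dynamic scoping lets `_gemma_cleanup` see the function's `_curl_pid` (a sketch, not part of the final function below):

```plaintext
_gemma_cleanup() {
    # explicitly reap an in-flight request, if one exists
    [[ -n "$_curl_pid" ]] && kill "$_curl_pid" 2>/dev/null
    printf '\033[0m'
    rm -f "$tmpfile" "$tmpfile.tmp" "$response_file"
    trap - INT TERM
}
```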

### **File References with @filename**

In one-shot mode, you can pipe text files in with standard redirection (`gemma "explain this" < file.sh`). But in interactive mode, stdin is already being used for your input. And piping in a PDF? That sends raw binary to the model and it chokes.

So I added `@filename` syntax — prefix any file path with `@` and the function reads its contents into the prompt before sending it to the model. It works in both one-shot mode:

```plaintext
gemma "explain this script @warm_cache.sh"
```

and interactive mode:

```plaintext
>>> compare these two files @old_version.py @new_version.py
>>> what does @src/utils.js do?
```

The function uses zsh's `${(z)...}` operator to split the prompt into words, then checks each one for a leading `@`. If the file exists, its contents are injected inline, wrapped with markers so the model knows where the file starts and ends.
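If the `(z)` flag is new to you, it splits a string into words the same way the shell parses a command line:

```plaintext
prompt='compare these two files @old_version.py @new_version.py'
for w in ${(z)prompt}; do
    print -r -- "$w"
done
# compare
# these
# two
# files
# @old_version.py
# @new_version.py
```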

You need to use the full path to the file. If the file isn't found, the function tells you and stops before sending anything to the model — in both one-shot and interactive mode.

```plaintext
$ gemma "what does @sample.pdf say?"
File not found: sample.pdf
Use the full path, e.g. @/Users/john/Downloads/sample.pdf
```

#### **PDF Support**

For PDFs, `@filename` automatically extracts the text using `pdftotext` (from the `poppler` package) instead of sending raw binary:

```plaintext
brew install poppler
```

With poppler installed, you can reference PDFs just like any other file:

```plaintext
gemma "summarize @/Users/john/Downloads/contract.pdf"
Reading PDF: contract.pdf... (this may take a moment)
```

The function detects the `.pdf` extension, prints a status message so you know it's working, and runs `pdftotext` to convert it to plain text before injecting it into the prompt. PDFs can be large, and the model needs time to process all that text — the status message prevents that "is it stuck?" feeling. If poppler isn't installed, you get a warning instead of garbage output.
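You can sanity-check the extraction outside the function, too; `pdftotext` writes to stdout when the output file is `-` (the path is just an example):

```plaintext
pdftotext /Users/john/Downloads/contract.pdf - | head -20
```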

## **The Final Function**

The complete function, available below or in a [gist](https://gist.github.com/johnfmorton/9961b6f97954067396a2c338269f3d76), ties everything together.

```plaintext
gemma() {
    setopt local_options no_monitor no_notify
    if ! curl -sf http://localhost:11434/ > /dev/null 2>&1; then
        echo "Starting Ollama service..."
        brew services start ollama
        while ! curl -sf http://localhost:11434/ > /dev/null 2>&1; do
            sleep 1
        done
    fi
    local has_glow=false
    command -v glow > /dev/null 2>&1 && has_glow=true

    local _gemma_system='Respond in plain Markdown only. Do not use LaTeX or math delimiters ($...$, \(...\), \[...\]). Write units and symbols as Unicode directly (e.g. 0°C, 32°F, π, ², ³, ½) instead of \circ, \text{}, \frac, etc.'

    # One-shot mode: pass prompt directly
    if [[ $# -gt 0 ]]; then
        local _prompt="$*"
        local _w _fp _fc
        for _w in ${(z)_prompt}; do
            if [[ "$_w" == @* ]]; then
                _fp="${_w#@}"
                if [[ -f "$_fp" ]]; then
                    if [[ "$_fp" == *.pdf ]]; then
                        if command -v pdftotext > /dev/null 2>&1; then
                            echo "Reading PDF: ${_fp##*/}... (this may take a moment)"
                            _fc=$(pdftotext "$_fp" -)
                        else
                            echo "Warning: install poppler for PDF support (brew install poppler)"
                            return 1
                        fi
                    else
                        _fc=$(cat "$_fp")
                    fi
                    _prompt="${_prompt//$_w/$'\n\n--- Contents of '"$_fp"$' ---\n'"$_fc"$'\n--- End of '"$_fp"$' ---\n'}"
                else
                    echo "File not found: $_fp"
                    echo "Use the full path, e.g. @\$HOME/Downloads/$_fp"
                    return 1
                fi
            fi
        done

        # Use the HTTP API directly so we get clean text (no TUI escape codes
        # that `ollama run` emits even when redirected). Run curl in the
        # background and show a spinner until it finishes.
        local _out_file _payload
        _out_file=$(mktemp)
        _payload=$(jq -n --arg s "$_gemma_system" --arg p "$_prompt" \
            '{"model": "gemma4:26b", "messages": [{"role":"system","content":$s},{"role":"user","content":$p}], "stream": false}')
        curl -sS http://localhost:11434/api/chat -d "$_payload" > "$_out_file" 2>&1 &
        local _curl_pid=$!

        local _spinner='|/-\'
        local _i=0
        printf '\033[?25l'
        while kill -0 "$_curl_pid" 2>/dev/null; do
            printf '\r\033[2;90mThinking... %s\033[0m' "${_spinner:_i++%${#_spinner}:1}"
            sleep 0.1
        done
        wait "$_curl_pid"
        local _status=$?
        printf '\r\033[K\033[?25h'

        if [[ $_status -ne 0 ]]; then
            cat "$_out_file"
            rm -f "$_out_file"
            return $_status
        fi

        local _content
        _content=$(jq -r '.message.content // empty' "$_out_file" 2>/dev/null)
        if [[ -z "$_content" ]]; then
            echo "Error: no response from model"
            cat "$_out_file"
            rm -f "$_out_file"
            return 1
        fi

        if $has_glow; then
            printf '%s\n' "$_content" | glow
        else
            printf '%s\n' "$_content"
        fi
        rm -f "$_out_file"
        return
    fi

    # Interactive mode with spinner + glow render
    local tmpfile=$(mktemp)
    local response_file=""
    local prompt expanded content _w _fp _fc

    _gemma_cleanup() {
        printf '\033[0m'
        rm -f "$tmpfile" "$tmpfile.tmp" "$response_file"
        trap - INT TERM
    }
    trap '_gemma_cleanup; return' INT TERM

    jq -n --arg s "$_gemma_system" '[{"role":"system","content":$s}]' > "$tmpfile"

    echo "Chat with Gemma (type 'exit' or Ctrl-C to quit)"
    while true; do
        printf "\n>>> "
        read -r prompt || break
        [[ -z "$prompt" || "$prompt" == "exit" ]] && break

        # Expand @filename references to file contents
        expanded="$prompt"
        local _file_error=false
        for _w in ${(z)prompt}; do
            if [[ "$_w" == @* ]]; then
                _fp="${_w#@}"
                if [[ -f "$_fp" ]]; then
                    if [[ "$_fp" == *.pdf ]]; then
                        if command -v pdftotext > /dev/null 2>&1; then
                            echo "Reading PDF: ${_fp##*/}... (this may take a moment)"
                            _fc=$(pdftotext "$_fp" -)
                        else
                            echo "Warning: install poppler for PDF support (brew install poppler)"
                            _file_error=true
                            break
                        fi
                    else
                        _fc=$(cat "$_fp")
                    fi
                    expanded="${expanded//$_w/$'\n\n--- Contents of '"$_fp"$' ---\n'"$_fc"$'\n--- End of '"$_fp"$' ---\n'}"
                else
                    echo "File not found: $_fp (use the full path, e.g. @/Users/john/Downloads/$_fp)"
                    _file_error=true
                    break
                fi
            fi
        done
        if $_file_error; then
            continue
        fi

        jq --arg p "$expanded" '. + [{"role": "user", "content": $p}]' "$tmpfile" > "$tmpfile.tmp" \
            && mv "$tmpfile.tmp" "$tmpfile"

        # Non-streaming request with spinner (matches one-shot mode)
        response_file=$(mktemp)
        local _payload=$(jq -n --argjson msgs "$(cat "$tmpfile")" \
            '{"model": "gemma4:26b", "messages": $msgs, "stream": false}')
        curl -sS http://localhost:11434/api/chat -d "$_payload" > "$response_file" 2>&1 &
        local _curl_pid=$!
        local _spinner='|/-\' _i=0
        printf '\033[?25l'
        while kill -0 "$_curl_pid" 2>/dev/null; do
            printf '\r\033[2;90mThinking... %s\033[0m' "${_spinner:_i++%${#_spinner}:1}"
            sleep 0.1
        done
        wait "$_curl_pid"
        printf '\r\033[K\033[?25h'

        content=$(jq -r '.message.content // empty' "$response_file" 2>/dev/null)

        if [[ -z "$content" ]]; then
            echo "Error: no response from model"
            rm -f "$response_file"
            continue
        fi

        jq --arg c "$content" '. + [{"role": "assistant", "content": $c}]' "$tmpfile" > "$tmpfile.tmp" \
            && mv "$tmpfile.tmp" "$tmpfile"

        if $has_glow; then
            printf '%s' "$content" | glow
        else
            printf '%s\n' "$content"
        fi

        rm -f "$response_file"
    done
    _gemma_cleanup
}
```

Drop this into your `.zshrc`, run `source ~/.zshrc`, and you've got:

- **`gemma`** — interactive chat that remembers everything you've said for the duration of the session, with an animated spinner and Glow rendering
- **`gemma "your question"`** — one-shot prompt with the same spinner and Glow rendering
- **`@filename`** — include file contents in prompts, in both one-shot and interactive mode
- **PDF support** — `@document.pdf` automatically extracts text via `pdftotext`
- Auto-start of Ollama if the service isn't running
- Clean Ctrl-C handling
- Graceful fallback if Glow, jq, or poppler aren't installed
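
A few ways to kick the tires (the file paths are placeholders):

```plaintext
# one-shot, with Glow-rendered output
gemma "write a zsh one-liner that counts lines of code in the current directory"

# one-shot with a file reference (use a full path)
gemma "summarize @/Users/john/Downloads/contract.pdf in three bullet points"

# interactive session; type 'exit' or Ctrl-C to quit
gemma
```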

## **What It Feels Like**

The experience went from "useful but rough" to something I find much more user-friendly. You type a question, see the spinner animate while Gemma is thinking, and then the Glow-rendered response appears all at once — tables with proper borders, highlighted code blocks, and headers with visual weight. It feels less like talking to a terminal and more like a proper chat interface.

All of this runs entirely on your machine, with no data leaving your laptop. Not bad for a shell function.

## Related Posts

- [Running Google’s Gemma 4 Locally on macOS with Ollama](https://supergeekery.com/blog/running-googles-gemma-4-locally-on-macos-with-ollama)
- [LLM Ready: The Craft CMS plugin I never imagined I'd make](https://supergeekery.com/blog/llm-ready-the-craft-cms-plugin-i-never-imagined-id-make)
- [Craft CMS and chat-oriented programming, CHOP](https://supergeekery.com/blog/craft-cms-and-chat-oriented-programming-chop)
