What you’ll build: Open WebUI running in Docker on your Mac, accessible from any phone, tablet, or laptop on your home network.
End state: A browser-based chat interface your whole family can use. Pick a model, start a conversation, no accounts or internet required. A private ChatGPT alternative running entirely on your hardware.
What you’ll understand: How to pick the right model for your hardware, what local AI is actually good at in 2026, and how to set up Open WebUI so your family can use it from any device.
Prerequisites: Ollama installed and working on your Mac. If you haven’t done that yet, start with What is Ollama and How Do You Run It on a Mac?
Picking the right model
Not all models are created equal, and what works depends on your hardware. Here’s what I’ve tested on a Mac Studio M1 Max with 64GB unified memory.
| Model | Size on disk | RAM needed | Good for | Speed (M1 Max 64GB) |
|---|---|---|---|---|
| Llama 3.2 3B | ~2 GB | ~4 GB | Quick tasks, testing, low overhead | Very fast, near-instant |
| Llama 3.1 8B | ~4.7 GB | ~8 GB | General purpose, daily driver | Fast, comfortable for chat |
| Qwen 2.5 32B | ~20 GB | ~24 GB | Best quality at reasonable speed | Medium, ~15 tok/s |
| Qwen 2.5 Coder 7B | ~4.7 GB | ~8 GB | Code generation, review | Fast |
| Qwen 2.5 Coder 32B | ~20 GB | ~24 GB | Complex coding tasks | Medium, ~15 tok/s |
| Mistral 7B | ~4.1 GB | ~8 GB | Compact, good European languages | Fast |
| DeepSeek Coder V2 | ~8.9 GB | ~12 GB | Code specialist, fill-in-the-middle | Moderate |
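The "RAM needed" column follows a rough rule of thumb: at 4-bit quantization (what Ollama's default tags generally use), each parameter costs about half a byte of memory, plus overhead for the KV cache and runtime. A sketch of that arithmetic; the flat overhead constant is my own assumption, and the table above rounds up further for headroom:

```python
def estimate_ram_gb(params_billions: float, quant_bits: int = 4,
                    overhead_gb: float = 2.0) -> float:
    """Rough memory estimate for a quantized model.

    Weights cost params * (quant_bits / 8) bytes; the flat overhead
    (KV cache, runtime buffers) is a guess, not a spec.
    """
    weights_gb = params_billions * quant_bits / 8
    return round(weights_gb + overhead_gb, 1)

print(estimate_ram_gb(8))   # -> 6.0 (the table's ~8 GB adds headroom)
print(estimate_ram_gb(32))  # -> 18.0 (the table's ~24 GB adds headroom)
```

The same arithmetic explains why an 8-bit quant of the same model roughly doubles the footprint.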
To pull any of these:
```shell
ollama pull llama3.2:3b
ollama pull qwen2.5:32b
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:32b
ollama pull mistral:7b
ollama pull deepseek-coder-v2:latest
```
A few notes on picking models:
Start with Llama 3.1 8B. It’s the safest default. Fast, capable enough for most things, leaves plenty of RAM for your other services.
Qwen 2.5 32B is the sweet spot for 64GB machines. Noticeably better answers than the 8B models. Uses a big chunk of memory but leaves enough room for Docker services running alongside it. This is what I reach for when quality matters.
3B models are useful, not just toys. Llama 3.2 3B handles summarization, simple Q&A, and text reformatting well enough. If you’re building automations that make many small requests, the speed advantage matters more than the quality gap.
Ollama loads models into memory on first request and unloads them after 5 minutes of inactivity (configurable). You don’t need to worry about memory management. Pull several models and switch between them as needed.
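The unload window can also be set per request: Ollama's `/api/generate` and `/api/chat` endpoints accept a `keep_alive` field (a duration like `"30m"`, or `-1` to keep the model loaded indefinitely). A sketch of what such a request body looks like; actually sending it requires Ollama running, so that part is left as a comment:

```python
import json

def generate_payload(model: str, prompt: str, keep_alive: str = "30m") -> dict:
    """Request body for Ollama's /api/generate with an explicit keep-alive.

    keep_alive: how long the model stays loaded after this request
    ("5m" is Ollama's default; "-1" keeps it loaded indefinitely).
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }

body = json.dumps(generate_payload("llama3.1:8b", "Say hi", keep_alive="-1"))
# To send it (requires Ollama running on the host):
#   curl http://localhost:11434/api/generate -d "$body"
```

The default for all requests can also be changed globally with the `OLLAMA_KEEP_ALIVE` environment variable.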
For more on model names, quantization, and browsing the full model ecosystem, see What is Ollama → What else can you run?
What local models are good at (and not)
Be realistic about what these can do. Local models in the 7-32B parameter range are not ChatGPT or Claude replacements. What works depends heavily on which model and how many parameters you throw at it.
Works well, even with smaller models (3-8B):
- Summaries and short abstracts
- Tagging, classification, categorization
- Extracting names, dates, amounts from text
- Reformatting and restructuring data
- Simple drafts (short emails, messages)
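Extraction and reformatting tasks go best when you constrain the output format tightly; a 3B model follows a rigid template far more reliably than an open-ended instruction. A sketch of the kind of prompt I mean (the field names are just an example):

```python
def extraction_prompt(text: str) -> str:
    """Prompt a small model to pull structured fields out of a letter.

    Asking for strict JSON with fixed keys keeps a 3B model on rails.
    """
    return (
        "Extract the following from the text below and reply with "
        "ONLY a JSON object with these keys: sender, date, amount_due, "
        "deadline. Use null for anything missing.\n\n"
        f"Text:\n{text}"
    )

print(extraction_prompt("Dear tenant, your utility bill of EUR 84.20 is due May 3."))
```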
Works with larger models (32B+):
- Longer-form drafting
- Code explanation, translation between languages
- Brainstorming
- Q&A on well-known topics (but verify the answers)
Not there yet:
- Complex multi-step reasoning
- Factual accuracy on specific topics (they hallucinate, often confidently)
- Long conversations with many turns (context degrades)
- Following intricate instructions reliably
- Math beyond basics
The quality gap compared to cloud models is real. What makes up for it is privacy, which we’ll get to once Open WebUI is running.
Install Open WebUI
Open WebUI gives you a browser-based chat interface that talks to Ollama. Think of it as a self-hosted ChatGPT alternative you can run on your Mac.
Create a directory for the stack and a docker-compose.yml inside it:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3002:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_AUTH=false

volumes:
  open-webui-data:
```
A quick breakdown of what each piece does:
- `ports: "3002:8080"` maps port 3002 on your Mac to port 8080 inside the container. You'll access the UI at `http://localhost:3002`.
- `extra_hosts` adds a DNS entry inside the container so it can reach your Mac's network. Ollama runs on the host, not in Docker, so the container needs a way to find it.
- `OLLAMA_BASE_URL` tells Open WebUI where Ollama's API lives. Since Ollama runs on the host at port 11434, we point it at `host.docker.internal:11434`.
- `WEBUI_AUTH=false` disables the login screen. Anyone on your network can use the interface without creating an account.
- `open-webui-data` is a Docker volume that persists chat history, settings, and user accounts between container restarts.
- `restart: unless-stopped` brings the container back after a reboot or crash, unless you explicitly stopped it.
Start it:
```shell
docker compose up -d
```
You should see Docker pull the image (first time only, about 2GB) and start the container:
```
[+] Running 1/1
 ✔ Container open-webui  Started
```
Open http://localhost:3002 in your browser. Because the compose file sets `WEBUI_AUTH=false`, there's no signup screen; you land straight in the chat interface. Pick a model from the dropdown at the top and start chatting. If the dropdown is empty, Open WebUI can't reach Ollama. Check that Ollama is running (`curl http://localhost:11434/api/version`) and that `OLLAMA_BASE_URL` in your compose file is correct.
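When the dropdown stays empty, it helps to confirm from the host side that Ollama is actually answering before digging into Docker networking. A small stdlib-only sketch that probes the version endpoint:

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url/api/version."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_reachable("http://localhost:11434"))
```

Keep in mind the container reaches the host as `host.docker.internal`, so a check that passes on the host can still fail inside the container if the `extra_hosts` entry is missing.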
Once Open WebUI is running, you can also pull models directly from its built-in model browser without touching the terminal. Convenient when you want to try something mid-conversation.
Making it available on your network
Because we mapped port 3002, Open WebUI is accessible from any device on your local network at http://<your-mac-ip>:3002. Your family can bookmark it on their phones and laptops. It looks and feels like ChatGPT, so there’s no learning curve.
With WEBUI_AUTH=false, there’s no login screen. Anyone on your network can use it. For a home network behind a router, that’s fine. If you want per-user chat history or access control, remove that line and let each family member create an account on first visit.
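To find the address to hand out, `ipconfig getifaddr en0` works on most Macs. A portable alternative is the UDP-connect trick, sketched below; connecting a datagram socket picks the outbound interface without sending any traffic (the TEST-NET address is never actually contacted):

```python
import socket

def local_ip() -> str:
    """Best-effort LAN IP of this machine.

    Connecting a UDP socket selects the outbound interface without
    sending packets; falls back to loopback if there's no route.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("192.0.2.1", 80))  # TEST-NET-1, reserved for examples
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()

print(f"http://{local_ip()}:3002")
```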
Things to try
Once Open WebUI is running, paste in a school newsletter and ask for a summary. Paste a recipe and ask it to scale from 2 to 6 portions. Ask it to draft a reply to your landlord about the utility bill.
The interesting part is what happens when the whole family starts using it. My wife pastes in letters from the school or the insurance company and asks what they actually mean. I’ve checked employment contracts for notice periods. Asked about my kid’s rash at 10 PM. The kind of questions you wouldn’t type into Google or ChatGPT because they’re too personal, too specific to your family. With a local model, there’s nobody on the other end. No account, no history, no profile being built. It’s just your Mac.
One more thing worth knowing: the model you downloaded today will behave the same way in six months. No silent updates that change how it responds. If you’ve used ChatGPT long enough to notice a model getting worse after an update, you know how annoying that is. Local models are frozen. You update when you choose to.
What about LM Studio?
LM Studio is a popular alternative. It has a GUI, a CLI, can run as a headless daemon, and supports Apple's MLX framework, which typically runs 20-30% faster than Ollama's GGUF backend on Apple Silicon. Worth looking at, especially if inference speed matters to you. We'll cover LM Studio in a separate guide and compare the two in detail.
This guide focuses on Ollama because it has the broader ecosystem today. Open WebUI, n8n, Continue (VS Code), LangChain, and most other tools that integrate with local LLMs expect an Ollama endpoint.
Checklist
- At least one model pulled (`ollama pull llama3.1:8b`)
- Open WebUI container running (`docker compose up -d`)
- Open WebUI accessible at `http://localhost:3002`
- Open WebUI sees your Ollama models in the dropdown
- Tested from another device on your network
Frequently asked questions
Can I run ChatGPT locally without an account? Not ChatGPT itself, that’s OpenAI’s service and it stays on their servers. But Open WebUI with Ollama gives you the same chat interface running on your Mac. No account, no subscription, no data leaving your network. Your family won’t know the difference until they ask it something really hard.
How much RAM does Open WebUI need? The Open WebUI container uses about 500MB. The model is what eats memory. An 8B model needs ~8GB, a 32B model ~24GB. On a 64GB Mac you can run both alongside Immich and a dozen other containers without thinking about it.
Can my family use this from their phones? Open WebUI is a web app. Bookmark http://<your-mac-ip>:3002 on their phones, done. Looks and feels like ChatGPT. My wife started using it the same day I set it up, no tutorial needed.
Is this a good ChatGPT alternative for private questions? For the kind of questions you wouldn’t type into Google because they’re too personal, too specific, too embarrassing? A 32B local model handles those fine. Rashes at 10 PM, insurance letters you don’t understand, employment contract fine print. Cloud models are still ahead for complex research. But nothing here is logged, stored, or profiled.
Next steps: Want to manage your family’s photos too? Set up Immich on your Mac →
From the Build Log: I Bought a Used Mac Studio to Run Local LLMs and Local LLMs on a Mac: From Magic to Disappointment.