What you’ll build: Open WebUI running in Docker on your Mac, accessible from any phone, tablet, or laptop on your home network.
End state: A browser-based chat interface your whole family can use. Pick a model, start a conversation, no accounts or internet required. A private ChatGPT alternative running entirely on your hardware.
What you’ll understand: How to pick the right model for your hardware, what local AI is actually good at in 2026, and how to set up Open WebUI so your family can use it from any device.
Prerequisites: Ollama installed and working on your Mac. If you haven’t done that yet, start with What is Ollama and How Do You Run It on a Mac?
Picking the right model
Not all models are created equal, and what works depends on your hardware. Here’s what I’ve tested on a Mac Studio M1 Max with 64GB unified memory.
| Model | Size on disk | RAM needed | Good for | Speed (M1 Max 64GB) |
|---|---|---|---|---|
| Llama 3.2 3B | ~2 GB | ~4 GB | Quick tasks, testing, low overhead | Very fast, near-instant |
| Llama 3.1 8B | ~4.7 GB | ~8 GB | General purpose, daily driver | Fast, comfortable for chat |
| Qwen 2.5 32B | ~20 GB | ~24 GB | Best quality at reasonable speed | Medium, ~15 tok/s |
| Qwen 2.5 Coder 7B | ~4.7 GB | ~8 GB | Code generation, review | Fast |
| Qwen 2.5 Coder 32B | ~20 GB | ~24 GB | Complex coding tasks | Medium, ~15 tok/s |
| Mistral 7B | ~4.1 GB | ~8 GB | Compact, good European languages | Fast |
| DeepSeek Coder V2 | ~8.9 GB | ~12 GB | Code specialist, fill-in-the-middle | Moderate |
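The "RAM needed" column follows a rough rule of thumb: at 4-bit quantization (what Ollama's default tags generally use), each parameter costs about half a byte of memory, plus overhead for the KV cache and runtime. A sketch of that arithmetic; the flat overhead constant is my own assumption, and the table above rounds up further for headroom:

```python
def estimate_ram_gb(params_billions: float, quant_bits: int = 4,
                    overhead_gb: float = 2.0) -> float:
    """Rough memory estimate for a quantized model.

    Weights cost params * (quant_bits / 8) bytes; the flat overhead
    (KV cache, runtime buffers) is a guess, not a spec.
    """
    weights_gb = params_billions * quant_bits / 8
    return round(weights_gb + overhead_gb, 1)

print(estimate_ram_gb(8))   # -> 6.0 (the table's ~8 GB adds headroom)
print(estimate_ram_gb(32))  # -> 18.0 (the table's ~24 GB adds headroom)
```

The same arithmetic explains why an 8-bit quant of the same model roughly doubles the footprint.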
To pull any of these:
```shell
ollama pull llama3.2:3b
ollama pull qwen2.5:32b
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:32b
ollama pull mistral:7b
ollama pull deepseek-coder-v2:latest
```
A few notes on picking models:
Start with Llama 3.1 8B. It’s the safest default. Fast, capable enough for most things, leaves plenty of RAM for your other services.
Qwen 2.5 32B is the sweet spot for 64GB machines. Noticeably better answers than the 8B models. Uses a big chunk of memory but leaves enough room for Docker services running alongside it. This is what I reach for when quality matters.
3B models are useful, not just toys. Llama 3.2 3B handles summarization, simple Q&A, and text reformatting well enough. If you’re building automations that make many small requests, the speed advantage matters more than the quality gap.
Ollama loads models into memory on first request and unloads them after 5 minutes of inactivity (configurable). You don’t need to worry about memory management. Pull several models and switch between them as needed.
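The unload window can also be set per request: Ollama's `/api/generate` and `/api/chat` endpoints accept a `keep_alive` field (a duration like `"30m"`, or `-1` to keep the model loaded indefinitely). A sketch of what such a request body looks like; actually sending it requires Ollama running, so that part is left as a comment:

```python
import json

def generate_payload(model: str, prompt: str, keep_alive: str = "30m") -> dict:
    """Request body for Ollama's /api/generate with an explicit keep-alive.

    keep_alive: how long the model stays loaded after this request
    ("5m" is Ollama's default; "-1" keeps it loaded indefinitely).
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }

body = json.dumps(generate_payload("llama3.1:8b", "Say hi", keep_alive="-1"))
# To send it (requires Ollama running on the host):
#   curl http://localhost:11434/api/generate -d "$body"
```

The default for all requests can also be changed globally with the `OLLAMA_KEEP_ALIVE` environment variable.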
For more on model names, quantization, and browsing the full model ecosystem, see What is Ollama → What else can you run?
What local models are good at (and not)
Be realistic about what these can do. Local models in the 7-32B parameter range are not ChatGPT or Claude replacements. What works depends heavily on which model and how many parameters you throw at it.
Works well, even with smaller models (3-8B):
- Summaries and short abstracts
- Tagging, classification, categorization
- Extracting names, dates, amounts from text
- Reformatting and restructuring data
- Simple drafts (short emails, messages)
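Extraction and reformatting tasks go best when you constrain the output format tightly; a 3B model follows a rigid template far more reliably than an open-ended instruction. A sketch of the kind of prompt I mean (the field names are just an example):

```python
def extraction_prompt(text: str) -> str:
    """Prompt a small model to pull structured fields out of a letter.

    Asking for strict JSON with fixed keys keeps a 3B model on rails.
    """
    return (
        "Extract the following from the text below and reply with "
        "ONLY a JSON object with these keys: sender, date, amount_due, "
        "deadline. Use null for anything missing.\n\n"
        f"Text:\n{text}"
    )

print(extraction_prompt("Dear tenant, your utility bill of EUR 84.20 is due May 3."))
```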
Works with larger models (32B+):
- Longer-form drafting
- Code explanation, translation between languages
- Brainstorming
- Q&A on well-known topics (but verify the answers)
Not there yet:
- Complex multi-step reasoning
- Factual accuracy on specific topics (they hallucinate, often confidently)
- Long conversations with many turns (context degrades)
- Following intricate instructions reliably
- Math beyond basics
The quality gap compared to cloud models is real. What makes up for it is privacy, which we’ll get to once Open WebUI is running.
Install Open WebUI
Open WebUI gives you a browser-based chat interface that talks to Ollama. Think of it as a self-hosted ChatGPT alternative you can run on your Mac.
Create a directory for the stack and a docker-compose.yml inside it:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3002:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui-data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_AUTH=false

volumes:
  open-webui-data:
```
A quick breakdown of what each piece does:
- `ports: "3002:8080"` maps port 3002 on your Mac to port 8080 inside the container. You'll access the UI at `http://localhost:3002`.
- `extra_hosts` adds a DNS entry inside the container so it can reach your Mac's network. Ollama runs on the host, not in Docker, so the container needs a way to find it.
- `OLLAMA_BASE_URL` tells Open WebUI where Ollama's API lives. Since Ollama runs on the host at port 11434, we point it at `host.docker.internal:11434`.
- `WEBUI_AUTH=false` disables the login screen. Anyone on your network can use the interface without creating an account.
- `open-webui-data` is a Docker volume that persists chat history, settings, and user accounts between container restarts.
- `restart: unless-stopped` brings the container back after a reboot or crash, unless you explicitly stopped it.
Start it:
```shell
docker compose up -d
```
You should see Docker pull the image (first time only, about 2GB) and start the container:
```
[+] Running 1/1
 ✔ Container open-webui  Started
```
Open http://localhost:3002 in your browser. Because the compose file sets `WEBUI_AUTH=false`, there's no signup screen; you land straight in the chat interface. Pick a model from the dropdown at the top and start chatting. If the dropdown is empty, Open WebUI can't reach Ollama. Check that Ollama is running (`curl http://localhost:11434/api/version`) and that `OLLAMA_BASE_URL` in your compose file is correct.
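When the dropdown stays empty, it helps to confirm from the host side that Ollama is actually answering before digging into Docker networking. A small stdlib-only sketch that probes the version endpoint:

```python
import urllib.request
import urllib.error

def ollama_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url/api/version."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(ollama_reachable("http://localhost:11434"))
```

Keep in mind the container reaches the host as `host.docker.internal`, so a check that passes on the host can still fail inside the container if the `extra_hosts` entry is missing.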
Once Open WebUI is running, you can also pull models directly from its built-in model browser without touching the terminal. Convenient when you want to try something mid-conversation.
Making it available on your network
Because we mapped port 3002, Open WebUI is accessible from any device on your local network at http://<your-mac-ip>:3002. Your family can bookmark it on their phones and laptops. It looks and feels like ChatGPT, so there’s no learning curve.
With WEBUI_AUTH=false, there’s no login screen. Anyone on your network can use it. For a home network behind a router, that’s fine. If you want per-user chat history or access control, remove that line and let each family member create an account on first visit.
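To find the address to hand out, `ipconfig getifaddr en0` works on most Macs. A portable alternative is the UDP-connect trick, sketched below; connecting a datagram socket picks the outbound interface without sending any traffic (the TEST-NET address is never actually contacted):

```python
import socket

def local_ip() -> str:
    """Best-effort LAN IP of this machine.

    Connecting a UDP socket selects the outbound interface without
    sending packets; falls back to loopback if there's no route.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("192.0.2.1", 80))  # TEST-NET-1, reserved for examples
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()

print(f"http://{local_ip()}:3002")
```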
Things to try
Once Open WebUI is running, paste in a school newsletter and ask for a summary. Paste a recipe and ask it to scale from 2 to 6 portions. Ask it to draft a reply to your landlord about the utility bill.
The interesting part is what happens when the whole family starts using it. My wife pastes in letters from the school or the insurance company and asks what they actually mean. I’ve checked employment contracts for notice periods. Asked about my kid’s rash at 10 PM. The kind of questions you wouldn’t type into Google or ChatGPT because they’re too personal, too specific to your family. With a local model, there’s nobody on the other end. No account, no history, no profile being built. It’s just your Mac.
One more thing worth knowing: the model you downloaded today will behave the same way in six months. No silent updates that change how it responds. If you’ve used ChatGPT long enough to notice a model getting worse after an update, you know how annoying that is. Local models are frozen. You update when you choose to.
What about LM Studio?
LM Studio is a popular alternative. It has a GUI, a CLI, can run as a headless daemon, and supports Apple's MLX framework, which typically runs 20-30% faster than Ollama's GGUF backend on Apple Silicon. Worth looking at, especially if inference speed matters to you. We'll cover LM Studio in a separate guide and compare the two in detail.
This guide focuses on Ollama because it has the broader ecosystem today. Open WebUI, n8n, Continue (VS Code), LangChain, and most other tools that integrate with local LLMs expect an Ollama endpoint.
Checklist
- At least one model pulled (`ollama pull llama3.1:8b`)
- Open WebUI container running (`docker compose up -d`)
- Open WebUI accessible at `http://localhost:3002`
- Open WebUI sees your Ollama models in the dropdown
- Tested from another device on your network
Frequently asked questions
Can I run ChatGPT locally without an account? Not ChatGPT itself, that’s OpenAI’s service and it stays on their servers. But Open WebUI with Ollama gives you the same chat interface running on your Mac. No account, no subscription, no data leaving your network. Your family won’t know the difference until they ask it something really hard.
How much RAM does Open WebUI need? The Open WebUI container uses about 500MB. The model is what eats memory. An 8B model needs ~8GB, a 32B model ~24GB. On a 64GB Mac you can run both alongside Immich and a dozen other containers without thinking about it.
Can my family use this from their phones? Open WebUI is a web app. Bookmark http://<your-mac-ip>:3002 on their phones, done. Looks and feels like ChatGPT. My wife started using it the same day I set it up, no tutorial needed.
Is this a good ChatGPT alternative for private questions? For the kind of questions you wouldn’t type into Google because they’re too personal, too specific, too embarrassing? A 32B local model handles those fine. Rashes at 10 PM, insurance letters you don’t understand, employment contract fine print. Cloud models are still ahead for complex research. But nothing here is logged, stored, or profiled.
Next steps: Want to manage your family’s photos too? Set up Immich on your Mac →
From the Build Log: I Bought a Used Mac Studio to Run Local LLMs and Local LLMs on a Mac: From Magic to Disappointment.