Run Local LLMs with Ollama + Open WebUI in Docker
Want to run large language models like Llama 3 on your own machine — no OpenAI key, and no internet connection needed once the models are downloaded?
Here’s a quick guide to get started with Ollama and Open WebUI using Docker Compose.
Prerequisites
Make sure you have the following installed:
🐳 Docker with the Docker Compose plugin (v2.20+ recommended), or OrbStack
🧠 A machine with at least:
8–16 GB RAM for basic models like llama3
Optional GPU support for faster inference (Ollama supports Apple Silicon, NVIDIA, and AMD)
🔧 (Optional) jq — for formatting CLI output
💡 Models are stored locally and can consume several GB of disk space.
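A quick sanity check before starting, assuming the docker CLI, the Compose plugin, and (optionally) jq are on your PATH:

docker --version
docker compose version
jq --version   # optional, only needed for the curl test later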
📦 1. Create the Project
mkdir ollama-chat && cd ollama-chat
Create a file called docker-compose.yml:
version: '3.9'

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    tty: true
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    depends_on:
      - ollama
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open-webui:/app/backend/data
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama: {}
  open-webui: {}
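If you want to catch YAML mistakes before launching anything, Compose can validate the file and print the resolved configuration:

docker compose config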
▶️ 2. Start It Up
Run everything in the background:
docker compose up -d
This launches:
- 🧠 ollama: The local model runtime (LLMs are downloaded here)
- 🖥️ open-webui: A browser-based chat interface
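Before pulling a model, it's worth confirming both containers came up; a check along these lines should do it:

docker compose ps                # both services should show as running
docker compose logs -f ollama    # watch Ollama's startup logs, Ctrl+C to stop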
📥 3. Pull a Model
Download your first model (e.g. llama3):
docker exec -it ollama ollama pull llama3
This may take a few minutes depending on the model size and your network.
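Once the pull finishes, you can confirm it worked by listing the models Ollama has stored locally:

docker exec -it ollama ollama list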
🧪 4. Test via Curl
Once the model is ready, try a chat call from your terminal:
curl -s -N -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{
"model": "llama3",
"messages": [{"role": "user", "content": "Say this is a test!"}]
}' | jq -r 'select(.message.content != null) | .message.content' | tr -d '\n'; echo
✔️ Example output:
This is a test!
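The call above streams newline-delimited JSON chunks, which is why the jq/tr pipeline is needed to stitch the tokens back together. Ollama's chat endpoint also accepts "stream": false, so a variant of the same request returns a single JSON object and needs no stitching:

curl -s -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "stream": false
  }' | jq -r '.message.content'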
💬 5. Chat in the Browser
Now open http://localhost:3000 — you’ll see Open WebUI loaded and ready to chat with llama3 (or any other model you pull).
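Open WebUI picks up every model available in the Ollama container and lets you switch between them from the model selector in the chat view. Pulling another one works the same way as before (mistral below is just an example model from the Ollama library):

docker exec -it ollama ollama pull mistral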
✅ That’s It!
You’ve got:
- Local inference with zero external dependencies
- Full control over models and memory
- Chat-friendly UI for everyday use
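When you're done, the usual Compose teardown applies; the named volumes keep your models and chat history unless you explicitly remove them:

docker compose down      # stop and remove the containers, keep the volumes
docker compose down -v   # also delete the volumes (models must be re-downloaded)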