Your Ollama.
One gateway. Every user.

Self-hosted API gateway for Ollama with multi-user support, API key management, rate limiting, and real-time usage tracking.

Get Started View on GitHub
terminal
# Clone and run with Docker
$ git clone https://github.com/edoardoted99/llamapass.git
$ cd llamapass
$ cp .env.example .env
$ docker compose up --build

# Create your first admin user
$ docker compose exec web python manage.py createsuperuser

# Ready at http://localhost:8000

Everything you need

A complete gateway between your users and Ollama

🔑

API Key Management

Create, revoke, and manage keys with expiration, per-key model restrictions, and rate limits.

📊

Usage Dashboard

30-day analytics with charts for requests, tokens, latency, errors, and model breakdown per key.

⚡

Rate Limiting

Configurable per-key limits backed by Redis. Live monitoring shows how close each key is to its limit.

🔄

Streaming Proxy

Transparent async proxy to Ollama. Full streaming support for chat and generate endpoints.
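Ollama's chat endpoint streams newline-delimited JSON chunks, each carrying a piece of the assistant message and a `done` flag on the last one. A small client-side sketch of consuming such a stream (the sample chunks are hardcoded here for illustration):

```python
import json

def collect_stream(lines):
    """Accumulate assistant text from Ollama-style NDJSON chat chunks.
    Each line is one JSON object; "done": true marks the final chunk."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Sample chunks shaped like Ollama's streaming /api/chat output
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo!"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_stream(sample))  # -> Hello!
```

With a real request you would iterate over the response body line by line instead of a list; the gateway forwards those chunks through unchanged.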

🧪

Built-in Test Page

Test Chat, Generate, and Embeddings endpoints directly from the browser. No curl needed.

🤝

OpenAI Compatible

Works with OpenAI SDKs out of the box. Just point your base URL and use your LLamaPass API key.

Simple to integrate

Use any HTTP client or the OpenAI SDK

curl
curl https://llamapass.org/ollama/api/chat \
  -H "Authorization: Bearer oah_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:1b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": false
  }'
Python โ€” OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://llamapass.org/ollama/v1",
    api_key="oah_your_key",
)

response = client.chat.completions.create(
    model="gemma3:1b",
    messages=[
      {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)

Start using LLamaPass

Deploy in minutes. Self-hosted. Open source.

Create Account Sign In