Ollama.com Turbo API • Tom Krush

Ollama.com offers a new “Turbo” service. For $20 per month you can run GPT-OSS:120b, GPT-OSS:20b, or DeepSeek v3.1:671b. From what I understand, they plan to launch more hosted models in the future. Learn more at https://ollama.com/turbo

The Turbo service supports API keys but doesn’t follow the typical OpenAI-compatible conventions. Instead, it exposes Ollama's own endpoints (/api/tags and /api/chat), which are described below.

Getting Started

All requests require authentication via Bearer token:

curl https://ollama.com/api/tags \
  -H "Authorization: Bearer your_api_key_here"

Get your API key from: https://ollama.com/settings/keys
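For anyone calling the API from code instead of curl, here is a minimal Python sketch of the same Bearer-token setup using only the standard library. The helper names (build_request, list_tags) are my own and not part of any Ollama client; only the host, the /api/tags path, and the Authorization header come from the curl example above.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://ollama.com"  # host taken from the curl example


def build_request(path: str, api_key: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated request: GET when payload is None, POST otherwise."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def list_tags(api_key: str) -> dict:
    """Send GET /api/tags and return the decoded JSON body (network call)."""
    with urllib.request.urlopen(build_request("/api/tags", api_key)) as resp:
        return json.load(resp)
```

Calling list_tags("your_api_key_here") should be equivalent to the curl command. In practice you would probably read the key from an environment variable of your choosing instead of hard-coding it.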

Available Models

Currently, three models are available:

- gpt-oss:120b - 120 billion parameter model
- gpt-oss:20b - 20 billion parameter model
- deepseek-v3.1:671b - 671 billion parameter model
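Since the model list is expected to grow, it may be safer to check it at runtime than to hard-code names. The sketch below pulls model names out of a /api/tags response. Note the assumption: this post does not show that response, so the shape used here ({"models": [{"name": ...}]}) is borrowed from the local Ollama API and may differ on the hosted service. SAMPLE_TAGS is made-up illustrative data, not a captured response.

```python
import json
from typing import List

# Assumed response shape for GET /api/tags (mirrors the local Ollama API,
# not confirmed for the hosted service). Sample data is illustrative only.
SAMPLE_TAGS = '{"models": [{"name": "gpt-oss:120b"}, {"name": "gpt-oss:20b"}]}'


def model_names(tags_body: str) -> List[str]:
    """Return the "name" of every entry under "models", skipping entries without one."""
    data = json.loads(tags_body)
    return [m["name"] for m in data.get("models", []) if "name" in m]


def is_available(model: str, tags_body: str) -> bool:
    """True when the exact model tag appears in the tags response."""
    return model in model_names(tags_body)
```

A quick check such as is_available("gpt-oss:20b", body) before sending a chat request avoids a round trip that would end in a 404.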

Basic Chat Completion

Non-streaming Response

curl -X POST https://ollama.com/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": false,
    "options": {
      "temperature": 0.7,
      "num_predict": 1000,
      "top_p": 0.9
    }
  }'

Response format:

{
  "model": "gpt-oss:120b",
  "created_at": "2024-01-01T00:00:00.000Z",
  "message": {
    "role": "assistant",
    "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
  },
  "done": true,
  "total_duration": 1234567890,
  "prompt_eval_count": 5,
  "eval_count": 15
}
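The same non-streaming call can be sketched in Python. The function names (chat_payload, send_chat, summarize) are hypothetical helpers of mine. The field names come from the request and response shown above, and the conversion of total_duration to seconds relies on Ollama's API documentation, which states that durations are reported in nanoseconds.

```python
import json
import urllib.request

CHAT_URL = "https://ollama.com/api/chat"  # endpoint from the curl example


def chat_payload(model: str, prompt: str, **options) -> dict:
    """Build a single-turn, non-streaming request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    if options:  # only send "options" when at least one was given
        payload["options"] = options
    return payload


def send_chat(api_key: str, payload: dict) -> dict:
    """POST the payload to /api/chat and decode the JSON reply (network call)."""
    request = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


def summarize(body: dict) -> dict:
    """Pick out the reply text and usage numbers from a response body."""
    return {
        "text": body["message"]["content"],
        "prompt_tokens": body.get("prompt_eval_count", 0),
        "completion_tokens": body.get("eval_count", 0),
        "seconds": body.get("total_duration", 0) / 1e9,  # nanoseconds to seconds
    }
```

Typical use would be summarize(send_chat(key, chat_payload("gpt-oss:120b", "Hello, how are you?", temperature=0.7))).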

Streaming Responses

For real-time token generation, enable streaming:

curl -X POST https://ollama.com/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Streaming returns one JSON object per line:

{"model":"gpt-oss:120b","message":{"role":"assistant","content":"Once"},"done":false}
{"model":"gpt-oss:120b","message":{"role":"assistant","content":" upon"},"done":false}
{"model":"gpt-oss:120b","message":{"role":"assistant","content":" a"},"done":false}

The final chunk sets "done": true and carries the usage statistics (total_duration, prompt_eval_count, eval_count).
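Because each line of the stream is a complete JSON object, a client can parse it line by line without a streaming JSON parser. Here is a rough Python sketch. iter_chunks and collect_text are names I made up, and the input can be any iterable of lines, for example the response object returned by urllib.request.urlopen, which yields one bytes line at a time.

```python
import json
from typing import Iterable, Iterator, Tuple, Union


def iter_chunks(lines: Iterable[Union[bytes, str]]) -> Iterator[dict]:
    """Decode one JSON object per non-empty line."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive or trailing lines
        yield json.loads(line)


def collect_text(lines: Iterable[Union[bytes, str]]) -> Tuple[str, dict]:
    """Join the streamed content and return it with the final ("done") chunk."""
    parts = []
    final: dict = {}
    for chunk in iter_chunks(lines):
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final = chunk
            break
    return "".join(parts), final
```

To show tokens as they arrive, loop over iter_chunks directly and print each chunk's content instead of collecting it.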

Multi-turn Conversations

Maintain conversation context by including previous messages:

{
  "model": "gpt-oss:120b",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
  ],
  "stream": false
}
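The API appears to be stateless in this respect: the client resends the whole history on every call. One way to manage that is a small wrapper that accumulates messages. The Conversation class below is a hypothetical helper of my own, not something Ollama provides.

```python
from typing import Dict, List


class Conversation:
    """Keeps the running message list and builds /api/chat request bodies."""

    def __init__(self, model: str):
        self.model = model
        self.messages: List[Dict[str, str]] = []

    def user(self, content: str) -> None:
        """Record a user turn."""
        self.messages.append({"role": "user", "content": content})

    def assistant(self, content: str) -> None:
        """Record the model's reply so the next request includes it."""
        self.messages.append({"role": "assistant", "content": content})

    def payload(self, stream: bool = False) -> dict:
        """Request body containing every message recorded so far."""
        return {
            "model": self.model,
            "messages": list(self.messages),  # copy, so later turns don't mutate it
            "stream": stream,
        }
```

After each response, pass the returned message content to assistant() before adding the next user turn; otherwise the model loses the context for follow-up questions like "What is its population?".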

Request Options

Adjust model behavior with these options. The // comments are annotations only; remove them before sending, since JSON does not allow comments:

{
  "options": {
    "temperature": 0.7,        // Creativity (0.0-2.0)
    "num_predict": 1000,       // Max tokens to generate
    "top_p": 0.9,             // Nucleus sampling (0.0-1.0)
    "frequency_penalty": 0.0,  // Reduce repetition (-2.0-2.0)
    "presence_penalty": 0.0,   // Topic diversity (-2.0-2.0)
    "stop": ["Human:", "AI:"]  // Stop sequences
  }
}
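If you build the options object in code, a client-side range check can catch typos before a request is sent. This sketch uses the ranges from the annotated listing above. Whether the server enforces the same limits is not something I have verified, and build_options is a hypothetical helper name.

```python
from typing import Any, Dict

# Ranges copied from the annotated options listing; enforced client-side only.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_options(**options: Any) -> Dict[str, Any]:
    """Drop unset (None) options and reject values outside the listed ranges."""
    cleaned: Dict[str, Any] = {}
    for name, value in options.items():
        if value is None:
            continue
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low} to {high}")
        cleaned[name] = value  # options without a listed range pass through
    return cleaned
```

The result drops straight into a request body, for example {"model": "gpt-oss:120b", "messages": [...], "options": build_options(temperature=0.7, stop=["Human:", "AI:"])}.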

Error Handling

Common error responses:

401 Unauthorized

{"error": "unauthorized"}

404 Model Not Found

{"error": "model 'invalid-model' not found"}

429 Rate Limited

{"error": "rate limit exceeded"}
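One way to handle these in Python is to convert the {"error": ...} body into an exception and retry only on 429. The sketch below does that with the standard library. ApiError, parse_error, send, and with_retry are my own names, and the exponential backoff is my design choice; I have not seen Ollama document a recommended retry policy.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


class ApiError(Exception):
    """Carries the HTTP status and the message from the {"error": ...} body."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def parse_error(status: int, body: bytes) -> ApiError:
    """Turn an error body such as {"error": "unauthorized"} into an ApiError."""
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object; fall back to the raw text.
        message = body.decode("utf-8", errors="replace")
    return ApiError(status, message)


def send(request: urllib.request.Request) -> dict:
    """Send a request, raising ApiError for HTTP error statuses (network call)."""
    try:
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise parse_error(err.code, err.read()) from None


def with_retry(call: Callable[[], dict], attempts: int = 3,
               base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry only on 429, doubling the delay each time; other errors propagate."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ApiError as err:
            if err.status != 429 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Usage would look like with_retry(lambda: send(request)). A 401 or 404 is raised immediately, since retrying a bad key or a misspelled model name cannot succeed.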