Ollama.com offers a new “Turbo” service. For $20 per month you can run gpt-oss:120b, gpt-oss:20b, or deepseek-v3.1:671b as hosted models. From what I understand, they plan to launch more hosted models in the future. Learn more at https://ollama.com/turbo
The Turbo service supports API keys, but its API doesn’t follow the typical OpenAI-compatible conventions. It uses Ollama's native endpoints such as /api/chat and /api/tags instead. Below is information about the API.
Getting Started
All requests require authentication via Bearer token:
curl https://ollama.com/api/tags \
  -H "Authorization: Bearer your_api_key_here"
Get your API key from: https://ollama.com/settings/keys
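The same authenticated call can be sketched in Python using only the standard library. The helper names `build_request` and `list_models` are my own, not part of any Ollama SDK, and I am assuming the `/api/tags` response has the same `{"models": [{"name": ...}]}` shape as a local Ollama server.

```python
import json
import urllib.request

BASE_URL = "https://ollama.com"


def build_request(path, api_key, payload=None):
    """Build an authenticated request; it becomes a POST when a JSON payload is given."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def list_models(api_key):
    """Call /api/tags and return the model names.

    Assumption: the body looks like {"models": [{"name": "..."}]}.
    """
    request = build_request("/api/tags", api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        body = json.load(resp)
    return [model.get("name") for model in body.get("models", [])]
```

Calling `list_models("your_api_key_here")` performs the network request; `build_request` on its own does not touch the network, which makes it easy to inspect the headers before sending anything.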
Available Models
Currently, three models are available:
- gpt-oss:120b
  - 120 billion parameter model
- gpt-oss:20b
  - 20 billion parameter model
- deepseek-v3.1:671b
  - 671 billion parameter model
Basic Chat Completion
Non-streaming Response
curl -X POST https://ollama.com/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "stream": false,
    "options": {
      "temperature": 0.7,
      "num_predict": 1000,
      "top_p": 0.9
    }
  }'
Response format:
{
  "model": "gpt-oss:120b",
  "created_at": "2024-01-01T00:00:00.000Z",
  "message": {
    "role": "assistant",
    "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
  },
  "done": true,
  "total_duration": 1234567890,
  "prompt_eval_count": 5,
  "eval_count": 15
}
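To show how the request and response fit together, here is a minimal Python sketch of a non-streaming call. The function names are hypothetical, and I am assuming `total_duration` is reported in nanoseconds, as it is in the local Ollama API.

```python
import json
import urllib.request

CHAT_URL = "https://ollama.com/api/chat"


def build_chat_payload(model, messages, **options):
    """Assemble the JSON body for a non-streaming /api/chat request."""
    payload = {"model": model, "messages": messages, "stream": False}
    if options:
        payload["options"] = options
    return payload


def summarize_response(body):
    """Pull the reply text and usage counters out of a response body."""
    return {
        "text": body["message"]["content"],
        "prompt_tokens": body.get("prompt_eval_count", 0),
        "completion_tokens": body.get("eval_count", 0),
        # Assumption: durations are in nanoseconds.
        "seconds": body.get("total_duration", 0) / 1e9,
    }


def chat(api_key, model, messages, **options):
    """Send one non-streaming chat request and return the summarized reply."""
    request = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_chat_payload(model, messages, **options)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request, timeout=120) as resp:
        return summarize_response(json.load(resp))
```

A call such as `chat(key, "gpt-oss:120b", [{"role": "user", "content": "Hello, how are you?"}], temperature=0.7)` would return a small dict with the reply text and token counts.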
Streaming Responses
For real-time token generation, enable streaming:
curl -X POST https://ollama.com/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your_api_key_here" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
Streaming returns one JSON object per line:
{"model":"gpt-oss:120b","message":{"role":"assistant","content":"Once"},"done":false}
{"model":"gpt-oss:120b","message":{"role":"assistant","content":" upon"},"done":false}
{"model":"gpt-oss:120b","message":{"role":"assistant","content":" a"},"done":false}
The final chunk includes "done": true along with usage statistics.
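Because each line is a complete JSON object, a client can parse the stream line by line and stitch the fragments together. The sketch below is one way to do that in Python; `iter_chunks` and `collect_stream` are names I made up for illustration.

```python
import json


def iter_chunks(lines):
    """Yield parsed JSON objects from an iterable of lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if line:  # skip blank keep-alive lines, if any
            yield json.loads(line)


def collect_stream(lines):
    """Concatenate content fragments until a chunk reports done.

    Returns (full_text, final_chunk); final_chunk is None if the
    stream ended without a chunk where "done" is true.
    """
    parts = []
    final = None
    for chunk in iter_chunks(lines):
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            final = chunk
            break
    return "".join(parts), final
```

The response object returned by `urllib.request.urlopen` can be iterated line by line, so it can be passed straight to `collect_stream`. For live output, loop over `iter_chunks` and print each fragment as it arrives.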
Multi-turn Conversations
Maintain conversation context by including previous messages:
{
  "model": "gpt-oss:120b",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
  ],
  "stream": false
}
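Since the API is stateless, the client has to carry the history itself. A small helper like the one below keeps that bookkeeping in one place; the `Conversation` class is a hypothetical convenience, not something the service provides.

```python
class Conversation:
    """Accumulates messages so each request carries the full history."""

    def __init__(self, model):
        self.model = model
        self.messages = []

    def add_user(self, content):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content):
        self.messages.append({"role": "assistant", "content": content})

    def payload(self, stream=False):
        """Return the request body; the message list is copied so that
        callers cannot mutate the stored history by accident."""
        return {
            "model": self.model,
            "messages": list(self.messages),
            "stream": stream,
        }
```

The usual loop is: call `add_user`, send `payload()` to /api/chat, then pass the reply's content to `add_assistant` before the next turn.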
Request Options
Fine-tune model behavior with these options. The // comments below are annotations only; JSON does not allow comments, so remove them before sending the request:
{
  "options": {
    "temperature": 0.7,          // Creativity (0.0-2.0)
    "num_predict": 1000,         // Max tokens to generate
    "top_p": 0.9,                // Nucleus sampling (0.0-1.0)
    "frequency_penalty": 0.0,    // Reduce repetition (-2.0-2.0)
    "presence_penalty": 0.0,     // Topic diversity (-2.0-2.0)
    "stop": ["Human:", "AI:"]    // Stop sequences
  }
}
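If you build requests in code, it can help to check option values on the client before sending them. The sketch below enforces the ranges listed above; `build_options` is a hypothetical helper, and I have not verified whether the service itself rejects out-of-range values.

```python
# Ranges copied from the option list above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_options(**options):
    """Return a plain dict of options, raising ValueError on bad values."""
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    num_predict = options.get("num_predict")
    if num_predict is not None and not isinstance(num_predict, int):
        raise ValueError("num_predict must be an integer")
    stop = options.get("stop")
    if stop is not None and not all(isinstance(s, str) for s in stop):
        raise ValueError("stop must be a list of strings")
    return dict(options)
```

The returned dict can be placed directly under the "options" key of a chat request, and because it is an ordinary dict it serializes to valid JSON with no comments to strip.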
Error Handling
Common error responses:
401 Unauthorized
{"error": "unauthorized"}
404 Model Not Found
{"error": "model 'invalid-model' not found"}
429 Rate Limited
{"error": "rate limit exceeded"}
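One way to handle these in client code is to turn the JSON error body into an exception and retry only on 429. This is a sketch under my own assumptions: the exception classes, `error_from_response`, and `with_retry` are hypothetical names, and the exponential backoff schedule is my choice, not something the service documents.

```python
import json
import time


class TurboAPIError(Exception):
    """Carries the HTTP status and the "error" string from the body."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


class RateLimitError(TurboAPIError):
    """Raised for 429 responses, which are worth retrying."""


def error_from_response(status, body_text):
    """Build an exception from a status code and raw response body."""
    try:
        message = json.loads(body_text).get("error", body_text)
    except (ValueError, AttributeError, TypeError):
        message = body_text  # body was not a JSON object
    cls = RateLimitError if status == 429 else TurboAPIError
    return cls(status, message)


def with_retry(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With `urllib`, a failed request raises `urllib.error.HTTPError`; its `code` attribute and `read().decode()` output can be passed to `error_from_response`. Errors such as 401 and 404 are raised immediately, since retrying them will not help.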