AI API

Endpoints for interacting with AI models through a unified provider layer. These endpoints are served by the backend API service, not the web application, and are not exposed through the web application’s /api routes. They are available at the backend base URL, which may differ from the web API base URL depending on your deployment.
The health check (GET /api/ai/health) and model listing endpoints (GET /api/ai/models, GET /api/ai/models/:provider) are public and do not require authentication; they return non-sensitive data for status monitoring and pricing discovery. All other endpoints require bearer token (API key) authentication. The chat and cost estimation endpoints additionally require a valid subscription plan. All POST requests must include the Content-Type: application/json header.
All /api/ai/* endpoints share a rate limit of 30 requests per minute per IP, not just the chat endpoint.
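Because the limit is shared across every /api/ai/* path, a client may want to throttle itself before the server does. The following is a minimal sketch of a client-side sliding-window limiter. The class name, the 60-second window, and the injectable clock are illustrative assumptions; the server's exact windowing algorithm is not documented here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for a shared budget of N requests per window."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self):
        """Return True and record the request if the budget allows it."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True
```

Call try_acquire() before every request to any /api/ai/* endpoint, and delay the request when it returns False.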

Health check

GET /api/ai/health
No authentication required. Returns the availability status of configured AI providers.

Response

{
  "status": "healthy",
  "providers": {
    "openrouter": true
  },
  "timestamp": "2026-03-19T00:00:00Z"
}
The status field is healthy when all providers are reachable and degraded when one or more are down.

Error response

If the check cannot complete (for example, because a provider connection fails), the response uses status: "error" and includes the error message:
{
  "status": "error",
  "error": "Provider connection failed"
}
| Code | Description |
| --- | --- |
| 503 | AI service unavailable |
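A monitoring client might interpret the health payload along these lines. This is a sketch only: the function names are hypothetical, and treating degraded as still usable is an assumption rather than documented behavior.

```python
def unavailable_providers(health):
    """Return the names of providers reported as down in a health payload."""
    providers = health.get("providers", {})
    return sorted(name for name, up in providers.items() if not up)


def is_usable(health):
    """Treat 'healthy' and 'degraded' as usable; anything else as unavailable.

    Assumption: a degraded service can still answer some requests.
    """
    return health.get("status") in ("healthy", "degraded")
```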

List models

GET /api/ai/models
No authentication required. Returns all available AI models across providers. This endpoint is public to support pricing pages and model discovery.

Response

{
  "models": [
    {
      "id": "anthropic/claude-sonnet-4-20250514",
      "name": "Claude Sonnet",
      "provider": "openrouter",
      "description": "Fast, intelligent model for everyday tasks",
      "tags": ["chat", "code"],
      "inputCost": 0.003,
      "outputCost": 0.015,
      "contextWindow": 200000,
      "available": true
    }
  ],
  "count": 1,
  "openrouter": 1,
  "timestamp": "2026-03-19T00:00:00Z"
}

Errors

| Code | Description |
| --- | --- |
| 500 | Failed to fetch models |
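For model discovery, a client can filter the listing on the fields shown in the response above. A minimal sketch, assuming the payload has already been parsed from JSON; the helper name and the cheapest-first ordering are illustrative choices.

```python
def models_with_tag(listing, tag):
    """Return IDs of available models carrying the tag, cheapest input cost first."""
    matches = [
        m for m in listing.get("models", [])
        if m.get("available") and tag in m.get("tags", [])
    ]
    # Models without a listed inputCost sort last.
    matches.sort(key=lambda m: m.get("inputCost", float("inf")))
    return [m["id"] for m in matches]
```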

List models by provider

GET /api/ai/models/:provider
No authentication required.

Path parameters

| Parameter | Type | Description |
| --- | --- | --- |
| provider | string | Provider name (for example, openrouter) |

Response

{
  "provider": "openrouter",
  "models": [],
  "count": 0,
  "timestamp": "2026-03-19T00:00:00Z"
}

Select model

POST /api/ai/models/select
Automatically selects the best model for a given task type.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| taskType | string | No | Type of task (default: general) |

Response

{
  "model": {
    "id": "anthropic/claude-sonnet-4-20250514",
    "provider": "openrouter"
  },
  "taskType": "general",
  "timestamp": "2026-03-19T00:00:00Z"
}

Errors

| Code | Description |
| --- | --- |
| 404 | No models available |

Chat completion

POST /api/ai/chat
Send a chat completion request through the unified AI provider layer. The model is auto-selected if not specified.
This endpoint requires a valid subscription plan. Requests without a recognized plan or active Stripe subscription receive a 402 response. The requested model must also be available on your plan — see plan-based model access below.
The chat endpoint uses header-based authentication. Access control is enforced through the x-user-plan and x-stripe-subscription-id headers rather than JWT verification. Admin emails (configured via ADMIN_EMAILS) bypass both plan and subscription requirements.

Request headers

The following headers are used for plan enforcement:
| Header | Type | Required | Description |
| --- | --- | --- | --- |
| x-user-plan | string | Yes | Subscription plan name (label, solo, collective, or network) |
| x-user-email | string | No | User email. Admin emails bypass plan restrictions. |
| x-stripe-subscription-id | string | Yes | Active Stripe subscription ID |

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | Array of message objects with role (user, assistant, or system) and content |
| model | string | No | Model ID. Auto-selected based on taskType if omitted. Must be allowed by your plan. |
| taskType | string | No | Used for auto-selection when model is omitted |
| temperature | number | No | Sampling temperature |
| top_p | number | No | Nucleus sampling parameter |
| max_tokens | number | No | Maximum tokens in the response |

Example request

{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
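Putting the headers and body together, a request could be assembled as sketched below using only the Python standard library. The function name, the base URL, and the Authorization: Bearer header form are assumptions for illustration; check your deployment for the exact credential format.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, plan, subscription_id, messages, **options):
    """Build (but do not send) a POST /api/ai/chat request."""
    if not messages:
        # Mirrors the documented 400: messages must be a non-empty array.
        raise ValueError("messages must be a non-empty list")
    body = {"messages": messages, **options}
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # assumed header form for bearer auth
        "x-user-plan": plan,
        "x-stripe-subscription-id": subscription_id,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/ai/chat",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Pass the returned object to urllib.request.urlopen to send it. Optional fields such as temperature or max_tokens go in as keyword arguments.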

Response

Returns a structured response with the following shape:
{
  "id": "chatcmpl-abc123",
  "model": "anthropic/claude-sonnet-4-20250514",
  "provider": "openrouter",
  "message": {
    "role": "assistant",
    "content": "Hello! How can I help you today?"
  },
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 10,
    "total_tokens": 35
  },
  "timestamp": "2026-03-19T00:00:00Z"
}

Errors

| Code | Description |
| --- | --- |
| 400 | Messages array is required and must be non-empty |
| 402 | Valid subscription required. Returned when the plan header is missing or unrecognized (PLAN_REQUIRED), or when there is no active Stripe subscription (SUBSCRIPTION_REQUIRED). |
| 403 | Model not available on your plan (MODEL_RESTRICTED). The response includes an allowedModels array listing the models your plan supports. |
| 404 | No models available |
| 429 | Monthly token quota exceeded (QUOTA_EXCEEDED). See token quotas below. |
| 500 | AI provider error |

402 error example

{
  "success": false,
  "error": "Valid subscription required. Choose a plan at /pricing",
  "code": "PLAN_REQUIRED"
}

403 error example

{
  "error": "Model openai/gpt-4-turbo not available on your plan. Upgrade for more models.",
  "code": "MODEL_RESTRICTED",
  "allowedModels": ["openai/gpt-4o-mini", "google/gemini-2.0-flash"]
}

429 error example

{
  "error": "Monthly token quota exceeded for plan \"solo\". Used 2,000,000 of 2,000,000 tokens. Quota resets at the start of next month.",
  "code": "QUOTA_EXCEEDED"
}
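A client can branch on the status code and the code field from the examples above. The sketch below is one possible mapping; the function name and the returned action labels are hypothetical.

```python
def next_step(status, body):
    """Map a chat error response to a coarse client action (labels are illustrative)."""
    code = body.get("code")
    if status == 402:
        # PLAN_REQUIRED or SUBSCRIPTION_REQUIRED
        return "subscribe"
    if status == 403 and code == "MODEL_RESTRICTED":
        allowed = body.get("allowedModels", [])
        return f"retry_with:{allowed[0]}" if allowed else "upgrade"
    if status == 429 and code == "QUOTA_EXCEEDED":
        return "wait_for_monthly_reset"
    if status >= 500:
        return "retry_later"
    return "fix_request"
```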

Token quotas

Each plan enforces a monthly token budget across all AI chat requests. Token usage (input + output) is tracked per user and checked before each request. When your usage reaches the plan limit, subsequent requests are rejected with a 429 status and a QUOTA_EXCEEDED error code. Quotas reset at the start of each calendar month.
| Plan | Monthly token limit |
| --- | --- |
| solo | 2,000,000 |
| collective | 6,000,000 |
| label | 20,000,000 |
| network | Unlimited |
The free plan has a token limit of zero. All agent usage requires a paid subscription.
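The pre-request quota check described above can be approximated as follows. This is a sketch of the logic, not the backend's implementation: the limits are copied from the table, and the function name and the use of None for "unlimited" are illustrative choices.

```python
# Monthly limits from the table above; None stands for "Unlimited".
PLAN_TOKEN_LIMITS = {
    "solo": 2_000_000,
    "collective": 6_000_000,
    "label": 20_000_000,
    "network": None,
}


def quota_remaining(plan, used_tokens):
    """Return tokens left this month, None if unlimited, 0 for free or unknown plans."""
    if plan not in PLAN_TOKEN_LIMITS:
        return 0
    limit = PLAN_TOKEN_LIMITS[plan]
    if limit is None:
        return None
    return max(limit - used_tokens, 0)
```

A result of 0 corresponds to the case where the next request would be rejected with QUOTA_EXCEEDED.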

Plan-based model access

Each subscription plan grants access to a specific set of AI models. The chat endpoint enforces these limits automatically.
| Plan | Price | Models | Agent limit | Skill limit | A2A messages/day |
| --- | --- | --- | --- | --- | --- |
| label | £29/mo | openai/gpt-4o-mini, google/gemini-2.0-flash, xiaomi/mimo-v2-pro | 1 | 3 | 100 |
| solo | £79/mo | openai/gpt-4o-mini, openai/gpt-4o, google/gemini-2.0-flash, anthropic/claude-3.5-sonnet, xiaomi/mimo-v2-pro | 3 | 10 | 500 |
| collective | £199/mo | openai/gpt-4o-mini, openai/gpt-4o, openai/gpt-4-turbo, google/gemini-2.0-flash, anthropic/claude-3.5-sonnet, anthropic/claude-3-opus, xiaomi/mimo-v2-pro | 10 | 25 | 2,000 |
| network | £499/mo | All models | 100 | 100 | 10,000 |
Admin users are automatically granted network-level access regardless of their subscription plan.
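The access rule can be expressed as a simple membership check. The sketch below copies the model lists from the table; the helper name and the use of None for "All models" are assumptions made for illustration.

```python
LABEL = {"openai/gpt-4o-mini", "google/gemini-2.0-flash", "xiaomi/mimo-v2-pro"}
SOLO = LABEL | {"openai/gpt-4o", "anthropic/claude-3.5-sonnet"}
COLLECTIVE = SOLO | {"openai/gpt-4-turbo", "anthropic/claude-3-opus"}

# None stands for "All models".
PLAN_MODELS = {"label": LABEL, "solo": SOLO, "collective": COLLECTIVE, "network": None}


def is_model_allowed(plan, model, is_admin=False):
    """Return True if the plan (or admin status) grants access to the model."""
    if is_admin:
        plan = "network"  # admins receive network-level access
    if plan not in PLAN_MODELS:
        return False
    allowed = PLAN_MODELS[plan]
    return allowed is None or model in allowed
```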

Model fallbacks

Each AI provider is configured with a primary model and one or more fallback models. When the primary model is unavailable or returns an error, the system automatically retries the request using a fallback model.
| Provider | Primary model | Fallback model(s) |
| --- | --- | --- |
| openrouter | xiaomi/mimo-v2-pro | openrouter/anthropic/claude-sonnet-4, openrouter/google/gemini-2.5-flash |
| gemini | google/gemini-2.0-flash | openrouter/anthropic/claude-sonnet-4-5 |
| groq | groq/gemma2-9b-it | openai/gpt-4o-mini |
| anthropic | anthropic/claude-sonnet-4-5 | openai/gpt-4o |
| openai | openai/gpt-4o | openai/gpt-4o-mini |
minimax (MiniMax/MiniMax-Text-01) is available in the provider configuration map but is not currently supported in the model fallback chain. It may be enabled in a future release.
Fallback routing is handled transparently. The response always indicates which model ultimately served the request via the model field.
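The retry behavior can be pictured as a loop over an ordered list of models. This is a generic sketch of the pattern, not the backend's code; complete_with_fallback and the call_model callback are hypothetical names.

```python
def complete_with_fallback(models, call_model):
    """Try each model in order and return (model, result) from the first success.

    `call_model` is a caller-supplied function that raises on failure.
    """
    errors = {}
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # any provider error triggers the next fallback
            errors[model] = exc
    raise RuntimeError(f"all models failed: {list(errors)}")
```

Returning the model alongside the result mirrors how the response's model field reports which model served the request.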

Task-based model tiers

In addition to provider-level fallbacks, the backend AI service uses a tier-based routing system that selects models based on the type of work being performed. Each tier defines a primary model and two fallback models. If the primary model fails or times out, the system tries each fallback in order.
| Tier | Primary model | Fallback models |
| --- | --- | --- |
| reasoning | deepseek/deepseek-r1 | meta-llama/llama-3.3-70b-instruct, moonshotai/kimi-k2.5 |
| coding | qwen/qwen-2.5-coder-32b-instruct | deepseek/deepseek-r1, google/gemini-2.0-flash-001 |
| fast | meta-llama/llama-3.3-70b-instruct | mistralai/mistral-7b-instruct, google/gemini-2.0-flash-001 |
| creative | moonshotai/kimi-k2.5 | deepseek/deepseek-r1, meta-llama/llama-3.3-70b-instruct |
Each model attempt has a timeout of 30 seconds by default. You can override this with the AI_MODEL_TIMEOUT_MS environment variable. All tier-based requests are routed through OpenRouter.
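The tier table and timeout setting translate naturally into a lookup plus an environment read. The sketch below assumes AI_MODEL_TIMEOUT_MS holds an integer number of milliseconds; the function names are illustrative and not part of the service.

```python
import os

# Tier -> (primary, fallbacks), copied from the table above.
TIERS = {
    "reasoning": ("deepseek/deepseek-r1",
                  ["meta-llama/llama-3.3-70b-instruct", "moonshotai/kimi-k2.5"]),
    "coding": ("qwen/qwen-2.5-coder-32b-instruct",
               ["deepseek/deepseek-r1", "google/gemini-2.0-flash-001"]),
    "fast": ("meta-llama/llama-3.3-70b-instruct",
             ["mistralai/mistral-7b-instruct", "google/gemini-2.0-flash-001"]),
    "creative": ("moonshotai/kimi-k2.5",
                 ["deepseek/deepseek-r1", "meta-llama/llama-3.3-70b-instruct"]),
}

DEFAULT_TIMEOUT_MS = 30_000  # 30 seconds per model attempt


def attempt_order(tier):
    """Return the models to try for a tier: primary first, then fallbacks in order."""
    primary, fallbacks = TIERS[tier]
    return [primary, *fallbacks]


def model_timeout_seconds(env=None):
    """Read the per-attempt timeout, falling back to the 30-second default."""
    env = os.environ if env is None else env
    return int(env.get("AI_MODEL_TIMEOUT_MS", DEFAULT_TIMEOUT_MS)) / 1000
```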

Estimate cost

POST /api/ai/estimate-cost
Estimate the cost of a request based on token counts and model pricing.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID |
| inputTokens | number | Yes | Number of input tokens |
| outputTokens | number | Yes | Number of output tokens |

Response

{
  "model": "anthropic/claude-sonnet-4-20250514",
  "inputTokens": 1000,
  "outputTokens": 500,
  "estimatedCost": 0.0045,
  "currency": "USD",
  "timestamp": "2026-03-19T00:00:00Z"
}

Errors

| Code | Description |
| --- | --- |
| 400 | Model, inputTokens, and outputTokens are all required |
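To avoid the 400 above, a client can check the body before sending it. A minimal sketch with a hypothetical helper name; whether the server treats a token count of 0 as missing is not documented, so this check only flags absent or null fields.

```python
REQUIRED_FIELDS = ("model", "inputTokens", "outputTokens")


def missing_estimate_fields(body):
    """Return the required estimate-cost fields that are absent or null."""
    return [field for field in REQUIRED_FIELDS if body.get(field) is None]
```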