You can call supported models through a compatible API, monitor usage, and pay based on actual consumption.
Claude Code, ChatGPT, Gemini, MiniMax, Codex, DeepSeek, Qwen
Availability, context length, and final prices vary by model. Use the prices displayed in the console as the source of truth before starting production traffic.

Supported Model Types

AI Gateway supports multiple categories of models for common AI workloads.
| Model Type | Typical Use Cases | Billing Unit |
| --- | --- | --- |
| Language models | Chat, reasoning, coding, summarization, tool calling, and structured output | Input and output tokens |
| Long-context language models | Document analysis, large codebase analysis, and retrieval-augmented generation | Input and output tokens, sometimes tiered by context size |
| Claude-series models | Chat, coding, reasoning, and workflows that use prompt caching | Input tokens, output tokens, cache writes, and cache reads |

Pricing

For each model, use the unit price displayed in the console as the source of truth.

Language Models

General language models are billed by token usage.
| Usage Type | Unit Price |
| --- | --- |
| Input tokens | $XX.XX / 1M tokens |
| Output tokens | $XX.XX / 1M tokens |

Billing formula:
Total cost =
  input tokens / 1,000,000 * input token price
+ output tokens / 1,000,000 * output token price
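
The formula above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function name `language_model_cost` and the prices in the usage example are placeholders chosen for the arithmetic, so substitute the unit prices shown in the console.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def language_model_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: Decimal,
    output_price: Decimal,
) -> Decimal:
    """Return the cost of one request; prices are per 1M tokens."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_PRICE_UNIT * input_price
    output_cost = Decimal(output_tokens) / TOKENS_PER_PRICE_UNIT * output_price
    return input_cost + output_cost


# Placeholder prices for illustration only; read real prices from the console.
example_cost = language_model_cost(
    input_tokens=120_000,
    output_tokens=8_000,
    input_price=Decimal("1.00"),
    output_price=Decimal("4.00"),
)
print(f"Estimated cost: ${example_cost:.4f}")
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift when many requests are summed.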

Tiered Context Pricing

Some language models use tiered pricing based on the number of input tokens in a request. For example, a request whose input fits within the standard context tier may be billed at one rate, while a request with a larger input may be billed at a higher rate for both input and output tokens.

| Context Tier | Input Token Price | Output Token Price |
| --- | --- | --- |
| Standard context | $XX.XX / 1M tokens | $XX.XX / 1M tokens |
| Extended context | $XX.XX / 1M tokens | $XX.XX / 1M tokens |

Billing formula:
Total cost =
  input tokens / 1,000,000 * tiered input token price
+ output tokens / 1,000,000 * tiered output token price
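
A tiered estimate might look like the sketch below. It assumes the tier is chosen from the request's input token count and then applied to the whole request, and that the tier boundary is inclusive. Both are assumptions to confirm in the console. The `Tier` class, the 200,000-token threshold, and all prices are hypothetical values for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


@dataclass(frozen=True)
class Tier:
    name: str
    max_input_tokens: Optional[int]  # None means no upper bound
    input_price: Decimal             # per 1M tokens
    output_price: Decimal            # per 1M tokens


def select_tier(input_tokens: int, tiers: Sequence[Tier]) -> Tier:
    """Return the first tier whose input limit covers the request."""
    for tier in tiers:
        if tier.max_input_tokens is None or input_tokens <= tier.max_input_tokens:
            return tier
    raise ValueError("no tier covers this many input tokens")


def tiered_cost(input_tokens: int, output_tokens: int, tiers: Sequence[Tier]) -> Decimal:
    """Apply the selected tier's prices to both input and output tokens."""
    tier = select_tier(input_tokens, tiers)
    return (
        Decimal(input_tokens) / TOKENS_PER_PRICE_UNIT * tier.input_price
        + Decimal(output_tokens) / TOKENS_PER_PRICE_UNIT * tier.output_price
    )


# Hypothetical tiers, ordered from smallest to largest limit.
EXAMPLE_TIERS = [
    Tier("standard", 200_000, Decimal("1.00"), Decimal("4.00")),
    Tier("extended", None, Decimal("2.00"), Decimal("6.00")),
]

print(tiered_cost(100_000, 10_000, EXAMPLE_TIERS))  # billed at the standard tier
print(tiered_cost(300_000, 10_000, EXAMPLE_TIERS))  # billed at the extended tier
```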

Claude-Series Prompt Caching

Claude-series models may include prompt caching charges in addition to normal input and output token charges.
| Usage Type | Unit Price |
| --- | --- |
| Input tokens | $XX.XX / 1M tokens |
| Output tokens | $XX.XX / 1M tokens |
| Cache write, 5-minute TTL | $XX.XX / 1M tokens |
| Cache write, 1-hour TTL | $XX.XX / 1M tokens |
| Cache read | $XX.XX / 1M tokens |

Billing formula:
Total cost =
  input tokens / 1,000,000 * input token price
+ output tokens / 1,000,000 * output token price
+ 5-minute cache write tokens / 1,000,000 * 5-minute cache write price
+ 1-hour cache write tokens / 1,000,000 * 1-hour cache write price
+ cache read tokens / 1,000,000 * cache read price
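
The five-term formula above can be expressed as a sum over (token count, unit price) pairs. The sketch below is illustrative only: the `CachePrices` and `CacheUsage` classes and their field names are hypothetical and are not the field names of any API response, and the prices are placeholders.

```python
from dataclasses import dataclass
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


@dataclass(frozen=True)
class CachePrices:
    """Unit prices per 1M tokens (placeholders; use console prices)."""
    input: Decimal
    output: Decimal
    cache_write_5m: Decimal
    cache_write_1h: Decimal
    cache_read: Decimal


@dataclass(frozen=True)
class CacheUsage:
    """Token counts for one request, split by billable usage type."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_write_5m_tokens: int = 0
    cache_write_1h_tokens: int = 0
    cache_read_tokens: int = 0


def cached_request_cost(usage: CacheUsage, prices: CachePrices) -> Decimal:
    """Sum each usage type's tokens / 1,000,000 * its unit price."""
    line_items = [
        (usage.input_tokens, prices.input),
        (usage.output_tokens, prices.output),
        (usage.cache_write_5m_tokens, prices.cache_write_5m),
        (usage.cache_write_1h_tokens, prices.cache_write_1h),
        (usage.cache_read_tokens, prices.cache_read),
    ]
    return sum(
        (Decimal(tokens) / TOKENS_PER_PRICE_UNIT * price for tokens, price in line_items),
        Decimal(0),
    )


# Placeholder prices and usage for illustration only.
example_prices = CachePrices(
    input=Decimal("2.00"),
    output=Decimal("10.00"),
    cache_write_5m=Decimal("2.50"),
    cache_write_1h=Decimal("4.00"),
    cache_read=Decimal("0.20"),
)
example_usage = CacheUsage(
    input_tokens=10_000,
    output_tokens=2_000,
    cache_write_5m_tokens=50_000,
    cache_read_tokens=200_000,
)
print(cached_request_cost(example_usage, example_prices))
```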

Usage Notes

  • Check the console for the latest model list, supported parameters, context limits, and real-time prices.
  • For language models, estimate cost from both input and output tokens. Long prompts, retrieved documents, and tool results can increase input token usage.
  • For cached prompts, cache writes and cache reads may appear as separate billable usage types.
  • Use rate limits and budget controls when integrating AI Gateway into production applications.
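
As one possible shape for a budget control, the sketch below tracks spend on the client side and refuses a request whose estimated cost would exceed a limit. It is an assumption-laden illustration: `BudgetGuard` and `BudgetExceeded` are hypothetical names, not part of AI Gateway, and a client-side check like this complements rather than replaces any limits configured in the console.

```python
from decimal import Decimal


class BudgetExceeded(RuntimeError):
    """Raised when a request would push spend past the configured limit."""


class BudgetGuard:
    def __init__(self, limit_usd: Decimal) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = Decimal(0)

    def check(self, estimated_cost_usd: Decimal) -> None:
        """Call before sending a request, with a cost estimate."""
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"estimated spend {self.spent_usd + estimated_cost_usd} "
                f"exceeds limit {self.limit_usd}"
            )

    def record(self, actual_cost_usd: Decimal) -> None:
        """Call after a request completes, with the cost computed from its usage."""
        self.spent_usd += actual_cost_usd


guard = BudgetGuard(limit_usd=Decimal("1.00"))
guard.check(Decimal("0.60"))   # within budget, so the request may proceed
guard.record(Decimal("0.60"))  # record what the request actually cost

try:
    guard.check(Decimal("0.50"))  # 0.60 + 0.50 would exceed the 1.00 limit
except BudgetExceeded as exc:
    print(f"Request blocked: {exc}")
```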