Scale Tier for API Customers
This offering is available to Enterprise customers. Please contact our sales team to learn more.
Scale Tier lets you purchase a set number of API input and output tokens per minute (known as “token units”) upfront for access to one dedicated model snapshot. Each token unit is purchased for a minimum of 30 days. Additional models may be added based on customer interest.
By choosing Scale Tier, you can unlock:
- Predictable latency: Scale Tier is designed to generate tokens faster and at a more consistent speed than the pay-as-you-go (PAYG) service, even during peak demand.
- Uncapped scale: Any quota purchased with Scale Tier is automatically added to your rate limits, so you can confidently scale further.
- Higher reliability: Scale Tier traffic offers a 99.9% uptime SLA and prioritized compute.
Model | GPT-4o | GPT-4o mini | GPT-4.1 (excludes long context requests estimated at >128k tokens) | GPT-4.1 mini (excludes long context requests estimated at >128k tokens) |
---|---|---|---|---|
Input bundle | 30,000 TPM, $124.59 per unit/day | 500,000 TPM, $114.75 per unit/day | 30,000 TPM, $110.00 per unit/day | 500,000 TPM, $450.00 per unit/day |
Output bundle | 2,500 TPM, $39.34 per unit/day | 50,000 TPM, $49.18 per unit/day | 2,500 TPM, $36.00 per unit/day | 50,000 TPM, $175.00 per unit/day |
Uptime SLA | 99.9% | 99.9% | 99.9% | 99.9% |
Latency SLA | 99% > 25 tokens per second | 99% > 33 tokens per second | 99% > 40 tokens per second | 99% > 50 tokens per second |

Model | o1 | o3 | o3-mini | o4-mini |
---|---|---|---|---|
Input bundle | 5,000 TPM, $163.93 per unit/day | 5,000 TPM, $75.00 per unit/day | 30,000 TPM, $78.69 per unit/day | 30,000 TPM, $50.00 per unit/day |
Output bundle | 1,000 TPM, $131.15 per unit/day | 1,000 TPM, $60.00 per unit/day | 5,000 TPM, $52.46 per unit/day | 5,000 TPM, $32.50 per unit/day |
Uptime SLA | 99.9% | 99.9% | 99.9% | 99.9% |
Latency SLA | 99% > 25 tokens per second | 99% > 40 tokens per second | 99% > 66 tokens per second | 99% > 66 tokens per second |

For every model, the latency SLA is calculated as average request latency on a per-minute basis across the month.
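
To make the bundle figures concrete, the following is a minimal sizing sketch. It uses the GPT-4o bundle sizes and per-day prices listed above; the `units_needed` and `estimate` helpers, the example workload, and the assumption that cost scales linearly with the number of days are illustrative and not part of the Scale Tier offering itself.

```python
# Bundle sizes (TPM per unit) and prices (USD per unit per day),
# copied from the GPT-4o column of the table above.
INPUT_TPM_PER_UNIT = 30_000
INPUT_USD_PER_UNIT_DAY = 124.59
OUTPUT_TPM_PER_UNIT = 2_500
OUTPUT_USD_PER_UNIT_DAY = 39.34
MIN_TERM_DAYS = 30  # each token unit is purchased for a minimum of 30 days


def units_needed(target_tpm: int, tpm_per_unit: int) -> int:
    """Smallest whole number of units whose combined TPM covers target_tpm."""
    return (target_tpm + tpm_per_unit - 1) // tpm_per_unit  # integer ceiling


def estimate(target_input_tpm: int, target_output_tpm: int) -> dict:
    """Hypothetical helper: unit counts and list-price cost for a GPT-4o workload."""
    input_units = units_needed(target_input_tpm, INPUT_TPM_PER_UNIT)
    output_units = units_needed(target_output_tpm, OUTPUT_TPM_PER_UNIT)
    per_day = (input_units * INPUT_USD_PER_UNIT_DAY
               + output_units * OUTPUT_USD_PER_UNIT_DAY)
    return {
        "input_units": input_units,
        "output_units": output_units,
        "usd_per_day": round(per_day, 2),
        # Assumption: cost scales linearly with the number of days.
        "usd_min_term": round(per_day * MIN_TERM_DAYS, 2),
    }


# Hypothetical peak workload: 100,000 input TPM and 10,000 output TPM.
plan = estimate(100_000, 10_000)
print(plan)
```

Rounding up to whole units matters here: 100,000 input TPM does not divide evenly by 30,000, so the sketch reserves four input units instead of three. Actual pricing is set by the order form, so treat the dollar figures as list-price estimates only.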
How it works
With Scale Tier, you can purchase input and output token units. For example, with GPT‑4o each input unit costs $3,800 and entitles you to 30k input tokens/min. Each output unit costs $1,200 and entitles you to 2.5k output tokens/min. If you exceed your TPM unit quota in a given minute, excess requests are processed using shared capacity and charged at PAYG rates.
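
The overflow behavior can be sketched as a per-minute split between purchased quota and shared capacity. This is a simplified illustration: the text above describes excess *requests* being routed to shared capacity, whereas this sketch approximates at token granularity, and `split_minute_usage` is a hypothetical helper name.

```python
def split_minute_usage(tokens_used: int, units: int, tpm_per_unit: int) -> tuple[int, int]:
    """Return (tokens within Scale Tier quota, tokens overflowing to PAYG) for one minute."""
    quota = units * tpm_per_unit          # total purchased TPM
    covered = min(tokens_used, quota)     # served from purchased capacity
    overflow = max(tokens_used - quota, 0)  # served from shared capacity at PAYG rates
    return covered, overflow


# Two GPT-4o input units give a quota of 60,000 input tokens per minute.
print(split_minute_usage(45_000, 2, 30_000))  # (45000, 0)
print(split_minute_usage(75_000, 2, 30_000))  # (60000, 15000)
```

In the second call, the 15,000 tokens above quota are the portion that would be billed at PAYG rates under this simplified model.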
More information about how Scale Tier interacts with Prompt Caching can be found in the FAQ section below.
Pricing
Once you’ve signed an order form, you can add and remove token units through your developer console.

Token units and rate limits
Once Scale Tier is enabled for your account, you can manually adjust your token units.
Models
Scale Tier is offered for GPT‑4o, GPT‑4.1, GPT‑4o mini, GPT‑4.1 mini, o3, o1, o3‑mini, and o4‑mini.
Reliability
You will be credited with the greater of the two SLA amounts for the calendar month of that Scale Tier token unit purchase.
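
The latency SLA in the tables above is stated as "99% > N tokens per second", calculated as average request latency on a per-minute basis across the month. One possible reading, which is an assumption and not a definition given on this page, is that per-request generation speeds are averaged within each minute and at least 99% of those minutes must exceed the threshold. A minimal sketch of that reading, with the hypothetical helper `latency_sla_met`:

```python
from collections import defaultdict
from statistics import mean


def latency_sla_met(samples, threshold_tps, required_fraction=0.99):
    """Check an assumed reading of the latency SLA.

    samples: iterable of (minute_index, tokens_per_second) pairs, one per request.
    Returns True if the share of minutes whose average speed exceeds
    threshold_tps is at least required_fraction.
    """
    per_minute = defaultdict(list)
    for minute, tps in samples:
        per_minute[minute].append(tps)
    if not per_minute:
        return True  # assumption: a month with no traffic has nothing to violate
    passing = sum(1 for speeds in per_minute.values() if mean(speeds) > threshold_tps)
    return passing / len(per_minute) >= required_fraction


# Hypothetical data: 100 minutes with two requests each, GPT-4o threshold of 25 tokens/s.
all_fast = [(m, tps) for m in range(100) for tps in (28.0, 32.0)]
two_slow = [(m, 10.0 if m < 2 else 30.0) for m in range(100)]

print(latency_sla_met(all_fast, 25))
print(latency_sla_met(two_slow, 25))
```

The first data set passes because every minute averages 30 tokens per second; the second fails because only 98 of 100 minutes exceed the threshold. How credits are actually computed is governed by the SLA terms, not by this sketch.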
Policies
If customers have a use case that qualifies for Zero Data Retention (ZDR), their Scale Tier usage adheres to that same retention policy.