Scale Tier for API Customers
Scale Tier lets you purchase a set number of API input and output tokens per minute (known as “token units”) upfront for access to one dedicated model snapshot. Each token unit is purchased for a minimum of 30 days.
By choosing Scale Tier, you can unlock:
Predictable latency: Scale Tier is designed to generate tokens faster and at a more consistent speed than the pay-as-you-go (PAYG) service, even during peak demand.
Uncapped scale: Any quota purchased with Scale Tier is automatically added to your rate limits, so you can confidently scale further.
Higher reliability: Scale Tier traffic offers a 99.9% uptime SLA and prioritized compute.
Model | GPT-4o | GPT-4o mini | o1 |
---|---|---|---|
Input bundle | 30,000 TPM | 500,000 TPM | 5,000 TPM |
Output bundle | 2,500 TPM | 50,000 TPM | 1,000 TPM |
Uptime SLA | 99.9% | 99.9% | 99.9% |
Latency SLA | 99% > 25 tokens per second | 99% > 33 tokens per second | 99% > 25 tokens per second |
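To size a purchase, the bundle sizes in the table can be turned into a quick calculation. The following is a minimal sketch, not an official tool: the `BUNDLES` dictionary and the `units_needed` helper are hypothetical names introduced here, and the sketch assumes units are bought in whole numbers.

```python
import math

# Tokens per minute (TPM) granted by one unit, copied from the table above.
BUNDLES = {
    "GPT-4o": {"input": 30_000, "output": 2_500},
    "GPT-4o mini": {"input": 500_000, "output": 50_000},
    "o1": {"input": 5_000, "output": 1_000},
}


def units_needed(model: str, input_tpm: int, output_tpm: int) -> dict[str, int]:
    """Smallest whole number of units covering a target TPM (hypothetical helper)."""
    bundle = BUNDLES[model]
    return {
        # Round up: a partially used unit still has to be purchased in full
        # (assumption: units are only sold in whole numbers).
        "input": math.ceil(input_tpm / bundle["input"]),
        "output": math.ceil(output_tpm / bundle["output"]),
    }


if __name__ == "__main__":
    # A workload peaking at 100,000 input TPM and 8,000 output TPM on GPT-4o.
    print(units_needed("GPT-4o", 100_000, 8_000))
```

Under these assumptions, a GPT-4o workload peaking at 100,000 input TPM and 8,000 output TPM would need four input units and four output units.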
How it works
With Scale Tier, you can purchase input and output token units. For example, with GPT-4o each input unit costs $3,800 and entitles you to 30,000 input tokens per minute (TPM). Each output unit costs $1,200 and entitles you to 2,500 output TPM. If you exceed your TPM quota in a given minute, the excess requests are processed using shared capacity and charged at PAYG rates.
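The arithmetic above can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, the billing period for the quoted unit prices is not stated in the text and is left unspecified, and overflow is modeled per token even though the text describes it per request.

```python
# GPT-4o figures quoted in the text above. The billing period for the unit
# price is not stated there, so it is deliberately left unspecified.
INPUT_UNIT_PRICE_USD = 3_800
INPUT_UNIT_TPM = 30_000
OUTPUT_UNIT_PRICE_USD = 1_200
OUTPUT_UNIT_TPM = 2_500


def unit_cost(input_units: int, output_units: int) -> int:
    """Total price in USD of the purchased units (hypothetical helper)."""
    return input_units * INPUT_UNIT_PRICE_USD + output_units * OUTPUT_UNIT_PRICE_USD


def split_minute(tokens_used: int, units: int, tpm_per_unit: int) -> tuple[int, int]:
    """Split one minute of usage into (covered by Scale Tier, overflow to PAYG).

    Simplification: overflow is counted in tokens, whereas the text says
    excess *requests* move to shared capacity.
    """
    quota = units * tpm_per_unit
    covered = min(tokens_used, quota)
    return covered, tokens_used - covered


if __name__ == "__main__":
    print(unit_cost(2, 3))
    print(split_minute(70_000, 2, INPUT_UNIT_TPM))
```

With two input units, a minute that consumes 70,000 input tokens would have 60,000 tokens covered by Scale Tier and 10,000 tokens billed at PAYG rates under this simplified model.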
Cached input tokens are discounted 50%. More information about how Scale Tier interacts with Prompt Caching can be found in the FAQ section below.
![Scale Tier: how it works](https://images.ctfassets.net/kftzwdyauwt9/7p8AQyoowEj32V0QE9phLp/3010c2ada33ac2dc752ea902d347350d/Scale_Tier_How_it_works_light.jpg?w=3840&q=90&fm=webp)
Pricing
Once you’ve signed an order form, you can add and remove token units through your developer console.
![](https://images.ctfassets.net/kftzwdyauwt9/71kQuVFKz4eIGA8dexqy2S/30874d843eb63aad9950e68fbbd07ce4/CleanShot_2024-08-01_at_01.31.10_2x.png?w=3840&q=90&fm=webp)
Token units and rate limits
Once Scale Tier is enabled for your account, you can manually adjust your token units.
Models
Scale Tier is offered for GPT-4o, GPT-4o mini, and o1 (excluding o1-preview).
Reliability
If both the uptime and latency SLAs are missed, you will be credited with the greater of the two SLA amounts for the calendar month of that Scale Tier token unit purchase.
Policies
If your use case qualifies for zero data retention (ZDR), your Scale Tier usage adheres to that same retention policy.