
Scale Tier for API Customers

Scale Tier lets you purchase a set number of API input and output tokens per minute (known as “token units”) upfront for access to one dedicated model snapshot. Each token unit is purchased for a minimum of 30 days.

By choosing Scale Tier, you can unlock:

  • Predictable latency: Scale Tier is designed to generate tokens at or above a rate of 25 tokens per second, even during peak demand.

  • Uncapped scale: Any quota purchased with Scale Tier is automatically added to your rate limits, so you can confidently scale further.

  • Higher reliability: Scale Tier traffic offers a 99.9% uptime SLA and prioritized compute.

  • Cost savings: Scale Tier offers an additional base 12% savings compared to pay-as-you-go (PAYG), with further savings through annual spend commitments with OpenAI.
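To make the latency and uptime figures above concrete, here is a minimal arithmetic sketch. It assumes the 25 tokens per second floor applies to token generation only (not time to first token) and uses a 30-day month; the function names are hypothetical and for illustration only.

```python
# Figures quoted in the list above.
MIN_TOKENS_PER_SECOND = 25    # Scale Tier generation-rate floor
SCALE_TIER_UPTIME = 0.999     # 99.9% uptime SLA


def max_generation_seconds(output_tokens: int,
                           rate: float = MIN_TOKENS_PER_SECOND) -> float:
    """Upper bound on generation time if tokens arrive at the floor rate."""
    return output_tokens / rate


def allowed_downtime_minutes(uptime: float, days: int = 30) -> float:
    """Downtime budget implied by an uptime percentage (assumes a `days`-day month)."""
    return (1 - uptime) * days * 24 * 60


if __name__ == "__main__":
    # A 500-token response at the 25 tokens/second floor.
    print(max_generation_seconds(500))                         # 20.0
    # Downtime budget for 99.9% uptime over 30 days, in minutes.
    print(round(allowed_downtime_minutes(SCALE_TIER_UPTIME), 1))  # 43.2
```

In other words, a 500-token completion should finish generating within roughly 20 seconds at the floor rate, and a 99.9% uptime target leaves about 43 minutes of downtime in a 30-day month.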

In the near future, we will provide visibility into throughput, latency, and uptime metrics on Scale Tier, and we will make Scale Tier self-serve so you can adjust token capacity freely without new contracts.

How it works

With Scale Tier, you can purchase input and output token units. Each input unit costs $3,800 and entitles you to 20k input tokens/min. Each output unit costs $1,200 and entitles you to 2k output tokens/min. If you exceed your TPM unit quota in a given minute, excess requests are processed using shared GPT-4o capacity and charged at standard PAYG rates.
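The sizing arithmetic can be sketched as follows. This is an illustrative calculation, not an official pricing tool: the unit prices and per-unit token rates come from the paragraph above, the billing period behind each price is not stated there and is left unspecified, and the function names are hypothetical. Spillover is approximated here in tokens, although the text describes it per request.

```python
import math

# Unit prices and capacities as quoted above (billing period not specified).
INPUT_UNIT_PRICE_USD = 3_800
INPUT_UNIT_TPM = 20_000      # input tokens per minute per unit
OUTPUT_UNIT_PRICE_USD = 1_200
OUTPUT_UNIT_TPM = 2_000      # output tokens per minute per unit


def units_needed(peak_tpm: int, unit_tpm: int) -> int:
    """Smallest whole number of units that covers the peak tokens per minute."""
    return math.ceil(peak_tpm / unit_tpm)


def plan(peak_input_tpm: int, peak_output_tpm: int) -> dict:
    """Units and total unit cost for the given peak input/output TPM."""
    input_units = units_needed(peak_input_tpm, INPUT_UNIT_TPM)
    output_units = units_needed(peak_output_tpm, OUTPUT_UNIT_TPM)
    return {
        "input_units": input_units,
        "output_units": output_units,
        "cost_usd": input_units * INPUT_UNIT_PRICE_USD
        + output_units * OUTPUT_UNIT_PRICE_USD,
    }


def spillover_tokens(tokens_this_minute: int, units: int, unit_tpm: int) -> int:
    """Tokens beyond the purchased quota in one minute (billed at PAYG rates)."""
    return max(0, tokens_this_minute - units * unit_tpm)


if __name__ == "__main__":
    print(plan(90_000, 7_500))
    # {'input_units': 5, 'output_units': 4, 'cost_usd': 23800}
    print(spillover_tokens(110_000, 5, INPUT_UNIT_TPM))  # 10000
```

Rounding up to whole units means a workload peaking at 90k input tokens/min needs five input units (100k tokens/min of capacity); a minute with 110k input tokens would then spill roughly 10k tokens over to shared capacity.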


How it compares

| | PAYG | Scale Tier |
| Description | Pay-as-you-go for usage of OpenAI's shared engines | Purchase TPM units upfront with guaranteed latency, higher reliability, and automatic spillover to PAYG |
| Modalities Supported | Text input & output; Vision input | Text input & output; Vision input |
| Max TPM/RPM Limits | Listed here | As needed. Purchase token bundles as needed to support your unique traffic patterns |
| Commitment SLA | N/A | 30 days minimum |
| Uptime SLA | 99.5% | 99.9% |
| Latency SLA | None | 99% > 25 tokens per second. Calculated as average request latency on per minute basis across the month |
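The latency SLA wording leaves room for interpretation. The sketch below shows one possible reading, stated as an assumption rather than the official calculation: average the generation speed of all requests within each minute, then check that at least 99% of the minutes in the month average 25 tokens per second or more. The function name and input shape are hypothetical.

```python
from statistics import mean


def latency_sla_met(per_minute_rates: list[list[float]],
                    threshold: float = 25.0,
                    target: float = 0.99) -> bool:
    """Check one reading of the latency SLA.

    per_minute_rates holds, for each minute with traffic, the tokens/second
    observed for every request in that minute (assumed input shape).
    """
    # Average request speed within each minute; skip minutes with no requests.
    minute_averages = [mean(rates) for rates in per_minute_rates if rates]
    if not minute_averages:
        return True  # no traffic, nothing to violate
    good_minutes = sum(1 for avg in minute_averages if avg >= threshold)
    return good_minutes / len(minute_averages) >= target


if __name__ == "__main__":
    # 99 minutes averaging 30 tokens/s and 1 minute averaging 10 tokens/s.
    month = [[28.0, 32.0]] * 99 + [[10.0]]
    print(latency_sla_met(month))  # True
```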


Today, you can purchase Scale Tier token units by contacting your account director. Over the coming weeks, you will be able to add and remove token units through your developer console.

Token units and rate limits

We are working on a self-service option for Scale Tier and will share more information soon.


Today, Scale Tier is only available on GPT-4o. We expect to offer this service for future models when they are released.


You will be credited with the greater of the two SLA amounts for the calendar month of that Scale Tier token unit purchase.


If a customer's use case qualifies for Zero Data Retention (ZDR), their Scale Tier usage will adhere to that same retention policy.