Priority Processing for API Customers

Priority processing offers reliable, high-speed performance with the flexibility of pay-as-you-go pricing.

By choosing Priority processing, you can unlock:

Predictably low latency: Priority processing generates tokens faster and at a more consistent speed than the Standard processing service, even during peak demand.
Easy-to-use flexibility: Like Standard processing, Priority processing can be accessed on a flexible, pay-as-you-go basis instead of requiring advance provisioning.

Model	Price per 1M input tokens	Price per 1M input tokens (cached)	Price per 1M output tokens	Uptime SLA³	Latency SLA³
GPT-5.6 Sol (excludes long context¹)	$10.00	$1.00	$60.00	99.9%	99% > 50 tokens per second²
GPT-5.6 Terra (excludes long context¹)	$5.00	$0.50	$30.00	99.9%	99% > 70 tokens per second²
GPT-5.6 Luna (excludes long context¹)	$2.00	$0.20	$12.00	99.9%	99% > 100 tokens per second²
GPT-5.5 (excludes long context¹)	$12.50	$1.250	$75.00	99.9%	99% > 50 tokens per second²
GPT-5.4 mini (excludes long context¹)	$1.50	$0.150	$9.00	99.9%	99% > 100 tokens per second²
GPT-5.4 (excludes long context¹)	$5.00	$0.500	$30.00	99.9%	99% > 50 tokens per second²
GPT-5.2	$3.50	$0.350	$28.00	99.9%	99% > 50 tokens per second²
GPT-5.1	$2.50	$0.250	$20.00	99.9%	99% > 50 tokens per second²
GPT-5	$2.50	$0.250	$20.00	99.9%	99% > 50 tokens per second²
GPT-5 mini	$0.45	$0.045	$3.60	99.9%	99% > 80 tokens per second²
GPT-5.1 codex	$2.50	$0.250	$20.00	99.9%	99% > 50 tokens per second²
GPT-5 codex	$2.50	$0.250	$20.00	99.9%	99% > 50 tokens per second²
GPT-4.1	$3.50	$0.875	$14.00	99.9%	99% > 80 tokens per second²
GPT-4.1 mini	$0.70	$0.175	$2.80	99.9%	99% > 90 tokens per second²
GPT-4.1 nano	$0.20	$0.050	$0.80	99.9%	99% > 100 tokens per second²
GPT-4o (gpt-4o-2024-11-20, gpt-4o-2024-08-06)	$4.25	$2.125	$17.00	99.9%	99% > 80 tokens per second²
gpt-4o-2024-05-13	$8.75	N/A	$26.25	99.9%	99% > 80 tokens per second²
GPT-4o mini	$0.25	$0.125	$1.00	99.9%	99% > 90 tokens per second²
o3	$3.50	$0.875	$14.00	99.9%	99% > 80 tokens per second²
o4-mini	$2.00	$0.500	$8.00	99.9%	99% > 90 tokens per second²

¹ Excludes long context: requests estimated at more than 272K prompt tokens.

² Calculated as p50 request latency on a per-5-minute basis. For customers with existing enterprise agreements whose latency SLAs are calculated as p50 request latency on a per-minute basis, those prior SLAs continue to apply.

³ Applies to Enterprise customers only.
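The table states each latency SLA as "99% > N tokens per second", and footnote 2 says it is calculated as p50 request latency per 5-minute window. One reading, assumed here and not confirmed by the text, is that the p50 output speed of each 5-minute window should exceed N in at least 99% of windows. The sketch below computes per-window medians from your own (timestamp, tokens-per-second) measurements; the function names and sample format are hypothetical.

```python
from collections import defaultdict
from statistics import median

WINDOW_SECONDS = 5 * 60  # footnote 2: p50 is calculated per 5-minute window


def p50_by_window(samples):
    """Group (timestamp_s, tokens_per_second) samples into 5-minute windows
    and return {window_start_s: median tokens/s} in window order."""
    buckets = defaultdict(list)
    for ts, tps in samples:
        window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[window_start].append(tps)
    return {start: median(vals) for start, vals in sorted(buckets.items())}


def share_of_windows_above(samples, min_tps):
    """Fraction of windows whose p50 speed is strictly above min_tps.
    Assumption: this is how the "99% > N tokens per second" target is read."""
    windows = p50_by_window(samples)
    if not windows:
        raise ValueError("no samples")
    passing = sum(1 for p50 in windows.values() if p50 > min_tps)
    return passing / len(windows)


# Hypothetical measurements: (seconds since start, output tokens per second).
samples = [(0, 60), (10, 40), (20, 55), (300, 45), (310, 48), (320, 52)]
print(p50_by_window(samples))               # {0: 55, 300: 48}
print(share_of_windows_above(samples, 50))  # 0.5
```

Using the median per window, rather than the mean, keeps a few very slow or very fast requests from dominating a window, which matches the p50 wording of footnote 2.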

Customers can direct traffic to Priority processing on a per-request basis using the existing service_tier parameter, by setting service_tier="priority".
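A minimal sketch of a Responses API request body that opts a single request into Priority processing. The model ID "gpt-5" and the helper name build_priority_request are illustrative assumptions; the body is only built and printed here, not sent.

```python
import json


def build_priority_request(model: str, prompt: str, priority: bool = True) -> dict:
    """Build a Responses API request body; service_tier="priority" opts this
    single request into Priority processing."""
    body = {"model": model, "input": prompt}
    if priority:
        body["service_tier"] = "priority"
    return body


# "gpt-5" is an illustrative model ID; use a model listed in the pricing table.
body = build_priority_request("gpt-5", "Summarize this support ticket.")
print(json.dumps(body))
# With the official Python SDK (assuming a recent version), the same field is
# passed as a keyword argument:
#   client.responses.create(model="gpt-5", input="...", service_tier="priority")
```

Leaving service_tier out of the body keeps the request on whatever tier would otherwise apply.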

Tokens served by Priority processing will be billed on a per-token basis, priced at a premium relative to Standard processing rates.
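As a worked example of per-token billing, the sketch below prices one request using the GPT-5 row of the table ($2.50 input, $0.250 cached input, $20.00 output, all per 1M tokens). It assumes cached tokens are the subset of input tokens billed at the cached rate, with the rest billed at the full input rate; the helper name is hypothetical. Decimal is used so dollar amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# Priority prices in USD per 1M tokens for GPT-5, copied from the table above.
GPT5_PRIORITY = {
    "input": Decimal("2.50"),
    "cached_input": Decimal("0.250"),
    "output": Decimal("20.00"),
}


def priority_cost(prices: dict, input_tokens: int, cached_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request. Assumes cached_tokens is the subset of
    input_tokens billed at the cached rate; the remainder uses the input rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * prices["input"]
        + cached_tokens * prices["cached_input"]
        + output_tokens * prices["output"]
    )
    return total / PER_MILLION


# 100K input tokens (40K of them cached) and 10K output tokens:
# 60K x $2.50 + 40K x $0.25 + 10K x $20.00, all per 1M tokens, = $0.36
cost = priority_cost(GPT5_PRIORITY, 100_000, 40_000, 10_000)
print(cost)
```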

In addition to setting it per request, you can make Priority the default for a project under Project settings → Default Service Tier. Individual requests can still override this default.
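The precedence between the project default and a per-request value can be modeled as below. This is a sketch of the documented behavior, not the server's implementation; the function name is hypothetical, and it assumes that sending service_tier="default" on a request selects Standard processing.

```python
from typing import Optional


def effective_service_tier(request_tier: Optional[str], project_default: str) -> str:
    """A value set on the request wins; otherwise the project's
    Default Service Tier setting applies."""
    return request_tier if request_tier is not None else project_default


# Project defaulted to Priority in Project settings:
print(effective_service_tier(None, "priority"))       # priority
# A single request explicitly opting back to Standard processing:
print(effective_service_tier("default", "priority"))  # default
```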

Priority processing rate limits are shared with other service tiers.
In rare cases, rapid increases in your Priority processing tokens per minute (TPM) can trigger ramp rate limits. If you exceed the ramp rate limit, additional traffic may be sent to Standard processing instead.

Scale Tier will remain separate from Priority processing.

Requests sent to Priority processing will be billed separately and will not count against your purchased Scale Tier TPM bundles.

Traffic sent to Scale Tier does not automatically spill over to Priority processing.

All processing modes, including Priority processing, count against your annual enterprise spend commitment.

Cached inputs are discounted in Priority processing too: for a given model, cached input tokens receive the same 50%, 75%, or 90% discount as they do in Standard processing.
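The discount for a model can be read directly off the pricing table as 1 − (cached input price ÷ input price). The sketch below applies that to three rows of the table; the dictionary keys are shortened model labels chosen for illustration.

```python
from decimal import Decimal

# (input, cached input) Priority prices per 1M tokens, from the table above.
PRIORITY_PRICES = {
    "GPT-5": (Decimal("2.50"), Decimal("0.250")),
    "GPT-4.1": (Decimal("3.50"), Decimal("0.875")),
    "GPT-4o": (Decimal("4.25"), Decimal("2.125")),
}


def cached_discount(input_price: Decimal, cached_price: Decimal) -> Decimal:
    """Fractional discount applied to cached input tokens."""
    return 1 - cached_price / input_price


for model, (input_price, cached_price) in PRIORITY_PRICES.items():
    print(model, f"{cached_discount(input_price, cached_price):.0%}")
```

The three rows land on the 90%, 75%, and 50% discount levels respectively.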

To view tokens processed by Priority processing, go to the Usage dashboard, select Chat Completions or Responses, and Group by Service Tier.

To view Priority processing cost, go to the Usage dashboard and select Group by Line Item.

Priority processing is not yet available on products beyond our latest models. We will evaluate offering it on additional products in the future.

Priority processing supports the same multimodal capabilities available on Standard. In particular, images can be used as inputs to Priority processing and benefit from the same low latency.

We plan to offer Priority processing on new GPT models, but we don’t guarantee that every model will be supported.

Priority processing consumption is treated the same as standard API traffic for rate limits.

Priority processing has ramp rate limits to ensure consistently high performance for all customers, while still providing flexible, on-demand pricing. If (a) Priority processing performance is degraded AND (b) a customer’s traffic is ramping too quickly, then some Priority requests may be downgraded to Standard processing instead.

Under the current ramp rate limit, traffic counts as ramping too quickly when you are processing at least 1M TPM and your TPM increases by more than 50% in less than 15 minutes.
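To watch for this condition in your own traffic, a client-side monitor could look like the sketch below. The 1M TPM floor, the 50% growth threshold, and the 15-minute window come from the text; everything else is an assumption: "at least 1M TPM" is read as the current TPM, growth is measured against the lowest TPM seen in the preceding 15 minutes, and the class and function names are hypothetical. may_downgrade mirrors the rule that both (a) degraded performance and (b) fast ramping must hold before requests may be downgraded.

```python
from collections import deque

RAMP_MIN_TPM = 1_000_000  # "processing at least 1M TPM"
RAMP_WINDOW_S = 15 * 60   # "in less than 15 minutes"


class RampMonitor:
    """Flags the ramp condition from a stream of TPM samples.

    Assumptions: "at least 1M TPM" refers to the current TPM, and growth is
    measured against the lowest TPM seen in the preceding 15 minutes.
    """

    def __init__(self):
        self._history = deque()  # (timestamp_s, tpm), oldest first

    def record(self, ts: float, tpm: int) -> bool:
        # Discard samples that are 15 minutes old or older.
        while self._history and ts - self._history[0][0] >= RAMP_WINDOW_S:
            self._history.popleft()
        baseline = min((v for _, v in self._history), default=None)
        self._history.append((ts, tpm))
        if baseline is None:
            return False
        # "More than 50%" checked in integers: tpm > 1.5 * baseline.
        return tpm >= RAMP_MIN_TPM and 2 * tpm > 3 * baseline


def may_downgrade(performance_degraded: bool, ramping: bool) -> bool:
    """Both (a) and (b) must hold before Priority requests may be downgraded."""
    return performance_degraded and ramping


monitor = RampMonitor()
print(monitor.record(0, 800_000))      # False: no earlier sample to compare
print(monitor.record(300, 1_000_000))  # False: +25%
print(monitor.record(600, 1_300_000))  # True: +62.5% vs 800K within 15 min
```

Since the service's own measurement method is not described, treat a True result as an early warning rather than a guarantee of downgrade.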

Requests processed by the Standard service tier are billed at Standard rates and are not eligible for Priority processing Service Level Objectives.

Requests processed by the Standard service tier will include service_tier="default" in the response.
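Because the response reports which tier actually served each request, you can measure how often Priority requests were downgraded. The sketch below reads the service_tier field from raw JSON response bodies and compares it case-insensitively against "default"; the response IDs and helper names are hypothetical.

```python
import json
from collections import Counter


def served_tier(response_body: str) -> str:
    """Return the tier that actually served a request, from the raw JSON body."""
    tier = json.loads(response_body).get("service_tier") or ""
    return tier.lower()  # compare case-insensitively


def downgrade_rate(response_bodies) -> float:
    """Share of Priority requests that were served by the Standard tier."""
    tiers = Counter(served_tier(body) for body in response_bodies)
    total = sum(tiers.values())
    return tiers["default"] / total if total else 0.0


bodies = [
    '{"id": "resp_1", "service_tier": "priority"}',
    '{"id": "resp_2", "service_tier": "default"}',
]
print(downgrade_rate(bodies))  # 0.5
```

A rising downgrade rate is a signal to slow your traffic ramp, since downgraded requests are billed at Standard rates and fall outside the Priority objectives.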

Best practices for staying within your ramp rate limit

Gradually increase traffic when changing models. For example, if your application is transitioning from a previous snapshot to a new one, use a feature flag to transition traffic over the course of a few hours rather than all at once.
Avoid running large data processing or asynchronous jobs on Priority processing. These jobs can ramp traffic very quickly, and often do not need the improved performance of Priority processing.
If you routinely encounter ramp rate limits, consider purchasing Scale Tier capacity instead of, or in addition to, Priority processing.
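The first practice, shifting traffic to a new model gradually behind a feature flag, can be sketched as a linear rollout with stable hash-based buckets. The 4-hour default duration, the 10,000 buckets, and all names are illustrative assumptions, not recommended values.

```python
import hashlib

BUCKETS = 10_000


def rollout_fraction(elapsed_s: float, ramp_duration_s: float) -> float:
    """Share of traffic on the new model, rising linearly from 0 to 1."""
    if ramp_duration_s <= 0:
        return 1.0
    return min(max(elapsed_s / ramp_duration_s, 0.0), 1.0)


def choose_model(user_id: str, old_model: str, new_model: str,
                 elapsed_s: float, ramp_duration_s: float = 4 * 3600) -> str:
    """Hash each user to a stable bucket so a user, once moved, stays moved."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    if bucket < rollout_fraction(elapsed_s, ramp_duration_s) * BUCKETS:
        return new_model
    return old_model


# Two hours into a four-hour rollout, roughly half of users get the new model.
print(choose_model("user-42", "old-snapshot", "new-snapshot", elapsed_s=2 * 3600))
```

Hashing the user ID rather than picking randomly per request means each user switches at most once, and the new model's traffic grows smoothly instead of jumping when the flag flips.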

All of your traffic contributes to the same ramp rate limit.

For Enterprise customers, please reach out to your AD with any questions or concerns.

Priority processing SLAs are treated the same as Scale Tier SLAs: if we fail to meet them during a given time window, customers on enterprise agreements are offered service credits.