Skip to main content
OpenAI

Priority Processing for API Customers

This offering is available to Enterprise customers. Please contact our sales team⁠ to learn more.

Priority processing offers reliable, high-speed performance with the flexibility to pay-as-you-go.

By choosing Priority processing, you can unlock:

  • Predictably low latency: Priority processing generates tokens faster and at a more consistent speed than the Standard processing service, even during peak demand.
  • Easy-to-use flexibility: Like Standard processing, Priority processing can be accessed on a flexible, pay-as-you-go basis instead of requiring advance provisioning.
Price per 1M input tokensPrice per 1M input tokens (cached)Price per 1M output tokensUptime SLALatency SLA
GPT-4.1
excludes long context1
$3.50$0.875$14.0099.9%99% > 80 tokens per second2
GPT-4.1 mini
excludes long context1
$0.70$0.175$2.8099.9%99% > 90 tokens per second2
GPT-4.1 nano
excludes long context1
$0.20$0.050$0.8099.9%99% > 100 tokens per second2
GPT-4o
gpt-4o-2024-11-20
gpt-4o-2024-08-06
$4.25$2.125$17.0099.9%99% > 80 tokens per second2
gpt-4o-2024-05-13
$8.75$26.2599.9%99% > 80 tokens per second2
GPT-4o mini
$0.25$0.125$1.0099.9%99% > 90 tokens per second2
o3
$3.50$0.875$14.0099.9%99% > 80 tokens per second2
o4-mini
$2.00$0.500$8.0099.9%99% > 90 tokens per second2
1Requests estimated at >128K prompt tokens
2Calculated as p50 request latency on a per 5 minute basis. For customers with existing enterprise agreements that have latency SLAs calculated as p50 request latency on a per minute basis, the prior SLAs are also still applicable.

How it works

Customers can direct traffic to Priority processing on a per request basis using the existing service_tier parameter, with the option service_tier = “priority”.

Tokens served by Priority processing will be billed on a per-token basis, priced at a premium relative to Standard processing rates. 

In addition to being configured at the request level, we also plan to add the ability to opt-in at the project-level in the near term.

Limitations

  • Priority processing rate limits are shared with other service tiers. 
  • In rare cases, rapid increases to your Priority processing Tokens per Minute can lead to hitting ramp rate limits. If you exceed the ramp rate limit, then additional traffic may be sent to Standard processing instead.

Pricing

Scale Tier will remain separate from Priority processing.

Requests sent to Priority processing will be billed separately and will not count against your purchased Scale Tier TPM bundles.

Models

Not at this time. We will evaluate in the future whether to offer Priority processing on additional products beyond our latest models.

Rate limits

Priority processing consumption is treated the same as standard API traffic for rate limits.

Reliability

Please reach out to your AD with any questions or concerns. 

Priority processing SLAs will be treated the same as Scale Tier SLAs; service credits will be offered should we fail to meet those SLAs for customers on enterprise agreements during a given time window.

Policies

Yes