
Reserved Capacity

Reserved instances are designed for cutting-edge customers running larger workloads, giving them full control over model configuration and performance profile.

Today, reserved instances allow inference at scale:

  • Reserved instances are a static allocation of capacity dedicated to you, providing a predictable environment that you control.

  • You can monitor your specific instances with the same tools and dashboards OpenAI uses to build on our own models and to optimize models running on shared capacity.

  • You can realize the full throughput, latency, and cost benefits of optimizing for your specific workload (for example, caching and latency/throughput tradeoffs).

  • You choose when to update your model snapshot, deciding if and when to adopt the latest models. If you want to change your model, just let our support team know.

Reserved instances offer SLAs for instance uptime and on-call engineering support:

  • 99.5% uptime commitment

  • On-call engineering support for reserved instance customers

Reserved instance rentals are based on reserved compute units with 3-month or 1-year (~15% savings) commitments. Running an individual model instance (see the table below for current SKUs) requires a specific number of compute units:

[Table: Reserved Capacity compute units required per model instance]
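As a rough illustration of the arithmetic behind these commitments, the sketch below compares a 3-month and a 1-year term for an instance. The model names, per-unit monthly price, and unit counts are hypothetical placeholders, not actual SKUs or published pricing.

```python
# Hypothetical sketch: comparing reserved-capacity commitment costs.
# All prices, unit counts, and model names below are illustrative placeholders.

UNITS_REQUIRED = {            # compute units per model instance (placeholder values)
    "example-model-a": 4,
    "example-model-b": 8,
}

MONTHLY_PRICE_PER_UNIT = 1_000   # placeholder USD per compute unit per month
ONE_YEAR_DISCOUNT = 0.15         # ~15% savings on the 1-year commitment

def commitment_cost(model: str, months: int) -> float:
    """Total cost of reserving one instance of `model` for `months` months."""
    units = UNITS_REQUIRED[model]
    base = units * MONTHLY_PRICE_PER_UNIT * months
    # The 1-year commitment trades flexibility for a ~15% discount.
    return base * (1 - ONE_YEAR_DISCOUNT) if months >= 12 else base

if __name__ == "__main__":
    for model in UNITS_REQUIRED:
        quarterly = commitment_cost(model, 3)
        yearly = commitment_cost(model, 12)
        print(f"{model}: 3-month = ${quarterly:,.0f}, 1-year = ${yearly:,.0f}")
```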

FAQ

**_Will my data be used to train OpenAI models?_**

No: we'll never use or train on your data. Reserved Capacity adheres to our Enterprise Privacy commitments.

**_What happens when the pay-as-you-go price of a model changes?_**

Reserved Capacity pricing is based on raw compute costs. Sometimes a price change is enabled by improved price efficiency, in which case updating to that model will yield more throughput on your reserved instances. Other times, pay-as-you-go price changes are made for reasons separate from model efficiency, in which case your reserved capacity throughput will be unaffected.