Reserved Capacity

Published February 8th, 2024

Reserved instances are designed for cutting-edge customers running larger workloads, allowing inference at scale with full control over the model configuration and performance profile.


Today, reserved instances allow inference at scale:

  • Reserved instances are a static allocation of capacity dedicated to you, providing a predictable environment that you control.
  • You can monitor your instances with the same tools and dashboards we use to build on our own models and optimize shared-capacity models.
  • You can realize all the throughput, latency, and cost benefits of optimizing your specific workload (for example, caching and latency/throughput trade-offs).
  • You choose when to update your model snapshot, including whether to adopt the latest models. To change your model, just let our support team know.


Reserved instances offer SLAs for instance uptime and on-call engineering support:

  • 99.5% uptime commitment
  • On-call engineering support for reserved instance customers 
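As a rough illustration of what the 99.5% commitment permits, the sketch below converts it into a downtime budget. The measurement window for the SLA is not stated here, so the 30-day and 365-day windows are assumptions, and `allowed_downtime_hours` is a hypothetical helper rather than part of any OpenAI tooling.

```python
from fractions import Fraction

# Uptime commitment from this page, kept as an exact fraction to avoid float rounding.
UPTIME = Fraction(995, 1000)  # 99.5%

def allowed_downtime_hours(window_days: int, uptime: Fraction = UPTIME) -> Fraction:
    """Hours of downtime permitted in a window of `window_days` days.

    Assumption: uptime is a simple fraction of wall-clock time over the
    window; the actual SLA measurement method is not stated on this page.
    """
    total_hours = window_days * 24
    return (1 - uptime) * total_hours

if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    print(float(allowed_downtime_hours(30)))   # 3.6
    print(float(allowed_downtime_hours(365)))  # 43.8
```

Under these assumptions, 99.5% uptime allows about 3.6 hours of downtime in a 30-day month, or about 43.8 hours over a 365-day year.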


Reserved instances are rented as reserved compute units, with either a 3-month or a 1-year commitment (the 1-year commitment saves ~15%). Running an individual model instance (see below for current SKUs) requires a specific number of compute units:

GPT Reserved Instance Price Tables

*Example: With a 1-year commitment, the 2-instance minimum for GPT-4 Turbo at 300 CU per instance would cost $1,584,000. Each additional 300 CU instance costs $792,000/year.
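The example above implies a linear price per instance, which the sketch below reproduces. It uses only the GPT-4 Turbo, 1-year figures from the example; the per-compute-unit rate it derives ($2,640 per CU per year) is an inference from that example alone, and other models or the 3-month term may be priced differently in the price tables. `annual_cost` is a hypothetical helper.

```python
# Figures taken from the GPT-4 Turbo example (1-year commitment).
CU_PER_INSTANCE = 300
PRICE_PER_INSTANCE_YEAR = 792_000  # USD per 300 CU instance per year
MIN_INSTANCES = 2                  # GPT-4 requires at least two instances

def annual_cost(instances: int) -> int:
    """Annual cost in USD for `instances` GPT-4 Turbo instances on a 1-year term."""
    if instances < MIN_INSTANCES:
        raise ValueError(f"GPT-4 requires at least {MIN_INSTANCES} instances")
    return instances * PRICE_PER_INSTANCE_YEAR

# Implied price per compute unit per year: 792,000 / 300 = 2,640 USD.
PRICE_PER_CU_YEAR = PRICE_PER_INSTANCE_YEAR // CU_PER_INSTANCE

print(annual_cost(2))     # 1584000, the 2-instance minimum
print(annual_cost(3))     # 2376000, minimum plus one additional instance
print(PRICE_PER_CU_YEAR)  # 2640
```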

FAQ

Can the input and output data be used by OpenAI for training?
No: we'll never use or train on your data. Reserved Capacity adheres to our Enterprise Privacy commitments.

What happens when the pay-as-you-go price of a model changes?
Reserved Capacity pricing is based on raw compute costs. Sometimes a pay-as-you-go price change reflects improved model efficiency; in that case, updating to the new model will yield more throughput on your reserved instances. Other times, pay-as-you-go prices change for reasons unrelated to model efficiency; in that case, your reserved capacity throughput is unaffected.

What throughput should I expect? How do I know how many units I should buy?
Throughput on a reserved instance, and the total number of units and instances you decide to use, will depend on your request pattern. Shared-capacity API calls are priced using a “blended” per-request cost averaged across all traffic. With reserved instances, your performance is a function of the underlying hardware and your specific workload. We will provide a benchmarking instance and tooling to help you determine how many reserved units to purchase for your workloads and requirements.
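A first-pass sizing estimate, once you have benchmark results, is to divide your peak load by the throughput one instance sustains and round up. The sketch below assumes throughput is measured in tokens per second and that capacity scales linearly with instance count; both are simplifications, and `instances_needed` and every number in it are hypothetical.

```python
import math

def instances_needed(target_tokens_per_s: float,
                     per_instance_tokens_per_s: float,
                     minimum_instances: int = 1) -> int:
    """Smallest instance count whose benchmarked capacity covers the target load.

    Both throughput inputs are hypothetical: per-instance throughput should
    come from benchmarking your own request pattern, since it varies with
    prompt length, output length, and latency targets.
    """
    if per_instance_tokens_per_s <= 0:
        raise ValueError("per-instance throughput must be positive")
    needed = math.ceil(target_tokens_per_s / per_instance_tokens_per_s)
    return max(needed, minimum_instances)

# Hypothetical: 5,000 tokens/s peak load, 1,800 tokens/s measured per instance.
print(instances_needed(5_000, 1_800))                       # 3
# A lighter load on a model with a 2-instance minimum, such as GPT-4.
print(instances_needed(1_000, 1_800, minimum_instances=2))  # 2
```

In practice you would likely add headroom for traffic spikes before rounding, since a reserved instance is a fixed allocation.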

Why does GPT-4 have a 2-instance minimum?
Two instances is the minimum compute budget needed to achieve a high level of reliability for GPT-4. Additional instances of GPT-4 can be purchased one at a time.

If reserved instance prices change, will existing customers inherit the new prices?
If reserved instance prices increase, you’ll keep the negotiated price for the instances you already purchased for the lifetime of their contract (up to 12 months). If prices decrease, we will update your pricing to the lower price for the remainder of your term. If you purchased a 1-year commitment paid annually, we will credit you the prorated difference for the remainder of your term.
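For the annual-prepay case, a price decrease produces a credit for the rest of the term. The sketch below assumes the difference is prorated linearly by the days remaining in the 365-day Term (the billing definition given later in this FAQ); the exact proration and rounding rules, the `prorated_credit` helper, and the $730,000 price are all hypothetical.

```python
from fractions import Fraction

TERM_DAYS = 365  # a 12-month Term is defined as 365 days for billing

def prorated_credit(old_annual_price: int, new_annual_price: int,
                    days_remaining: int) -> Fraction:
    """Credit owed after a price decrease on a 1-year commitment paid annually.

    Assumption: the difference is prorated linearly by days left in the term.
    Price increases never produce extra charges, so they yield zero credit.
    """
    if not 0 <= days_remaining <= TERM_DAYS:
        raise ValueError("days_remaining must be within the term")
    difference = max(old_annual_price - new_annual_price, 0)
    return Fraction(difference * days_remaining, TERM_DAYS)

# Hypothetical drop from $792,000 to $730,000 per instance with 146 days left.
print(prorated_credit(792_000, 730_000, 146))  # 24800
```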

Are compute units fungible between instances? If I want to switch from running two GPT-4 Turbo instances to six GPT-3.5 Turbo instances, can I? How?
Yes. Compute units can be reassigned between model instances, but you will still be charged for any compute units that are not provisioned to a model instance. Please reach out to your Account Director to request any changes.
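The sketch below shows the accounting behind this answer: reallocating instances works as long as the total fits within your committed compute units, and any units left unassigned are still billed. The 300 CU figure for GPT-4 Turbo comes from the pricing example; the 100 CU figure for GPT-3.5 Turbo is a placeholder chosen so that the two-to-six swap in the question fits exactly, not a published SKU. `unprovisioned_cu` is a hypothetical helper.

```python
# Per-instance compute-unit requirements. GPT-4 Turbo's 300 CU is from the
# pricing example; the GPT-3.5 Turbo value is a hypothetical placeholder.
CU_PER_INSTANCE = {"gpt-4-turbo": 300, "gpt-3.5-turbo": 100}

def unprovisioned_cu(committed_cu: int, instances: dict[str, int]) -> int:
    """Compute units that are committed (and billed) but not assigned to any instance."""
    used = sum(CU_PER_INSTANCE[model] * count for model, count in instances.items())
    if used > committed_cu:
        raise ValueError(f"allocation needs {used} CU but only {committed_cu} are committed")
    return committed_cu - used

print(unprovisioned_cu(600, {"gpt-4-turbo": 2}))    # 0
print(unprovisioned_cu(600, {"gpt-3.5-turbo": 6}))  # 0 under the assumed 100 CU size
print(unprovisioned_cu(600, {"gpt-3.5-turbo": 4}))  # 200 idle CU, still charged
```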

When does billing start?
After the Order Form is executed, compute units will be provisioned by an agreed-upon date. Billing is calculated daily and commences at midnight UTC on the day after compute units are first provisioned to an instance. For billing, a 3-month Term is defined as 90 days and a 12-month Term is defined as 365 days. Invoices are sent monthly in arrears.
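Because the billing start rule is tied to UTC, it is easy to get wrong for provisioning times reported in a local time zone. The sketch below computes the billing window in UTC using the 90-day and 365-day Term lengths from this answer; treating the end as exclusive, and the `billing_window` helper itself, are assumptions for illustration.

```python
from datetime import datetime, time, timedelta, timezone

TERM_DAYS = {"3-month": 90, "12-month": 365}  # billing Term definitions

def billing_window(provisioned_at: datetime, term: str) -> tuple[datetime, datetime]:
    """Return (billing start, billing end) in UTC for a reservation.

    Billing commences at midnight UTC on the day after compute units are
    first provisioned. The end is start + Term length (assumed exclusive).
    """
    if provisioned_at.tzinfo is None:
        raise ValueError("provisioned_at must be timezone-aware")
    provisioned_utc = provisioned_at.astimezone(timezone.utc)
    next_day = provisioned_utc.date() + timedelta(days=1)
    start = datetime.combine(next_day, time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=TERM_DAYS[term])
    return start, end

start, end = billing_window(datetime(2024, 3, 14, 17, 30, tzinfo=timezone.utc), "3-month")
print(start.date(), end.date())  # 2024-03-15 2024-06-13

# 20:00 at UTC-5 on March 14 is 01:00 UTC on March 15, so billing starts March 16.
eastern = timezone(timedelta(hours=-5))
start, _ = billing_window(datetime(2024, 3, 14, 20, 0, tzinfo=eastern), "12-month")
print(start.date())  # 2024-03-16
```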

Do I have to buy compute units in increments of 100?
Yes. Compute units are currently sold in increments of 100, and the minimum purchase is 100 compute units.
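If a sizing estimate produces a compute-unit figure that is not a multiple of 100, it has to be rounded up to a purchasable quantity. The sketch below, built around the hypothetical helper `round_up_cu`, applies only the 100-unit increment rule; in practice your purchase should also match the compute units each model instance requires.

```python
CU_INCREMENT = 100  # compute units are sold in increments of 100 (minimum 100)

def round_up_cu(required_cu: int) -> int:
    """Smallest purchasable compute-unit quantity that covers `required_cu`."""
    if required_cu <= 0:
        raise ValueError("required_cu must be positive")
    # Integer ceiling division, then scale back up to the increment.
    return ((required_cu + CU_INCREMENT - 1) // CU_INCREMENT) * CU_INCREMENT

print(round_up_cu(250))  # 300
print(round_up_cu(601))  # 700
```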

Is there a cap on the number of compute units I can buy?
There is no explicit cap, but the number of compute units is subject to availability.

What if I need to increase my compute unit commitment?
Reach out to us to increase the compute unit quota. Order fulfillment is subject to availability.

What if I want to reduce my commitment in the middle of the term?
If you need to reduce your commitment for the remainder of the term, please contact us.