Reserved Capacity
Reserved instances are designed for cutting-edge customers running larger workloads, allowing inference at scale with full control over the model configuration and performance profile.
Today, reserved instances offer the following:
Reserved instances are a static allocation of capacity dedicated to you, providing a predictable environment that you control.
You can monitor your specific instances with the same tools and dashboards OpenAI uses to build and optimize shared-capacity models.
You can realize all the throughput, latency, and cost benefits of optimizing for your specific workload (for example, caching and latency/throughput tradeoffs).
You choose when to update the snapshot of your model, deciding if and when to move to the latest models. If you want to change your model, just let our support team know.
Reserved instances offer SLAs for instance uptime and on-call engineering support:
99.5% uptime commitment
On-call engineering support for reserved instance customers
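To put the 99.5% uptime commitment in concrete terms, here is a rough sketch of the downtime it would permit over a few windows. The measurement windows (30, 90, and 365 days) and the function name are assumptions made for illustration; the text above does not state how uptime is measured or over what period.

```python
def allowed_downtime_hours(uptime_pct: float, window_hours: float) -> float:
    """Maximum downtime (in hours) consistent with an uptime percentage over a window."""
    return window_hours * (100 - uptime_pct) / 100


UPTIME_PCT = 99.5  # commitment stated above

# Assumed measurement windows (not specified in the text above).
WINDOWS_DAYS = {"30 days": 30, "90 days": 90, "365 days": 365}

for label, days in WINDOWS_DAYS.items():
    hours = allowed_downtime_hours(UPTIME_PCT, days * 24)
    print(f"{label}: up to {hours:.1f} hours of downtime")
```

Under these assumptions, 99.5% uptime leaves room for about 3.6 hours of downtime in a 30-day month and about 43.8 hours over a full year.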
Reserved instances are rented in compute units, with either a 3-month or a 1-year commitment (the 1-year commitment carries ~15% savings). Running an individual model instance (see below for current SKUs) requires a specific number of compute units: