Introducing Structured Outputs in the API
We are introducing Structured Outputs in the API—model outputs now reliably adhere to developer-supplied JSON Schemas.
Last year at DevDay, we introduced JSON mode—a useful building block for developers looking to build reliable applications with our models. While JSON mode improves model reliability for generating valid JSON outputs, it does not guarantee that the model’s response will conform to a particular schema. Today we’re introducing Structured Outputs in the API, a new feature designed to ensure model-generated outputs will exactly match JSON Schemas provided by developers.
Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling(opens in a new window), extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
On our evals of complex JSON schema following, our new model gpt-4o-2024-08-06
with Structured Outputs scores a perfect 100%. In comparison, gpt-4-0613
scores less than 40%.
With Structured Outputs, gpt-4o-2024-08-06
achieves 100% reliability in our evals, perfectly matching the output schemas.
How to use Structured Outputs
We’re introducing Structured Outputs in two forms in the API:
1. Function calling: Structured Outputs via tools
is available by setting strict: true
within your function definition. This feature works with all models that support tools, including all models gpt-4-0613
and gpt-3.5-turbo-0613
and later. When Structured Outputs are enabled, model outputs will match the supplied tool definition.
1POST /v1/chat/completions2{3 "model": "gpt-4o-2024-08-06",4 "messages": [5 {6 "role": "system",7 "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."8 },9 {10 "role": "user",11 "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"12 }13 ],14 "tools": [15 {16 "type": "function",17 "function": {18 "name": "query",19 "description": "Execute a query.",20 "strict": true,21 "parameters": {22 "type": "object",23 "properties": {24 "table_name": {25 "type": "string",26 "enum": ["orders"]27 },28 "columns": {29 "type": "array",30 "items": {31 "type": "string",32 "enum": [33 "id",34 "status",35 "expected_delivery_date",36 "delivered_at",37 "shipped_at",38 "ordered_at",39 "canceled_at"40 ]41 }42 },43 "conditions": {44 "type": "array",45 "items": {46 "type": "object",47 "properties": {48 "column": {49 "type": "string"50 },51 "operator": {52 "type": "string",53 "enum": ["=", ">", "<", ">=", "<=", "!="]54 },55 "value": {56 "anyOf": [57 {58 "type": "string"59 },60 {61 "type": "number"62 },63 {64 "type": "object",65 "properties": {66 "column_name": {67 "type": "string"68 }69 },70 "required": ["column_name"],71 "additionalProperties": false72 }73 ]74 }75 },76 "required": ["column", "operator", "value"],77 "additionalProperties": false78 }79 },80 "order_by": {81 "type": "string",82 "enum": ["asc", "desc"]83 }84 },85 "required": ["table_name", "columns", "conditions", "order_by"],86 "additionalProperties": false87 }88 }89 }90 ]91}
2. A new option for the response_format
parameter: developers can now supply a JSON Schema via json_schema
, a new option for the response_format
parameter. This is useful when the model is not calling a tool, but rather, responding to the user in a structured way. This feature works with our newest GPT-4o models: gpt-4o-2024-08-06
, released today, and gpt-4o-mini-2024-07-18
. When a response_format
is supplied with strict: true
, model outputs will match the supplied schema.
1POST /v1/chat/completions2{3 "model": "gpt-4o-2024-08-06",4 "messages": [5 {6 "role": "system",7 "content": "You are a helpful math tutor."8 },9 {10 "role": "user",11 "content": "solve 8x + 31 = 2"12 }13 ],14 "response_format": {15 "type": "json_schema",16 "json_schema": {17 "name": "math_response",18 "strict": true,19 "schema": {20 "type": "object",21 "properties": {22 "steps": {23 "type": "array",24 "items": {25 "type": "object",26 "properties": {27 "explanation": {28 "type": "string"29 },30 "output": {31 "type": "string"32 }33 },34 "required": ["explanation", "output"],35 "additionalProperties": false36 }37 },38 "final_answer": {39 "type": "string"40 }41 },42 "required": ["steps", "final_answer"],43 "additionalProperties": false44 }45 }46 }47}
Safe Structured Outputs
Safety is a top priority for OpenAI—the new Structured Outputs functionality will abide by our existing safety policies and will still allow the model to refuse an unsafe request. To make development simpler, there is a new refusal
string value on API responses which allows developers to programmatically detect if the model has generated a refusal instead of output matching the schema. When the response does not include a refusal and the model’s response has not been prematurely interrupted (as indicated by finish_reason
), then the model’s response will reliably produce valid JSON matching the supplied schema.
1{2 "id": "chatcmpl-9nYAG9LPNonX8DAyrkwYfemr3C8HC",3 "object": "chat.completion",4 "created": 1721596428,5 "model": "gpt-4o-2024-08-06",6 "choices": [7 {8 "index": 0,9 "message": {10 "role": "assistant",11 "refusal": "I'm sorry, I cannot assist with that request."12 },13 "logprobs": null,14 "finish_reason": "stop"15 }16 ],17 "usage": {18 "prompt_tokens": 81,19 "completion_tokens": 11,20 "total_tokens": 9221 },22 "system_fingerprint": "fp_3407719c7f"23}
Native SDK support
Our Python and Node SDKs have been updated with native support for Structured Outputs. Supplying a schema for tools or as a response format is as easy as supplying a Pydantic or Zod object, and our SDKs will handle converting the data type to a supported JSON schema, deserializing the JSON response into the typed data structure automatically, and parsing refusals if they arise.
The following examples show native support for Structured Outputs with function calling.
1from enum import Enum2from typing import Union3
4from pydantic import BaseModel5
6import openai7from openai import OpenAI8
9
10class Table(str, Enum):11 orders = "orders"12 customers = "customers"13 products = "products"14
15
16class Column(str, Enum):17 id = "id"18 status = "status"19 expected_delivery_date = "expected_delivery_date"20 delivered_at = "delivered_at"21 shipped_at = "shipped_at"22 ordered_at = "ordered_at"23 canceled_at = "canceled_at"24
25
26class Operator(str, Enum):27 eq = "="28 gt = ">"29 lt = "<"30 le = "<="31 ge = ">="32 ne = "!="33
34
35class OrderBy(str, Enum):36 asc = "asc"37 desc = "desc"38
39
40class DynamicValue(BaseModel):41 column_name: str42
43
44class Condition(BaseModel):45 column: str46 operator: Operator47 value: Union[str, int, DynamicValue]48
49
50class Query(BaseModel):51 table_name: Table52 columns: list[Column]53 conditions: list[Condition]54 order_by: OrderBy55
56
57client = OpenAI()58
59completion = client.beta.chat.completions.parse(60 model="gpt-4o-2024-08-06",61 messages=[62 {63 "role": "system",64 "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function.",65 },66 {67 "role": "user",68 "content": "look up all my orders in may of last year that were fulfilled but not delivered on time",69 },70 ],71 tools=[72 openai.pydantic_function_tool(Query),73 ],74)75
76print(completion.choices[0].message.tool_calls[0].function.parsed_arguments)
Native Structured Outputs support is also available for response_format
.
1from pydantic import BaseModel2
3from openai import OpenAI4
5
6class Step(BaseModel):7 explanation: str8 output: str9
10
11class MathResponse(BaseModel):12 steps: list[Step]13 final_answer: str14
15
16client = OpenAI()17
18completion = client.beta.chat.completions.parse(19 model="gpt-4o-2024-08-06",20 messages=[21 {"role": "system", "content": "You are a helpful math tutor."},22 {"role": "user", "content": "solve 8x + 31 = 2"},23 ],24 response_format=MathResponse,25)26
27message = completion.choices[0].message28if message.parsed:29 print(message.parsed.steps)30 print(message.parsed.final_answer)31else:32 print(message.refusal)
Additional use cases
Developers frequently use OpenAI’s models to generate structured data for various use cases. Some additional examples include:
Dynamically generating user interfaces based on the user’s intent
For example, developers can use Structured Outputs to create code- or UI-generating applications. All of the following examples use the same response_format
, and can be used to generate varying UIs based on user input.
You are a user interface assistant. Your job is to help users visualize their website and app ideas.
Separating a final answer from supporting reasoning or additional commentary
It can be useful to give the model a separate field for chain of thought to improve the final quality of the response.
1{2 "model": "gpt-4o-2024-08-06",3 "messages": [4 {5 "role": "system",6 "content": "You are a helpful assistant"7 },8 {9 "role": "user",10 "content": "9.11 and 9.9 -- which is bigger?"11 }12 ],13 "response_format": {14 "type": "json_schema",15 "json_schema": {16 "name": "reasoning_schema",17 "strict": true,18 "schema": {19 "type": "object",20 "properties": {21 "reasoning_steps": {22 "type": "array",23 "items": {24 "type": "string"25 },26 "description": "The reasoning steps leading to the final conclusion."27 },28 "answer": {29 "type": "string",30 "description": "The final answer, taking into account the reasoning steps."31 }32 },33 "required": ["reasoning_steps", "answer"],34 "additionalProperties": false35 }36 }37 }38}
Extracting structured data from unstructured data
For example, instructing the model to extract things like to-dos, due dates, and assignments from meeting notes.
1POST /v1/chat/completions2{3 "model": "gpt-4o-2024-08-06",4 "messages": [5 {6 "role": "system",7 "content": "Extract action items, due dates, and owners from meeting notes."8 },9 {10 "role": "user",11 "content": "...meeting notes go here..."12 }13 ],14 "response_format": {15 "type": "json_schema",16 "json_schema": {17 "name": "action_items",18 "strict": true,19 "schema": {20 "type": "object",21 "properties": {22 "action_items": {23 "type": "array",24 "items": {25 "type": "object",26 "properties": {27 "description": {28 "type": "string",29 "description": "Description of the action item."30 },31 "due_date": {32 "type": ["string", "null"],33 "description": "Due date for the action item, can be null if not specified."34 },35 "owner": {36 "type": ["string", "null"],37 "description": "Owner responsible for the action item, can be null if not specified."38 }39 },40 "required": ["description", "due_date", "owner"],41 "additionalProperties": false42 },43 "description": "List of action items from the meeting."44 }45 },46 "required": ["action_items"],47 "additionalProperties": false48 }49 }50 }51}
Under the hood
We took a two part approach to improving reliability for model outputs that match JSON Schema. First, we trained our newest model gpt-4o-2024-08-06
to understand complicated schemas and how best to produce outputs that match them. However, model behavior is inherently non-deterministic—despite this model’s performance improvements (93% on our benchmark), it still did not meet the reliability that developers need to build robust applications. So we also took a deterministic, engineering-based approach to constrain the model’s outputs to achieve 100% reliability.
Constrained decoding
Our approach is based on a technique known as constrained sampling or constrained decoding. By default, when models are sampled to produce outputs, they are entirely unconstrained and can select any token from the vocabulary as the next output. This flexibility is what allows models to make mistakes; for example, they are generally free to sample a curly brace token at any time, even when that would not produce valid JSON. In order to force valid outputs, we constrain our models to only tokens that would be valid according to the supplied schema, rather than all available tokens.
It can be challenging to implement this constraining in practice, since the tokens that are valid differ throughout a model’s output. Let’s say we have the following schema:
1{2 "type": "object",3 "properties": {4 "value": { "type": "number" }5 },6 "required": ["value"],7 "additionalProperties": false8}
The tokens that are valid at the beginning of the output include things like {
, {“
, {\n
, etc. However, once the model has already sampled {“val
, then {
is no longer a valid token. Thus we need to implement dynamic constrained decoding, and determine which tokens are valid after each token is generated, rather than upfront at the beginning of the response.
To do this, we convert the supplied JSON Schema into a context-free grammar (CFG). A grammar is a set of rules that defines a language, and a context-free grammar is a grammar that conforms to specific rules. You can think of JSON and JSON Schema as particular languages with rules to define what is valid within the language. Just as it’s not valid in English to have a sentence with no verb, it is not valid in JSON to have a trailing comma.
Thus, for each JSON Schema, we compute a grammar that represents that schema, and pre-process its components to make it easily accessible during model sampling. This is why the first request with a new schema incurs a latency penalty—we must preprocess the schema to generate this artifact that we can use efficiently during sampling.
While sampling, after every token, our inference engine will determine which tokens are valid to be produced next based on the previously generated tokens and the rules within the grammar that indicate which tokens are valid next. We then use this list of tokens to mask the next sampling step, which effectively lowers the probability of invalid tokens to 0. Because we have preprocessed the schema, we can use a cached data structure to do this efficiently, with minimal latency overhead.
Alternate approaches
Alternate approaches to this problem often use finite state machines (FSMs) or regexes (generally implemented with FSMs) for constrained decoding. These function similarly in that they dynamically update which tokens are valid after each token is produced, but they have some key differences from the CFG approach. Notably, CFGs can express a broader class of languages than FSMs. In practice, this doesn’t matter for very simple schemas like the value
schema shown above. However, we find that the difference is meaningful for more complex schemas that involve nested or recursive data structures. As an example, FSMs cannot generally express recursive types, which means FSM based approaches may struggle to match parentheses in deeply nested JSON. The following is a sample recursive schema that is supported on the OpenAI API with Structured Outputs but would not be possible to express with a FSM.
1{2 "name": "ui",3 "description": "Dynamically generated UI",4 "strict": true,5 "schema": {6 "type": "object",7 "properties": {8 "type": {9 "type": "string",10 "description": "The type of the UI component",11 "enum": ["div", "button", "header", "section", "field", "form"]12 },13 "label": {14 "type": "string",15 "description": "The label of the UI component, used for buttons or form fields"16 },17 "children": {18 "type": "array",19 "description": "Nested UI components",20 "items": {21 "$ref": "#"22 }23 },24 "attributes": {25 "type": "array",26 "description": "Arbitrary attributes for the UI component, suitable for any element",27 "items": {28 "type": "object",29 "properties": {30 "name": {31 "type": "string",32 "description": "The name of the attribute, for example onClick or className"33 },34 "value": {35 "type": "string",36 "description": "The value of the attribute"37 }38 }39 }40 }41 },42 "required": ["type", "label", "children", "attributes"],43 "additionalProperties": false44 }45}
Note that each UI element can have arbitrary children which reference the root schema recursively. This flexibility is something that the CFG approach affords.
Limitations and restrictions
There are a few limitations to keep in mind when using Structured Outputs:
Structured Outputs allows only a subset of JSON Schema, detailed in our docs(opens in a new window). This helps us ensure the best possible performance.
The first API response with a new schema will incur additional latency, but subsequent responses will be fast with no latency penalty. This is because during the first request, we process the schema as indicated above and then cache these artifacts for fast reuse later on. Typical schemas take under 10 seconds to process on the first request, but more complex schemas may take up to a minute.
The model can fail to follow the schema if the model chooses to refuse an unsafe request. If it chooses to refuse, the return message will have the
refusal
boolean set to true to indicate this.The model can fail to follow the schema if the generation reaches
max_tokens
or another stop condition before finishing.Structured Outputs doesn’t prevent all kinds of model mistakes. For example, the model may still make mistakes within the values of the JSON object (e.g., getting a step wrong in a mathematical equation). If developers find mistakes, we recommend providing examples in the system instructions or splitting tasks into simpler subtasks.
Structured Outputs is not compatible with parallel function calls. When a parallel function call is generated, it may not match supplied schemas. Set
parallel_tool_calls: false
to disable parallel function calling.JSON Schemas supplied with Structured Outputs aren’t Zero Data Retention(opens in a new window) (ZDR) eligible.
Availability
Structured Outputs is generally available today in the API.
Structured Outputs with function calling is available on all models that support function calling in the API. This includes our newest models (gpt-4o
, gpt-4o-mini
), all models after and including gpt-4-0613
and gpt-3.5-turbo-0613
, and any fine-tuned models that support function calling. This functionality is available on the Chat Completions API, Assistants API, and Batch API. Structured Outputs with function calling is also compatible with vision inputs.
Structured Outputs with response formats is available on gpt-4o-mini
and gpt-4o-2024-08-06
and any fine tunes based on these models. This functionality is available on the Chat Completions API, Assistants API, and Batch API. Structured Outputs with response formats is also compatible with vision inputs.
By switching to the new gpt-4o-2024-08-06
, developers save 50% on inputs ($2.50/1M input tokens) and 33% on outputs ($10.00/1M output tokens) compared to gpt-4o-2024-05-13
.
To start using Structured Outputs, check out our docs(opens in a new window).
Acknowledgements
Structured Outputs takes inspiration from excellent work from the open source community: namely, the outlines(opens in a new window), jsonformer(opens in a new window), instructor(opens in a new window), guidance(opens in a new window), and lark(opens in a new window) libraries.