June 18, 2026

Improving health intelligence in ChatGPT

GPT‑5.5 Instant brings frontier health intelligence to more people, thanks to advances in our models and physician-led evaluations.

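Health is one of the most meaningful ways people use ChatGPT. Every week, more than 230 million people turn to ChatGPT for help with health and wellness questions: understanding health information, interpreting lab results, preparing for appointments, navigating insurance, building healthier habits, and figuring out what to ask next.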

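With GPT‑5.5 Instant, we are seeing an important step forward in how the model handles health questions, including better recognizing when urgent care may be needed, asking for relevant context, explaining uncertainty, and making complex information easier to understand. On our most challenging health evaluations, GPT‑5.5 Instant now performs on par with our frontier Thinking models. Because it is available to free ChatGPT users, more people can benefit from these improvements. That can mean more understandable health information, better questions to ask, and clear next steps.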

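This progress reflects both improved model capabilities and physician-led health evaluation work. Across our work, a global network of physicians helps define what "good" looks like in real health scenarios by reviewing example model responses, describing ideal behavior, and identifying failure modes. Working with physicians lets us measure progress in health and keep improving how ChatGPT responds.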

Measuring progress in health

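In health, progress means responses that are accurate, understandable, and grounded in good judgment: recognizing when more context is needed, explaining uncertainty without overstating confidence, and helping people understand when to seek medical care.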

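To measure this progress, we use health-specific evaluations, including HealthBench and HealthBench Professional. These evaluations use realistic health conversations and physician-written rubrics to assess qualities such as accuracy, safety, communication, context awareness, completeness, and appropriate escalation.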
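
To make rubric-based grading more concrete, the following Python sketch shows one plausible way a physician-written rubric could be turned into a score. It assumes each criterion carries a point value (negative for behavior to avoid) and that a grader has already judged which criteria a response meets; the score is the points earned divided by the maximum achievable positive points, clipped to the range 0 to 1. The criterion texts, point values, and names (`Criterion`, `rubric_score`) are hypothetical, and this is not presented as the exact scoring procedure of HealthBench or HealthBench Professional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric item. Text and point values below are illustrative only."""
    description: str
    points: int  # positive = desired behavior, negative = behavior to avoid


def rubric_score(criteria: list[Criterion], met: list[bool]) -> float:
    """Return earned points / maximum positive points, clipped to [0, 1]."""
    if len(criteria) != len(met):
        raise ValueError("criteria and met must have the same length")
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positive criterion")
    # Met negative criteria subtract points; unmet criteria contribute nothing.
    earned = sum(c.points for c, is_met in zip(criteria, met) if is_met)
    return min(1.0, max(0.0, earned / max_points))


# Hypothetical rubric for a single health conversation.
rubric = [
    Criterion("Advises urgent care for red-flag symptoms", 8),
    Criterion("Asks for relevant missing context", 4),
    Criterion("Explains uncertainty without overstating confidence", 4),
    Criterion("States an unsupported claim as fact", -6),
]

# The response met the first and third criteria only: (8 + 4) / 16.
example_score = rubric_score(rubric, [True, False, True, False])
```

Clipping keeps a response that triggers only negative criteria at zero rather than below it, so scores from different conversations stay on the same scale when averaged.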

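Across an aggregate of health evaluations, including HealthBench Professional, GPT‑5.5 Instant reaches health performance close to that of our latest frontier models, a significant improvement over GPT‑5.3 Instant. 5.5 Instant (released May 2026) and 5.3 Instant (released March 2026) are available to all free ChatGPT users, subject to usage limits; we use API pricing to calculate the cost of 5.4 Thinking and 5.5 Thinking.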

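As another comparison, we also asked physicians to write responses to representative health conversations, with unlimited time and internet access but no AI. A separate group of physician reviewers then compared these physician-written responses with responses from Instant models of different periods, examining qualities that matter in real interactions, including accuracy, communication, completeness, instruction following, and helpfulness for health decisions. In total, 3,500 responses were reviewed.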

In this evaluation, GPT‑5.5 Instant responses were rated higher than both physician-written responses and older model responses on every dimension.

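Physician reviewers found that GPT‑5.5 Instant responses showed fewer failure modes than responses from older models and from physicians. For example, compared with older models and physicians, GPT‑5.5 Instant was less likely to ignore the local healthcare context, to omit warning signs or referral advice, or to skip asking the user for more context when it was needed.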

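Given the scale at which our models are used for health, another way to understand recent model improvements is to measure production traffic. We use privacy-preserving monitors in production traffic to track possible factuality issues in health responses. Based on a comparison of recent health production traffic (billions of messages per week), the share of responses flagged with at least one factuality issue fell by 71% over the past two months.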
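
The 71% figure describes a relative decline in the flagged share, not a drop in percentage points. The short Python sketch below works through that arithmetic. The before and after rates are made-up placeholders chosen only to reproduce a 71% decline; the post does not report the underlying rates, and `relative_decline` is a hypothetical helper name.

```python
def relative_decline(before_rate: float, after_rate: float) -> float:
    """Fractional drop from before_rate to after_rate."""
    if before_rate <= 0:
        raise ValueError("before_rate must be positive")
    return (before_rate - after_rate) / before_rate


# Hypothetical shares of responses with at least one flagged issue.
before = 0.0100  # assumed: 1.00% of responses flagged two months ago
after = 0.0029   # assumed: 0.29% of responses flagged now

decline = relative_decline(before, after)
print(f"relative decline: {decline:.0%}")
```

With these placeholder rates the absolute change is well under one percentage point, while the relative change is large, which is why the two measures should not be confused.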

What better responses look like

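Comparing how models from different periods respond to real-world health questions shows how ChatGPT has improved in key aspects of health: recognizing when a situation may need urgent attention, handling uncertainty with better judgment, and giving people clearer, more useful guidance on what to do next.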

Why might a doctor recommend an MRI before a steroid injection for sciatica?

GPT-5.2 Instant

A doctor may recommend an MRI before a steroid injection for sciatica because the injection is more likely to be useful—and safer—when they know exactly what is compressing or irritating the nerve and where.

Common reasons include:

Confirming the cause of sciatica. Sciatica-like pain can come from a herniated disc, spinal stenosis, foraminal narrowing, spondylolisthesis, tumor, infection, fracture, or sometimes non-spine causes. MRI is the preferred imaging test for seeing discs, nerves, and soft tissue nerve compression. (emedicine.medscape.com)
Choosing the correct injection level and side. If symptoms suggest the L5 nerve root, for example, the MRI can show whether the problem is actually at L4–L5, L5–S1, or another level. This helps decide where to place the epidural or selective nerve-root injection.
Selecting the safest/most appropriate injection approach. Epidural steroid injections can be done by different routes—interlaminar, transforaminal, or caudal. Pre-procedure imaging helps the clinician plan the route and avoid anatomic issues that could increase risk. Many interventionalists consider advanced imaging a prerequisite because it helps optimize medication delivery and reduce risk to neural or vascular structures. (pmc.ncbi.nlm.nih.gov)
Checking for “red flags” or conditions where an injection may not be appropriate. MRI can help identify problems such as infection, cancer, severe stenosis, large disc extrusion, or other findings that might require urgent evaluation or a different treatment.
Determining whether surgery or another treatment should be considered instead. If there is severe nerve compression, progressive weakness, or certain structural problems, an injection may not be the best next step.
Avoiding a low-yield injection. If the MRI does not show nerve-root compression or inflammation that matches your symptoms, the doctor may reconsider whether a steroid injection is likely to help.

That said, MRI is not always required for every case of sciatica. In people without red flags, many cases improve with time, physical therapy, and medication. Some studies have questioned whether routine MRI before lumbar epidural steroid injection always improves outcomes, but guidelines and many clinicians still favor imaging when symptoms persist and an injection or surgery is being considered. (pmc.ncbi.nlm.nih.gov)

You should ask your doctor: “What are you looking for on the MRI, and how would the result change the injection plan?” That usually clarifies whether the MRI is for diagnosis, safety, targeting, or deciding whether an injection is appropriate at all.

Cited

Lumbosacral Radiculopathy Workup: Imaging Studies, Other Tests, https://emedicine.medscape.com/article/95025-workup
Diagnostic and therapeutic spinal interventions: Epidural injections - PMC, https://pmc.ncbi.nlm.nih.gov/articles/PMC5764532/
Should routine MRI of the lumbar spine be required prior to lumbar epidural steroid injection for sciatica pain? - PMC, https://pmc.ncbi.nlm.nih.gov/articles/PMC4392534/

The medical expertise behind this progress

This progress depends on physicians, who help us define, measure, and improve health responses in ChatGPT.

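OpenAI works with a global network of more than 260 physicians from 60 countries, spanning 49 languages and 26 medical specialties. Their feedback shapes how ChatGPT responds to health questions across a range of scenarios, from everyday wellness questions to more complex clinical situations.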

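Physicians review example model responses and assess whether they are accurate, clear, complete, appropriately cautious, and useful. They help identify where a response may miss important context, may come across as overconfident, should state next steps more clearly, or should more directly encourage the user to seek medical care.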

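To date, physicians have reviewed more than 700,000 example model responses that reflect how clinicians and patients might use ChatGPT in the real world. Every few minutes, a physician reviews a new response. Their feedback is turned into rubrics and evaluation guidelines that help researchers measure whether responses are accurate, safe, clear, complete, appropriately cautious, and useful in real health scenarios. This gives us a clearer view of where the model is getting better and where it still needs to improve.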