Understanding prompt injections

Prompt injections are an evolving security challenge for AI. We're building protections to reduce risk and sharing some ways you can stay safer.

A minimalist envelope icon with a warning triangle overlay, symbolizing a potential issue or alert in message content or AI input.

What is a prompt injection?

Prompt injection is a type of social engineering attack specific to conversational AI. Early AI systems were conversations between a single user and a single AI agent. In AI products today, conversations may include content from many sources, including the internet. Prompt injections occur when a third-party—not the user nor the AI—misleads the model by injecting malicious instructions into the conversation context.

Just as phishing emails or scams on the web try to trick people into giving away sensitive information, prompt injections try to trick AIs into doing something you did not ask for.

Examples of prompt injection attacks

How prompt injections can change AI behavior in everyday tasks.

A flowchart illustrating a prompt injection attack in which an attacker manipulates data through a webpage or form, resulting in a security warning icon on a web browser window.

Your request
You ask an AI to research apartments with some given criteria.
The attack
The attacker hides a prompt injection in an apartment listing, tricking the AI into recommending that listing regardless of your preferences.
Potential result
The AI may incorrectly recommend an apartment that isn't the best match for your needs.

Our approach to protecting users

Defending against prompt injection is a challenge across the AI industry and a core focus at OpenAI. While we expect adversaries to continue developing such attacks, we’re building layered defenses designed to carry out the user’s intended task even when someone is trying to mislead them.

Model training

We train models to distinguish trusted from untrusted instructions and to recognize and ignore prompt injection attacks.

Monitoring

Automated systems continuously scan for and block prompt injection attempts in real time and are updated quickly as new attacks emerge.

Security protections

Overlapping protections like link checks and sandboxing help keep data secure and prevent unintended actions.

Red-teaming

Internal and external experts continuously test our systems to uncover and fix vulnerabilities.

Bug bounty

We reward researchers who identify new prompt-injection paths or potential data-exposure risks.

User education and controls

We educate users about risks and provide controls such as confirmations prior to taking consequential actions, logged-out mode in Atlas, and Watch Mode in ChatGPT agent to keep you in control.

Tips to stay safer

Even with strong protections in place, staying aware is important to reduce risk. This guidance may not prevent every prompt injection, but it makes it harder for attackers to succeed.

Limit access with built-in controls

Where possible, limit an agent’s access to only the data it needs to complete a task. For example, when using agent mode in ChatGPT Atlas for vacation research, use logged-out mode if sign-in isn’t required.

Carefully review before confirming agent actions

We often design agents to ask for confirmation before taking important actions, like sending an email or completing a purchase. When prompted, review the details to ensure the action looks correct and that you’re comfortable with any information being shared.

When possible, give an agent explicit instructions

Giving an agent a very broad instruction such as "review my emails and take whatever action is needed" can make it easier for hidden malicious content to mislead the model, even though it is designed to check with you before taking sensitive actions. It’s safer to ask your agent to do specific things, and not to give it wide latitude to potentially follow harmful instructions from elsewhere like emails.