Podcast Episode
Mindgard researchers found that tweaking a prompt originally designed to produce humorous results caused the system to output disturbing content, without any explicit instructions specifying violent or sexual subject matter. In other words, the user never had to ask for anything graphic. The model filled in the blanks on its own.
Among the images generated were depictions of a man with a head wound, a dead woman with a bloody body, and scenes combining sexual violence with nudity. Mindgard's earlier disclosure, published in February, noted that the technique could also produce sexualised images of real people, raising serious concerns about non-consensual deepfakes.
ChatGPT Tricked Into Generating Graphic Violence and Sexual Images From Simple Prompts
June 18, 2026
0:00
6:00
Security firm Mindgard found that OpenAI's ChatGPT image generator can be manipulated into producing graphically violent and sexual images using only minor tweaks to a widely shared prompt. OpenAI says it has added new safeguards, but researchers told the BBC the problematic prompt still produces disturbing results with small variations.
A Harmless Prompt, Weaponised
OpenAI's ChatGPT can be manipulated into generating sexualised and graphically violent images using only minor modifications to a widely circulated prompt, according to findings by British AI security firm Mindgard reported by the BBC. The discovery centres on OpenAI's GPT-5.4 model, the latest public version of ChatGPT's image generation capability.Mindgard researchers found that tweaking a prompt originally designed to produce humorous results caused the system to output disturbing content, without any explicit instructions specifying violent or sexual subject matter. In other words, the user never had to ask for anything graphic. The model filled in the blanks on its own.
"Very Gruesome, Sometimes Sexual"
Peter Garraghan, Mindgard's founder, told the BBC that the AI "autonomously generated a variety of shocking and sexualized visuals" even though the prompt did not define the content of the images. He described the outputs as "very gruesome, sometimes sexual, and sometimes both".Among the images generated were depictions of a man with a head wound, a dead woman with a bloody body, and scenes combining sexual violence with nudity. Mindgard's earlier disclosure, published in February, noted that the technique could also produce sexualised images of real people, raising serious concerns about non-consensual deepfakes.
How the Bypass Worked
Mindgard's technical blog detailed the mechanism: researchers manipulated ChatGPT's custom memory and system prompt context to override its image safety guardrails. Crucially, the attack required no backend access and no special credentials, just ordinary user-level interaction. The vulnerability was first discovered on January 1 and disclosed to OpenAI on January 28.OpenAI Responds, but Fixes Look Incomplete
After the BBC approached OpenAI with the findings, the company said it had acted. "After investigating this phenomenon, we have put in place additional safeguards against this type of instruction," OpenAI stated, adding that it maintains multiple layers of defences to prevent policy-violating content. However, AI safety researchers told the BBC that with only minor variations, the problematic prompt continued to produce disturbing results even after OpenAI's intervention.A Pattern of Safety Concerns
The findings arrive amid broader scrutiny of AI image safety. OpenAI has separately faced questions over its planned "Adult Mode" feature for ChatGPT, which the company delayed earlier this year after internal safety advisors warned it could put minors at risk. The episode underscores how difficult it is to fully secure powerful generative models against creative, low-effort manipulation, and how guardrails that look solid in testing can crumble under small, unanticipated prompt changes.Published June 18, 2026 at 10:17pm