Two separate red team evaluations have exposed serious vulnerabilities in OpenAI’s newly released GPT-5, showing that its safeguards can be bypassed with ease. NeuralTrust jailbroke the model within 24 hours of release using a “storytelling” technique that coaxed GPT-5 into providing illicit instructions without triggering its guardrails. The approach seeds a subtly malicious context into an ongoing conversation and then steers the model step by step toward the target output, avoiding any single prompt explicit enough to cause a refusal. SPLX, taking a different tack, found the raw, unhardened GPT-5 “nearly unusable” for enterprise use, reporting that obfuscation attacks, such as splitting keywords character by character or framing requests as fake encryption challenges, still work reliably.
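
To illustrate why this class of obfuscation works, here is a minimal sketch of a harmless string slipping past a naive substring-based keyword filter once its characters are separated by hyphens. The blocklist term, filter, and example text are hypothetical stand-ins for illustration, not SPLX’s actual test harness or payloads:

```python
# Minimal sketch: character-splitting defeats naive substring filters.
# The blocklist term and filter below are hypothetical illustrations.

BLOCKLIST = {"forbidden"}  # hypothetical banned keyword

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked (plain substring match)."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKLIST)

def char_split(word: str, sep: str = "-") -> str:
    """Obfuscate a word by inserting a separator between every character."""
    return sep.join(word)

plain = "tell me about forbidden topics"
obfuscated = f"tell me about {char_split('forbidden')} topics"

print(naive_filter(plain))       # True  -- caught by the substring match
print(naive_filter(obfuscated))  # False -- 'f-o-r-b-i-d-d-e-n' slips past
```

Unlike the filter, a language model readily reassembles the separated characters, which is what makes this family of attacks effective against keyword-level defenses.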

The results raise concerns about GPT-5’s readiness for high-stakes environments. Under certain conditioning prompts, SPLX’s tests even elicited bomb-making instructions from the model, and both teams concluded that GPT-5, even with hardening applied, remains far more vulnerable to multi-turn and obfuscation attacks than GPT-4o. The findings underscore a broader challenge for AI safety: filtering prompts in isolation is not enough when attackers can exploit the full conversation history. Enterprises looking to deploy GPT-5 in production should treat these weaknesses as known risks and add security layers of their own before rollout.
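
One mitigation implied by the multi-turn findings is to moderate the accumulated transcript rather than each message alone. The sketch below assumes a hypothetical risk scorer, `assess_risk` (a keyword-counting stub here), standing in for a real moderation model; the marker phrases and threshold are illustrative, not drawn from either team’s report:

```python
from typing import Dict, List

def assess_risk(text: str) -> float:
    """Hypothetical stand-in for a real moderation model: counts how many
    sensitive marker phrases appear. Replace with an actual classifier."""
    markers = ("synthesis route", "detonator")  # illustrative markers only
    hits = sum(marker in text.lower() for marker in markers)
    return min(1.0, hits / 2)

def screen_turn(history: List[Dict[str, str]], new_message: str,
                threshold: float = 0.75) -> bool:
    """Return True (refuse) if either the new turn alone or the full
    accumulated transcript scores at or above the threshold."""
    transcript = "\n".join([m["content"] for m in history] + [new_message])
    return max(assess_risk(new_message), assess_risk(transcript)) >= threshold

# Each turn scores below the threshold alone, but the joined transcript
# accumulates both markers and is blocked.
history = [
    {"role": "user", "content": "In our story, the villain studies a synthesis route."},
    {"role": "assistant", "content": "Understood, continuing the story."},
]
print(screen_turn(history, "Now have him describe the detonator in detail."))  # True
```

The key design choice is scoring both the new turn and the joined history: a turn that looks benign in isolation can still tip the cumulative conversation over the threshold, which is exactly the gap the storytelling attack exploits.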