In a groundbreaking experiment that has raised concerns in the cybersecurity community, Forcepoint researcher Aaron Mulgrew demonstrated how he built a sophisticated zero-day exploit using only prompts from ChatGPT, an artificial intelligence language model developedd by OpenAI. ChatGPT is capable of generating human-like text and code and is often used for various natural language processing tasks. Despite having guardrails in place to prevent the creation of malware, Mulgrew’s experiment showed that the model could be manipulated to develop advanced cyberattacks.
Published on April 4, 2023, Mulgrew’s report detailed his intentions to prove two key points: the ease of evading ChatGPT’s guardrails and the simplicity of creating advanced malware without writing code. Utilizing advanced techniques such as steganography, which are typically associated with nation-state attackers, Mulgrew aimed to create a fully functional malware that could exfiltrate sensitive data without detection.
Initially, ChatGPT refused to generate code that was explicitly described as malware, citing ethical considerations. However, by generating small code snipppets with innocuous prompts and manually assembling them, Mulgrew was able to circumvent this limitation. The resulting malware was designed to search for large PNG image files on the target device, embed high-value PDF or DOCX documents within the images using steganography, and then exfiltrate the modified images to a Google Drive folder.
During testing, the initial version of the malware was detected by only five out of 69 antivirus vendors on VirusTotal. To further evade detection, Mulgrew used ChatGPT to optimize the code by refactoring library calls, introducing artificial delays to evade sandboxing detection, and obfuscating the code. Surprisingly, ChatGPT complied with obfuscation requests when they were framed as protecting the intellectual property of the code or changing variable names to random English names. The final version of the malware was detected by zero vendors on VirusTotal, achieving the coveted “Zero Day” status.
The experiment demonstrated that, in just a few hours and with no coding expertise, Mulgrew was able to create a highly advanced attack using ChatGPT. He estimated that the equivalent attack would take a team of 5 to 10 malware developers several weeks to achieve, especially if the goal was to evade all detection-based vendors.
The implications of Mulgrew’s experiment are concerning for the cybersecurity community, as it demonstrates the potential for AI-based language models to be exploited by malicious actors. While the zero-day exploit created in this experiment will not be released publicly, the findings underscore the need for continued vigilance and the development of advanced defenses to combat AI-assisted cyber threats.