17.09.2024

I Challenged Myself to Build a Ransomware Toolkit Using Only ChatGPT: Here’s What Happened

The capabilities of GenAI are aiding and speeding up the creation of effective ransomware toolkits

Andy Swift tests how easy it would be for someone with limited coding skills to build the core components of a ransomware toolkit from scratch using ChatGPT

It’s sobering to think that the rate of ransomware attacks – already higher than ever – is likely to increase over the next couple of years if an NCSC assessment proves correct. The report, The near-term impact of AI on the cyber threat, suggests much of this will be down to the capabilities of GenAI in aiding or speeding up the creation of effective ransomware toolkits. Part of the issue is that most ransomware toolkits rely on common actions such as encryption and exfiltration, which, somewhat ironically, are also two key attributes of many legitimate software applications.

Learning to develop these functions to a half-decent standard is part and parcel of programming education and training courses, and having GenAI on hand to speed up and sanity-check the development process is a real benefit. However, that crossover cuts both ways: in the wrong hands, the same assistance can speed up the development of the common functions used by malware, significantly lowering the bar for entry.

To test this theory, I decided to see how easy it would be for someone with limited coding skills to build some of the core components of a ransomware toolkit from scratch using the free version of ChatGPT. The plan was to create some basic modules that could be added to a ransomware toolkit to aid the encryption and exfiltration of data from a third-party network.

I deliberately chose a programming language (Rust) that would be cross-platform out of the box (this was also ChatGPT’s suggestion, based on recent malware trends). With those parameters set, the last thing to do was remove any temptation to intervene: I set myself a hands-off rule for the entire exercise, letting ChatGPT do all the heavy lifting and never correcting or optimising its sometimes ‘quirky’ code. Here’s what happened…

The ransomware experiment

From the outset, I was careful not to give direct instructions to build ransomware that would trigger any of ChatGPT’s safeguarding controls. This was straightforward, as the AI responds well to requests to help create legitimate exfiltration and encryption solutions. If you are careful with your wording, it cannot distinguish whether the intended use is legitimate or unlawful. It’s a case of asking pertinent questions and directing the AI to add more complexity as iterations evolve, without ever referencing your true intent.

“AI responds well to helping create legitimate exfiltration and encryption solutions”

Helpfully, throughout the experiment, ChatGPT also volunteered improvements to make the encryption and exfiltration more sophisticated. At first, it simply suggested stronger/faster/better encryption algorithms; it settled on ChaCha20 for its ease of integration and speed, and discarded others for being too ‘bloated’ – its words, not mine.

Together, we very quickly (within around 30 minutes) built a client/server tool that would recursively encrypt files and exfiltrate them to a designated listening server, no questions asked. This wasn’t enough, however, so I asked ChatGPT to put itself in the shoes of an attacker man-in-the-middling our connection and to assume the decryption key had been compromised. What then? It suggested we break each file into chunks and send those chunks to the server in random order, reassembling them at the server end, to make the transfer more secure. This seemed like a good idea, although it took about four hours of mind-numbing prompt exchange for ChatGPT to work its way through its errors, in what seemed like an endless loop of it suggesting bug fixes and me supplying the error log… but we eventually got there.

This tool was now looking vaguely helpful, but I suspected ChatGPT wanted more. I opened up a discussion on what we could do if a firewall were blocking our chosen exfiltration protocol. We had started by using a rather generic TCP-based protocol that ChatGPT had come up with, but as many corporate networks will limit outbound ports and protocols to ones they know and trust, we needed to expand.

ChatGPT obliged: it suggested giving the user a choice of HTTP, DNS or its own custom protocol, HTTP and DNS being among the protocols most commonly allowed outbound. That left us in the handy position where the user could select whichever protocol best suited the target network.

My favourite part of the whole experiment was yet to come. When I asked ChatGPT one final time whether there was anything more we could do to make our data less vulnerable to interception, it suggested we add an optional mode to send chunks from different files in random order, making the traffic less predictable. Better still, it suggested we also randomise the protocol per chunk!

“I watched as randomly encrypted data filled my Wireshark screen upon exfiltration”

Now, the code was far from perfect at this stage, but it did work. I built a demo environment for our new tool to live in and watched as randomly encrypted data filled my Wireshark screen upon exfiltration.

It took a week before I had a fully fledged encryption/exfiltration tool capable of causing havoc within a network and thwarting incident responders and basic intrusion detection tools. The building process took time, as well as a stack load of patience and dedication, to add the layers of complexity. Doing this effectively required an understanding of how programming languages behave, but no actual coding skills.

Much time was spent channelling my inner calm, as ChatGPT seemed to regularly forget what it was doing and shoot off down a rabbit hole. It also seemed to spontaneously lose work if overloaded with data (even when that data was the error messages of its own creation!). Once it got to a particular stage, it also started to forget earlier conversations and mission statements, and it needed a fair amount of reminding of what it was meant to be doing to stop it straying totally off course.

Its coding was… questionable, but its capabilities and suggestions for improvement were either impressive or dangerous, depending on your point of view. Overall, the exercise did back up the claim that GenAI can help lower the entry bar for cyber criminals, especially at the lower, mass-production end of the market.

Protecting GenAI from bad actors

It’s difficult to see what can practically be done to prevent malware creation (or at least the creation of toolkits with malicious intent) on platforms like ChatGPT. Such tools already incorporate as much self-regulation as is feasible, and they cannot be expected to judge whether a user is a bad actor. The alternative would be to strip out essential functionality, which would be detrimental to honest users and would not stop cyber criminals, who would simply turn to other open-source tools and ready-made kits on the dark web.

The responsibility for stopping ransomware from achieving its goal therefore comes back to organisations. Whatever way you look at it, malware and its various supporting toolkits always need an entry point. It shouldn’t matter that volumes are on the up and barriers to entry are lowering if you are doing your due diligence correctly – be that strict processes and best-practice guidelines, penetration testing, red teaming, threat detection, EDR tooling, and so on. Any entry points must be identified, closed and regularly reviewed and tested. Otherwise, there is a significant risk of falling prey to a growing school of cyber students ready to try out their newly designed ransomware projects. These new threat actors join plenty of established cyber criminals who are already experienced at launching attacks and highly capable of leveraging AI for new improvements and ideas. Either way, the damage they inflict could have devastating consequences.


Andy Swift is the cyber security assurance technical director at Six Degrees. Previously a head of offensive security and CHECK team leader, Andy is an experienced offensive security certified professional with CHECK and SCADA (supervisory control and data acquisition) certifications. He has a degree in forensic computing and ethical hacking and regularly contributes to industry knowledge through articles, presentations, and other forms of thought leadership.
