';-- Have I been Clwned? An Assured AI Voice Cloning Experiment

We used AI to clone our CEO’s voice and launch a targeted phishing attack on an unsuspecting colleague. Eleanor Dallaway tells the story of how we orchestrated the attack and how it played out.

“Would you like to meet your AI clone?” This PR email stood out from the rest. Would I? I suppose I would, and HackerOne offered to orchestrate that for me.

I booked some time with Dane Sherrets, solutions architect at HackerOne, and was given some homework. I was asked to send a five-minute recording of myself talking with as little background noise as possible. And Dane would use that to create my AI audio clone and demonstrate his methodology in a Zoom call the following week. Simple as that.

It occurred to me that whilst this would be fun, we could go one step further: clone my co-founder and CEO’s voice and use it to attempt to phish one of our colleagues. HackerOne was game. Our CEO was game. And our colleague was blissfully unaware. It was game on…and I only felt slightly guilty.

Methodology

When I jumped on Zoom with Dane, he was keen to lead with a disclaimer: “I know nothing about hacking audio. This is just something I’ve taught myself over the weekend.” Five minutes and five US dollars later, Dane had my AI clone in the palm of his hand. It’s frightening just how easy it is, he tells me. “And technology today is the worst it’s ever going to be. It’s going to get better, and it’s going to get even easier to use it.”

I’m impatient to meet my clone and very sceptical of how impressive it will be, given that only five minutes and five dollars were spent in its making. But as I listen to ‘myself’ chat (you can listen too by clicking on the hyperlinks) about how terrible British tea is and how American Football is far superior to its British counterpart (two things I can confidently confirm I would never say!), I’m floored. It actually sounds like me. For my five-minute audio clip, I’d read a chapter from one of my favourite non-fiction books, The Subtle Art of Not Giving a F*ck. “But I never even read out most of the words my clone says,” I told Dane. “You didn’t need to,” he said.

He explains that the AI tools that generate these voice clones can train on as little as a 15-second clip, but the longer the audio and the clearer the recording, the better the clone. While Dane chose a $5 service for an improved offering, “plenty of free, open-source ones are available too. Anyone curious can access them.”

“The AI models are trained on a lot of data about how people talk based on sex, nationality and age. Your recording then provides a raw data set on top of those learnings.” Dane explained that the tools are primarily trained on British and American accents and that cloning beyond those is less advanced.
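
To give a sense of how low the barrier really is, here is a minimal sketch of the kind of free, open-source cloning Dane alludes to, using the Coqui TTS library and its XTTS-v2 model. The reference clip and the spoken text are placeholders, and this illustrates the general technique rather than the particular paid service Dane used.

```python
# A minimal sketch of open-source voice cloning with Coqui TTS (XTTS-v2).
# "reference_clip.wav" is a placeholder for a short, clean recording of a
# speaker you have permission to clone; the model name and call below come
# from the library's public documentation.
from TTS.api import TTS

# Download and load the multilingual voice-cloning model
# (a large download on first run).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Condition on the reference recording and synthesise new speech in that
# voice, including words the speaker never actually said.
tts.tts_to_file(
    text="This is a demonstration of a cloned voice.",
    speaker_wav="reference_clip.wav",  # a few seconds of the target speaker
    language="en",
    file_path="cloned_voice.wav",
)
```

As Dane's point about recording quality suggests, the longer and cleaner the reference clip, the more convincing the output.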

Given that I have my own podcast and hundreds of video clips of me speaking at events readily available online, I ask Dane whether I’m more vulnerable to voice cloning. “Well, yes,” he says without hesitation. Protecting against that is challenging, so you have to make it part of your threat model.

“That said, in targeted attacks, criminals have been known to access the schedule of the CEO or whoever they are targeting and find them in a local restaurant to record their voice ready for cloning.” It sounds far-fetched and like something straight out of Hollywood, but extreme actions will be taken when there are big bucks to be made.

C(lone)EO

Dane then played the voice clone of our CEO, Henry. It was good but notably less accurate than mine. But why? “The quality of Henry’s recording wasn’t as good as yours,” Dane explained. “You could hear him turning pages, and while yours was recorded with a podcast microphone and saved as an audio file, Henry’s was sent as a WhatsApp voice note. The better the recording, the easier it is to make it sound authentic.” Background noise can create artefacts in the audio, Dane explains.

The nuance of Henry’s clone that gave the game away, to me at least, was the accent, which sounded like a more Northern-sounding, jittery version of him. Live on the call, he showed me how to make little tweaks, and I suggested some more ‘Henry-sounding’ words (and a more neutral accent) to try and make the phish more believable. Within a minute, we had an updated version. It wasn’t perfect, but it was enough to potentially outfox someone who wasn’t giving it their full attention, or who was perhaps listening through headphones on a busy train.

We decided to use the urgency tactic, which Dane explains is a tried-and-tested method among attackers. In this case, we also went down the ‘act of service’ line, convincing the recipient that, due to a system error, he needed to confirm his bank details so that his payroll could be processed. “The premise that you’re trying to perform an act of service for someone, combined with urgency, are some of the biggest success factors.”

The Phish

So, how did it all go down?

We matched the timing with payroll and used a WhatsApp voice note (click to listen) to launch the targeted attack. Our CEO regularly sends voice notes to team members, so this felt authentic, but we didn’t send it from his phone, as we believed a true voice clone attack would much more likely come from a different, unknown mobile number.

We used Henry’s wife’s phone (removing her profile picture) to send the three voice notes one evening. The third explained he was on his wife’s phone as his mobile battery had died.

Our ‘victim’ was Caspar Rogers, one of our cyber brokers. Within two minutes of receiving Henry’s voice notes, he replied: “Not duping me, pal. Nice try though.” He then forwarded the voice notes to a few other team members, asking whether they’d received similar outreach. He forwarded it to Henry, too, with the message: “Mate, serious one, is this you? Just got a WhatsApp and it genuinely sounds like a deepfake.”

I asked Caspar what his initial reaction was when receiving the voice notes. “Sheer, naked panic,” he said. “I turned to my brother: mouth agape, knots in the stomach; you know the drill. I sat there confused (and excited) with so many questions running through my head; this surely isn’t Henry, but if not, then who on earth is it? Why is this potential threat actor targeting Assured and, more specifically, me? Has our Assured network also been compromised, given the CEO’s phone has been?”

He wasn’t fooled, but he was convinced it was a legitimate cyber attack. Part of me was a little bit disappointed that our attempt had failed. But a bigger part of me was relieved and proud that our team is vigilant.

How to spot a voice clone

Dane offered his top tips on how to identify a voice clip as a deepfake. “A few years ago, you could look out for unnatural pauses or weird intonation, but today, there’s a lot less of that,” he admits. “Some audio services have an analyser that will inspect the voice and tell you if it’s fake. I’d always advise listening for background noise and any strange artefacts.

“Always be alarmed if someone is trying to create a sense of urgency,” he continues, adding that his top tip is agreeing a passcode with your friends and family so that, when in doubt, you can check whether it’s really them.

I asked Caspar what red flags stopped him from being fooled. “It had all the hallmarks of a targeted voice cloning attack,” he answered. He went on to list them:

  1. A new unrecognised number
  2. Purporting to be our CEO (Henry)
  3. A request for bank details
  4. A sense of urgency
  5. Follow-up chaser messages
  6. The voice sounded ever so slightly robotic

According to Dane, the bad news is that commercial tools are available to spoof phone numbers, making a message appear to come from a number the recipient knows. “Criminals use social engineering to take it one step further,” he warns.

Henry was unsurprised that the phish failed. “Caspar knows me well; we speak often, so it was a hard mission. My voice wasn’t bang on, and if it were me, I’d have given additional context. If we were a company of 500 people and I sent that to a new employee, the result may well have been different, though,” he contemplated.

With artificial intelligence improving voice cloning tech exponentially, I wonder whether it can also be used to detect voice cloning attacks. “There are ways of training AI to detect AI-generated signals in audio files,” Dane explains, “just like how AI is used in phishing detection.” However, awareness and scepticism are the biggest tonics for resilience in identifying voice cloning. Caspar demonstrated both, and we’re optimistic that, given that word of this attempt got around the office, everyone will be extra vigilant and aware of targeted attacks – be it from genuine attackers or just us.
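
For the technically curious, here is a minimal sketch of the detection idea Dane describes: training a simple classifier to tell real recordings from AI-generated ones. The real/ and fake/ folder layout is a hypothetical dataset, and the MFCC-plus-logistic-regression approach is an illustrative assumption, not the method any commercial analyser actually uses.

```python
# A toy real-vs-fake audio classifier: summarise each clip with MFCC
# statistics and fit a logistic regression. Illustrative only; production
# detectors use far richer features and models.
from pathlib import Path

import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def clip_features(path: Path) -> np.ndarray:
    """Represent a clip as the per-coefficient mean and variance of its MFCCs."""
    audio, sr = librosa.load(path, sr=16_000, mono=True)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.var(axis=1)])


# Hypothetical dataset: WAV files sorted into real/ and fake/ folders.
X, y = [], []
for label, folder in enumerate(["real", "fake"]):
    for wav in sorted(Path(folder).glob("*.wav")):
        X.append(clip_features(wav))
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

A classifier like this only flags the statistical artefacts Dane mentions; as he stresses, awareness and a shared passcode remain the more reliable defences.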

Thank you to Dane Sherrets and HackerOne for their assistance with this experiment. 
