Features 02.01.2024
Voice Cloning: The £27m Cost of a Stolen Voice
When an AI-enabled voice cloning cyber attack fools a branch manager into transferring £27m out of the business, it’s time to pay attention to the new cyber weapon
In March 2019, the CEO of a UK energy firm received a call from his German boss, asking him to wire $243,000 (£191,744) to a Hungarian supplier. Convinced by the voice of his superior, the CEO transferred the money straight into cyber criminals’ accounts.
The CEO became suspicious when the scammers called back a second time, requesting another transfer. But it was too late; he had already fallen victim to the first known example of voice cloning used in a cyber attack.
A year later, in the United Arab Emirates, voice cloning appeared in another scam, with cyber criminals cloning the voice of a company director using artificial intelligence (AI) and deep voice technology. The attack was a success, convincing a Hong Kong branch manager to authorise transfers worth $35 million (£27.6 million).
Fuelled by fast-developing AI technology – which, of course, is itself considered a major threat – voice cloning uses available samples of someone’s voice to create a convincing fake.
As the technology continues to improve, what exactly is voice cloning? How has it developed over the last few years? And why should you be worried about it?
Voice cloning is a scary technology. Using audio files containing a person’s real voice, fraudsters can create a clone of their “unique tonality” and “articulation style”, says Tony Fergusson, CISO EMEA at Zscaler. He describes how voice cloning tools “define the characteristics” of a voice first and then apply AI to train the system to imitate the person when reciting different texts.
“Anyone can purchase off-the-shelf software to clone almost anyone’s voice with ease and in no time at all” Jake Moore
You might already have seen it in the music industry, where technological advances have fuelled an increase in its use. It can “easily recreate” artists’ vocals for songs, says Mark James, data privacy consultant at DQM GRC. “The technology is clearly enticing for commercial purposes as it can significantly reduce the costs of booking in talent,” he says.
Jake Moore, global cybersecurity advisor at ESET, says that voice cloning software has improved in leaps and bounds over the past couple of years. “It’s got to the point where anyone can purchase off-the-shelf software to clone almost anyone’s voice with ease and in no time at all.”
This means it is “easier and faster” to replicate a voice, and the result is more convincing, says Dr Kiri Addison, senior manager of product management at Mimecast. “There are many apps and services available that do the hard work for you. Voice translation in different languages is also possible now.”
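Moore’s point about off-the-shelf tooling is easy to illustrate. The short Python sketch below uses Coqui TTS, one openly available open-source library (an assumed example for illustration; none of the experts quoted names a specific tool), to generate speech in a target’s voice from a few seconds of recorded audio. The file paths and sample text are hypothetical:

```python
# A minimal voice-cloning sketch using the open-source Coqui TTS library
# (pip install TTS). "ceo_sample.wav" is a hypothetical few-second clip
# of the target speaker, e.g. pulled from a public video.
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (XTTS v2).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Synthesise new speech in the sampled voice and write it to disk.
tts.tts_to_file(
    text="Hi, it's Jay. I need you to do something for me.",
    speaker_wav="ceo_sample.wav",   # voice sample to clone (assumed path)
    language="en",                  # other language codes are accepted too
    file_path="cloned_message.wav",
)
```

The same call accepts other language codes, which is the kind of voice translation capability Dr Addison describes.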
James warns there’s also the reality of voice cloning as a service on the darknet. “Cyber criminals use this technology in social engineering attacks such as voice phishing – also known as vishing – and spear-phishing.”
An attacker can gain a target’s trust by mimicking someone’s voice in a voice note or message. “And why wouldn’t they believe it when the voice sounds like the sender and their words match the situation or timings in the given request?” Moore points out.
Therefore, it is no surprise that voice cloning is increasingly used in real-life attacks. Fergusson describes how, at the start of the year, one of his company’s sales directors received a phone call that they thought was from the firm’s CEO, Jay Chaudhry.
With Chaudhry’s picture showing as the caller ID, the sales director heard him say, “Hi, it’s Jay. I need you to do something for me” before the call cut off. The call was followed by a WhatsApp message: “I think I’m having poor network coverage as I am travelling at the moment. Is it ok to text here in the meantime?”
An unusual request followed, with Chaudhry apparently asking for help moving money to a bank in Singapore. “When the sales director approached their manager for guidance, the manager knew something was off and alerted the internal security team,” Fergusson recalls.
The team quickly discovered that cyber criminals had reconstituted Chaudhry’s voice from clips of his public speeches to try to steal from the company.
After seeing how easily a voice can be stolen, Moore decided to use voice cloning in a simulated attack against a small business. Cloning the CEO’s voice from videos on his YouTube channel was easy, he says.
Moore added authenticity to the simulated attack by stealing the CEO’s WhatsApp account with the help of a SIM swap attack. “I then sent a voice message from his WhatsApp account to the financial director of his company – let’s call her Sally – requesting a £250 payment to a new contractor,” he wrote in a blog.
The voice message included where he was and said he needed the “floor plan guy” paid, adding that he would send the bank details separately straight afterwards. “Within 16 minutes of the initial message, £250 was sent to my personal account,” Moore says.
It’s safe to say that voice cloning is developing quickly. Unfortunately, it’s only going to get more sophisticated. But don’t get too worried just yet, as issues still prevent it from being 100% convincing.
For example, the technology isn’t quite good enough to hold a natural conversation due to the lacklustre speed of generating audio. “Given the victim’s response, the attacker has to be able to create an appropriate cloned voice to come back very quickly to keep the conversation flowing,” Dr Addison says.
“AI allows criminals to engage in increasingly convincing live conversations with the victim, making detection even more challenging” Mark James
However, Dr Addison says, recent developments in large language models and tools such as ChatGPT mean that using AI to execute voice cloning attacks fully is “a realistic possibility” in the “very near future”.
James agrees that AI will become better at mimicking the unique characteristics of individual voices. “As it improves, it allows criminals to engage in increasingly convincing live conversations with the victim, making detection even more challenging.”
Financial directors are the obvious first target for voice cloning attacks. However, Moore says anyone with specific access rights is at risk and should be vigilant.
As the technology develops, Moore predicts that a growing number of cyber attackers will use voice cloning software as part of social engineering attempts to target companies.
With this in mind, preparing yourself and your employees is essential. Every firm should raise awareness, train staff and implement basic security controls to fight voice cloning attacks. “Tackling these risks needs a multidisciplinary approach, including education and awareness about the dangers of voice cloning,” James says.
1: Awareness training is key: “This includes fun, war-gaming-style penetration testing,” says Moore.
2: Layer your verification: Put processes in place to ensure multiple layers of verification are required for high-value requests, says Dr Addison.
3: Don’t rely on email alone: Using email as the sole channel of communication leaves businesses exposed to voice cloning attacks, says James.
4: Exercise caution: Treat unexpected phone calls with suspicion, especially if the voice sounds familiar or authoritative and the caller makes an unusual or unexpected request, says Fergusson.
5: Verify and authenticate: When in doubt, always verify the caller’s identity before sharing sensitive information, Fergusson says.
6: Always call back: If you receive a suspicious call, always call back using known contact information from internal directories, says Fergusson (see the sketch after this list). “This ensures you are speaking to the intended person and not an imposter.”
7: Guard MFA codes: Never reveal multi-factor authentication one-time passcodes to anyone over the phone or by email, Fergusson says.
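As a rough sketch of how the call-back rule in point 6 might be baked into an internal helper tool (entirely hypothetical; Fergusson describes the process, not a tool), the idea is to ignore the inbound number and the voice entirely, and to dial back only via a trusted directory:

```python
# Hypothetical helper for the "always call back" rule: never trust the
# inbound caller ID or the voice itself; look up a verified number in a
# trusted internal directory and ring back on that instead.
TRUSTED_DIRECTORY = {
    # Assumed data; in practice this would come from an HR or IT system.
    "jay.chaudhry": "+1-202-555-0147",  # reserved fictional number
}

def callback_number(claimed_identity: str) -> str:
    """Return the directory number to call back, never the inbound one."""
    number = TRUSTED_DIRECTORY.get(claimed_identity)
    if number is None:
        # Unknown callers get escalated rather than trusted.
        raise LookupError(f"No directory entry for {claimed_identity!r}: "
                          "escalate to the security team")
    return number

# Whatever number the suspicious call came from, verify on this one.
print(callback_number("jay.chaudhry"))
```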