Famous Deepfake Scams: The 2024 Arup Cyber Attack

What Arup Is, and Why That Raises the Stakes for Every Enterprise

Arup Group, founded in 1946, has a portfolio that includes the Sydney Opera House, the Beijing National Stadium, the London Millennium Bridge, and substantial portions of the UK’s HS2 rail programme. Its work routinely involves multinational financial coordination, sensitive government contracts, and complex project financing across jurisdictions. It maintains a dedicated in-house global cybersecurity and information assurance team.

This context matters. Arup is not a company that skipped the security basics. It is a company with the resources, the mandate, and the professional posture to run a mature control environment.

The reason this deepfake crime example is relevant to your security team is not because your organization is less prepared than Arup was. It is because your controls are built on the same assumption they were: that a familiar face on a live video call is a reliable signal of identity. That assumption is no longer valid, and the Arup incident is the clearest evidence yet that the organizations which have not updated it are exposed.

How the Attack Was Built, Stage by Stage

Stage 1: Reconnaissance and Source Material Harvest

A CISO does not need to attend a conference for an attacker to learn their voice. When an executive has a large profile readily available on the web, they are higher-risk because they have generated years of raw training material, freely accessible to anyone who knows where to look.

The attackers never touched Arup’s internal systems to build what they needed. Hong Kong police confirmed the deepfakes were constructed entirely from public material: conference recordings, virtual meeting footage, and media appearances. The CFO and several colleagues had enough publicly available video and audio to train convincing face-swap and voice-clone models.

What a layered detection stack could flag: Reconnaissance itself is invisible; however, what it leaves behind is not. Diopter’s media authentication layer reads C2PA provenance signals and cross-references them against known synthetic generation fingerprints. When a face-swap or voice-clone model built from harvested material enters a live session, the missing or unverifiable provenance should not be treated as proof of fraud, but it can raise the need for additional checks when paired with synthetic-media signals.

Stage 2: The Spear-Phishing Entry Point

In January 2024, a socially engineered email landed at Arup’s Hong Kong office, appearing to come from the UK-based CFO. It cited a confidential transaction. It was marked as urgent, with a suggestion to handle it discreetly.

Initially, the employee was skeptical about the email. That skepticism was real, and it was the only moment in this entire sequence where no technical tool, policy update, or any additional budget was required to stop what was about to happen.

Most deepfake fraud attacks follow this pattern at the entry stage. The phishing email is not meant to close the fraudulent deal but to open the door to the follow-up call that does.

What a layered detection stack could flag: The email’s construction follows a recognizable attack sequence: urgency framing paired with a confidentiality instruction. Such a request is designed to route around standard authorization channels without leaving a calendar trail behind. Under NIST SP 800-63-4’s identity assurance framework, these are pre-authentication attack precursors: signals that appear before any credential is tested and before any verification is attempted. Diopter’s deepfake detector has a behavioral analysis layer that reads these sequences in combination. An executive impersonation signal paired with an authorization bypass request produces an alert before your employee joins the call, not after the wire clears.
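To make the idea of reading signals "in combination" concrete, here is a minimal sketch of a rule that fires only when an impersonation signal coincides with an authorization-bypass request. It is an illustration under stated assumptions, not a description of how Diopter or any other product works: the class `MessageSignals`, its field names, and the function `precursor_alert` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageSignals:
    """Hypothetical, pre-extracted signals for one inbound message."""
    executive_sender_unverified: bool   # claims to be an executive, sender not verified
    urgency_framing: bool               # e.g. "urgent", "today", "immediately"
    confidentiality_framing: bool       # e.g. "confidential", "handle discreetly"
    requests_payment_or_meeting: bool   # asks for a transfer or an unscheduled call
    has_internal_calendar_entry: bool   # meeting exists in the internal scheduler


def precursor_alert(s: MessageSignals) -> bool:
    """True when executive impersonation coincides with an authorization bypass."""
    bypass_request = (
        s.requests_payment_or_meeting
        and (s.urgency_framing or s.confidentiality_framing)
        and not s.has_internal_calendar_entry
    )
    return s.executive_sender_unverified and bypass_request


# A message shaped like the one described above: unverified "CFO" sender,
# urgent, confidential, asking for a call that has no calendar trail.
arup_like = MessageSignals(True, True, True, True, False)
routine = MessageSignals(False, True, False, True, True)
print(precursor_alert(arup_like), precursor_alert(routine))
```

Neither signal raises the alert on its own, which keeps ordinary urgent requests from verified executives out of the queue. How each boolean is derived from a real message is the hard part and is deliberately left out of this sketch.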

Stage 3: The Synthetic Video Conference

When the employee joined the meeting, the CFO was on screen. So were other familiar colleagues. The employee did not raise an alarm because the call looked like every other executive meeting they had attended.

The truth: every person on that screen was synthetic.

The attackers ran the face-swap in real time, built from the footage harvested in Stage 1. Voice synthesis ran alongside it. And the employee did exactly what any of us would do: they saw and heard someone they recognised, and that made them trust the instruction. The employee then approved 15 transfers across five accounts on the same day.

What makes the Arup case the most studied deepfake scam example is this specific mechanism. It wasn’t a single impersonation, but multiple AI-generated executives rendered simultaneously in a live session. Each one was a convincing AI impersonation built entirely from footage that was never private.

What a layered detection stack could flag: Every real-time face-swap pipeline requires a virtual camera driver to feed the synthetic stream into the conferencing application. That driver leaves a fingerprint. Diopter’s injection-attack detection layer, built to the CEN/TS 18099 standard and the separate injection-attack detection requirement under NIST SP 800-63-4, reads device telemetry at the session layer.

Synthetic or injected streams may lack a stable, sensor-consistent PRNU signature, or may show metadata that does not match the claimed capture device. On the audio side, voice synthesis still can’t replicate what natural speech produces without trying: breath sounds, micro-variation between words, and the slight inconsistency of someone talking unrehearsed. A layered detection stack can evaluate these signals in combination. Every one of these signals was live on the Arup call. Nobody was reading them. A detection layer positioned before authorization could have escalated the session for step-up verification before the first transfer instruction went through.

Stage 4: Discovery Through Routine Follow-Up

The fraud did not trigger traditional security alerts because it operated outside the systems those controls were built to monitor.

The employee contacted Arup’s head office after the transfers to follow up on the confidential transaction they had just authorized. The head office confirmed that no meeting had taken place and no transfer had been requested.

By the time the investigation started, the money was already distributed across five accounts. As of 2026, no recovery of the HK$200 million has been publicly reported.

In June 2026, threat-intelligence reporting from @BushidoToken noted that Arup had been listed on FulcrumSec’s Tor leak site. FulcrumSec claimed it had gained initial access in September 2025 through a GitHub personal access token hardcoded in a JavaScript file on a forgotten subdomain, exposing access to more than 10,000 private GitHub repositories. The claim has not been independently confirmed by Arup, and there is no public evidence linking this alleged breach to the 2024 deepfake fraud.

The Numbers:
HK$200 million (approximately US$25.6 million) transferred in a single day.
15 wire transfers across five separate bank accounts.
0 systems breached throughout the entire incident.
0 funds publicly reported as recovered as of 2026.
0 arrests announced by Hong Kong authorities.

Would your stack have flagged the Arup call? Diopter maps your current controls against every stage of a real deepfake attack arc and shows you exactly where the gaps are.

Why Every Control Your Security Team Relies On Failed

Every control in Arup’s security architecture, the firewalls, multi-factor authentication, endpoint protection, transaction monitoring, and segregation of duties, was functioning perfectly on the day of the attack. All operational and all irrelevant.

That is what separates deepfake fraud examples like the Arup case from conventional business email compromise: there are no network intrusions to detect, no malware signatures to flag, and no anomalous logins to trigger an alert. The attacker never attempted to enter the system. Your entire technical control layer is built to defend a boundary this attack never approached.

Here is what that looked like, layer by layer.

The Perimeter Nobody Crossed

Arup ran firewalls, intrusion detection and prevention systems, network segmentation, and secure VPN access. Your organization almost certainly runs the same stack, but none of it applied. Perimeter security is designed to stop an attacker who tries to get inside, and since the attackers never needed to infiltrate the network, there was nothing for it to catch.

How Diopter’s layered detection stack addresses this: Diopter defends the identity layer that sits in front of the perimeter. The attack surface in the Arup case was not a network boundary. It was a human authorization decision, made during a video call, with no tool positioned to read the media stream it was made on.

MFA Protects Credentials, Not Judgment

Multi-factor authentication stops an attacker who has stolen a password and is trying to use it. In the Arup case, no credentials were ever at risk and no password was stolen, because the attack manipulated the victim’s judgment instead. The employee was not tricked into handing over access. They were convinced to authorize a transfer they believed was legitimate. MFA has no field for that. Neither does your IAM platform.

How Diopter’s layered detection stack addresses this: Diopter’s behavioral analysis layer reads the communication sequence that precedes a financial authorization, not the credential that enables it. An executive impersonation signal combined with an authorization bypass request is a detectable pattern.

Segregation of Duties: Satisfied from the Inside

Multi-person approval is designed so that no single employee can both initiate and sign off on a significant financial transaction without independent confirmation. Although it seems like a sound control, in Arup’s case, it failed without being technically bypassed.

From the employee’s perspective, the multi-person approval had already happened. The CFO was on the call. Senior colleagues were present. Leadership had issued the instruction directly. The control was satisfied from inside a synthetic video session by people who did not exist.

How Diopter’s layered detection stack addresses this: A segregation of duties control cannot verify whether the people in a video call are genuine. A layered detection stack can add that missing verification layer by checking media, device, and session-level signals before authorization proceeds. The injection-attack detection layer, aligned with CEN/TS 18099 and NIST SP 800-63-4’s separate injection-attack detection requirement, reads device telemetry at the session layer before a single word of instruction is delivered. A virtual camera pipeline feeding synthetic video into a conferencing application resolves differently than a physical sensor. That discrepancy is readable before any authorization decision is made.

Video Call Authentication

This is where the architecture failed most completely, because this is the control the attackers specifically targeted.

For years, a live video call with a recognizable face and a recognizable voice has been treated as strong identity confirmation. If you can see your CFO and hear them, the assumption has always been that you can trust the instruction. That assumption is the attack surface. Arup’s attacker did not need to defeat this control. They exploited it.

Every documented enterprise deepfake scam follows the same attack path. The attacker defeats the educated guess an employee uses to decide who to trust.

An organization’s video call platform was not built to verify faces but to transmit them. That gap between what the platform shows and what it can confirm is exactly where synthetic media operates. Your employee’s brain fills the verification void with pattern recognition wired for a world where familiar faces mean safe people. The Arup incident changed that assumption for all employees.

How Diopter’s layered detection stack addresses this: Diopter’s media forensics layer operates on the live video stream itself. Since real-time face swaps carry no consistent Photo Response Non-Uniformity (PRNU) signature, a detection layer reading the stream could have flagged the session for escalation before the first transfer instruction landed.

The Shift Your Board Has Not Yet Made

Traditional cybersecurity architecture is built around one objective: to stop an external actor from getting inside your systems.

The Arup incident laid bare a different issue: an attack can take the form of a deepfake identity that bypasses the perimeter entirely. Organizations that recognize this threat vector can build something their competitors cannot easily copy, that is, a verification architecture that does not depend on a human employee correctly identifying a synthetic face under pressure.

The Forensic Signals That Were Present, and Were Not Read

Every AI impersonation at this level of sophistication exhibits forensic signals before the financial transfer is authorized.

The Arup case was no exception. What stands out is that none of those signals had a trained reader or a positioned detection layer on the day they appeared.

The Arup deepfake scam case is particularly instructive because the signals span both behavioral and technical categories. Several required no tooling at all.

Behavioral Signals Available to Any Trained Employee

The meeting was initiated by email, not a calendar invite from an internal scheduling system. Legitimate executive meetings at Arup would generate an internal calendar entry with an auditable creation trail. This one did not.
The request was designed to bypass standard approval workflows. Confidential, urgent, discretionary: these three framing elements appear consistently across documented deepfake crime examples because they are the specific language required to move a target past the natural hesitation that multi-step approval processes are designed to trigger.
The session terminated the moment the financial authorization was obtained. A genuine executive meeting ends with pre-scheduled follow-ups, informal conversations, or a natural wind-down. This one closed as soon as the specific action was completed.
After the call, none of the participants were reachable for confirmation. The employee who discovered the fraud did so by contacting the head office directly after the fact. Any attempt to reach participants through standard channels during the call’s follow-up window would have produced the same result.

Technical Signals Detectable by a Positioned Detection Stack

The behavioral signals above are available without any specialized tooling. The technical signals below require a detection layer operating against the media stream. They are the signals that a forensic classifier, if properly positioned, would have flagged before a transfer was authorized.

PRNU Absence: A wholly synthetic video stream, unlike one generated by a genuine camera, carries no consistent PRNU. The absence of this signature is one of the most reliable indicators of synthetic content that is invisible to the human eye, and detectable only by a media forensics classifier.
Facial Edge Artifacts: Real-time face-swap rendering produces characteristic softening and warping around facial boundaries, particularly at the hairline, the ear-to-jaw transition, and at oblique angles beyond approximately 30 degrees from centre. These artifacts are not always visible at casual viewing resolution but are consistent and measurable.
Prosodic Anomalies in the Audio: Voice synthesis flattens the spectral and temporal variation that characterizes natural speech. A synthesized voice lacks the breath sounds, the micro-hesitations, the self-corrections, and the natural cadence variation of an unscripted human speaker. These are detectable through spectral analysis of the audio stream.
Virtual Camera Device Metadata: Real-time face-swap pipelines require a virtual camera driver to pipe synthetic video into the conferencing application’s media input. The device identifier for a virtual camera resolves differently than a physical sensor. A detection layer reading device telemetry would see this discrepancy before the call began.

None of these signals required experimental tooling to act on in January 2024. PRNU absence analysis, spectral audio detection, and facial artifact classification were all operational techniques at that point. They were simply not positioned against this channel in Arup’s environment, and that gap is what allowed the attack to proceed.
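As a rough illustration of how two of the technical signals above could feed a step-up decision, the sketch below checks a capture device label for virtual-camera markers and compares a PRNU correlation score against a threshold. Everything here is an assumption for illustration: the marker list is not exhaustive, device labels can be renamed by an attacker, the PRNU score is treated as an input computed elsewhere, and the default threshold of 0.05 is a placeholder rather than a calibrated value.

```python
from typing import List, Optional

# Illustrative substrings only; real deployments would need a maintained list.
VIRTUAL_CAMERA_MARKERS = ("virtual camera", "virtual cam", "manycam")


def session_flags(
    device_label: str,
    prnu_correlation: Optional[float],
    prnu_threshold: float = 0.05,  # placeholder value, not a calibrated threshold
) -> List[str]:
    """Return the reasons, if any, to escalate a video session for step-up checks."""
    flags: List[str] = []
    label = device_label.casefold()
    if any(marker in label for marker in VIRTUAL_CAMERA_MARKERS):
        flags.append("virtual_camera_label")
    # prnu_correlation is None when no reference pattern exists for the device.
    if prnu_correlation is not None and prnu_correlation < prnu_threshold:
        flags.append("prnu_mismatch")
    return flags


def needs_step_up(flags: List[str]) -> bool:
    """Any single flag is treated as a reason to verify further, not as proof of fraud."""
    return len(flags) > 0


print(session_flags("OBS Virtual Camera", 0.01))
print(session_flags("Integrated Webcam", 0.40))
```

The output is a list of reasons rather than a verdict, so a secondary verification process can see why a session was escalated.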

The Arup Attack Is Not an Outlier: It Is a Pattern

The Arup case is the largest enterprise deepfake fraud on record. It is not the first. Three cases, spanning 2019 to 2024, show the same attack logic repeating, with tools getting sharper every time.

Energy Firm CEO Voice Fraud, March 2019

In March 2019, attackers used a cloned voice of the CEO of a German parent company to instruct his UK subsidiary’s CEO to wire $243,000 to a Hungarian supplier. The call sounded fine, and the framing was urgent and confidential. The transfer went through. The funds dispersed across multiple accounts and were never recovered. A follow-up call from the same cloned voice requesting further transfers is what prompted the target to question the interaction.

In this incident, voice alone was sufficient. The Arup attackers used the same structural logic five years later: voice confirmation, urgency framing, and a request engineered to bypass normal authorization.

WPP CEO Impersonation, May 2024

In May 2024, attackers built a fake WhatsApp account using publicly available photographs of WPP CEO Mark Read, then used it to invite a senior executive to a virtual meeting. During the call, a deepfake rendering of Read requested corporate funding and personal disclosures for a new business venture.

However, one unscripted question ended the call. The target became suspicious and asked something the synthetic participant had no prepared answer for, leading to the call being terminated. No funds were transferred because of the target’s quick thinking.

Ferrari CEO Impersonation, July 2024

In July 2024, a senior Ferrari executive received calls from a cloned voice impersonating CEO Benedetto Vigna, referencing a confidential acquisition requiring immediate action. The voice clone was convincing. The attack ended when the target asked an offhand question about a recently discussed book title, an out-of-band reference, which the attacker had no answer for.

The Trend Data

According to research published by Entrust, incidents involving deepfake phishing and fraud increased 3,000% between 2022 and 2024, reaching a frequency of one attempt every five minutes by 2024.

Another survey conducted across the US and UK found that 53% of businesses had already been targeted in deepfake scams. An industry report found that only 32% of corporate executives believe their organizations are currently equipped to handle a deepfake incident, even as 45% expect to face one within the next 12 months.

The numbers reveal a gap between perceived readiness and real-world exposure, something all organizations should proactively discuss, because a deepfake attack has the potential to cause company-wide mayhem. The question CISOs must ask is whether their detection architecture will catch the attack at the right stage of the execution chain.

See which attack surfaces your stack is leaving uncovered. Diopter scores every detection layer against your live environment and walks a real attack arc with your team.

What Adequate Defense Actually Looks Like

Each recommendation below connects directly to a failure identified in the Arup incident case or a pattern observed consistently across documented deepfake fraud examples. Therefore, they are specific responses to specific gaps. Here is a list of controls that hold under pressure:

Out-of-Band Verification, No Exceptions

Any request involving a wire transfer, credential reset, or vendor banking change requires confirmation through a second channel, using contact details your organization already holds, not what the caller provides. Urgency and confidentiality are not overriding conditions. They are the two framing elements that appear in every documented deepfake fraud case. Treat them as escalation triggers, not exemptions.

Arup introduced these controls across all offices in March 2024, one month after the incident was confirmed. A pre-agreed callback to a verified number, applied consistently to any financial authorization originating from a remote interaction, would have stopped all 15 wire transfers before the first one cleared.
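The core of the callback rule is that the number dialed always comes from a directory the organization already holds, and never from the request itself. A minimal sketch of that rule follows. The names `Request`, `HIGH_RISK_ACTIONS`, `callback_number`, and `is_escalation_trigger`, and the shape of the directory, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical action names covering the request types listed above.
HIGH_RISK_ACTIONS = {"wire_transfer", "credential_reset", "vendor_bank_change"}


@dataclass(frozen=True)
class Request:
    action: str
    requester_id: str
    caller_supplied_number: Optional[str] = None  # recorded, but never dialed
    urgent: bool = False
    confidential: bool = False


def callback_number(req: Request, directory: Dict[str, str]) -> Optional[str]:
    """Return the number to call back on, or None when no callback is required.

    The number is read only from the pre-existing directory. A high-risk
    request from someone with no directory entry cannot be verified at all.
    """
    if req.action not in HIGH_RISK_ACTIONS:
        return None
    if req.requester_id not in directory:
        raise LookupError(f"no contact on file for {req.requester_id!r}")
    return directory[req.requester_id]


def is_escalation_trigger(req: Request) -> bool:
    """Urgency or confidentiality on a high-risk request escalates, never exempts."""
    return req.action in HIGH_RISK_ACTIONS and (req.urgent or req.confidential)


directory = {"cfo": "number-on-file"}
req = Request("wire_transfer", "cfo", caller_supplied_number="number-from-caller",
              urgent=True, confidential=True)
print(callback_number(req, directory), is_escalation_trigger(req))
```

Note that `caller_supplied_number` is never read by either function. That omission is the design choice: whatever contact details arrive with the request are treated as untrusted.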

Behavioral Verification That AI Cannot Pass

Visual and audio recognition are no longer reliable identity controls. What an attacker cannot defeat is a spontaneous, unscripted challenge drawn from shared experiential knowledge such as a reference to a specific recent conversation, a detail from an internal meeting, or a time-bound code exchanged through a separate channel. The Ferrari impersonation failed because the target asked about a book they had recently discussed. The WPP attempt failed for the same reason. One unscripted question, requiring knowledge no public source could supply, was sufficient to interrupt both attacks at their critical moment.

Your employees do not need to be deepfake experts. They need to know that any high-value authorization request, however convincing the caller appears, requires a verification step the caller cannot have prepared for.
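One way to realize the "time-bound code exchanged through a separate channel" mentioned above is a time-based one-time password in the style of RFC 6238. The sketch below uses only the Python standard library. It assumes both parties already share a secret that was provisioned out of band; the function names `time_bound_code` and `verify_code` are hypothetical, and nothing here is claimed to be the method used by any organization in this article.

```python
import hashlib
import hmac
import struct


def time_bound_code(secret: bytes, at: float, step: int = 30, digits: int = 6) -> str:
    """Derive a short numeric code from a shared secret and a Unix timestamp."""
    counter = int(at // step)  # number of whole time steps since the epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_code(secret: bytes, candidate: str, at: float,
                step: int = 30, digits: int = 6) -> bool:
    """Compare the caller's code with the expected one in constant time."""
    expected = time_bound_code(secret, at, step, digits)
    return hmac.compare_digest(expected, candidate)


# Published RFC test secret, used here only so the example is reproducible.
secret = b"12345678901234567890"
code = time_bound_code(secret, at=59)
print(code, verify_code(secret, code, at=59))
```

In practice the verifier would pass the current time (for example `time.time()`) and might accept the adjacent time step to tolerate clock drift. A code like this proves possession of the shared secret, not identity, so it complements the unscripted question rather than replacing it.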

No Voice-Only Authorization

Wire approvals by phone alone are not authorizations. Neither are password resets based solely on a call, or vendor payment changes without written, independently verified confirmation. If your policy does not say this explicitly, it leaves the gap open.

Hard Controls on Treasury and Accounts Payable

Dual authorization on all wires. Pre-approved vendor banking details locked before a transfer request arrives. A mandatory cooling-off period for any new payment instruction, regardless of who issued it. These are the controls that would have stopped the Arup transfer at the process layer.
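The three treasury controls above can be expressed as a single release check that lists whatever is still blocking a wire. The sketch below is illustrative only: `WireRequest`, `release_blockers`, and the 24-hour `COOLING_OFF` value are assumptions, and `approvers` is meant to hold authenticated user IDs recorded in the payment system, not people seen or heard on a call.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Set

COOLING_OFF = timedelta(hours=24)  # assumed duration; set by policy in practice


@dataclass
class WireRequest:
    vendor_id: str
    account: str
    created_at: datetime
    approvers: Set[str] = field(default_factory=set)  # authenticated user IDs


def release_blockers(
    req: WireRequest,
    approved_accounts: Dict[str, str],  # vendor_id -> banking details locked in advance
    now: datetime,
) -> List[str]:
    """Return every reason the wire cannot be released yet (empty list = release)."""
    blockers: List[str] = []
    if len(req.approvers) < 2:
        blockers.append("needs_two_distinct_approvers")
    if approved_accounts.get(req.vendor_id) != req.account:
        blockers.append("account_not_pre_approved")
    if now - req.created_at < COOLING_OFF:
        blockers.append("cooling_off_not_elapsed")
    return blockers


approved = {"vendor-1": "ACC-1"}
rushed = WireRequest("vendor-1", "ACC-9", datetime(2024, 1, 2, 11, 0), {"alice"})
print(release_blockers(rushed, approved, now=datetime(2024, 1, 2, 12, 0)))
```

Because the check has no parameter for who issued the instruction, seniority cannot shorten the cooling-off period or waive the second approver.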

Media Forensics Detection at the Stream Level

For organizations processing high-value authorizations over video, a media forensics classifier operating on the live stream can flag PRNU absence, spectral audio anomalies, and facial edge artifacts before a transfer instruction is issued. The classifier does not need a definitive verdict. It needs to flag the session for escalation, shifting the burden of proof from the employee to a secondary verification process.

Detection half-life matters here. The Deepfake-Eval-2024 benchmark (arXiv 2503.02857) recorded a 50% drop in area-under-curve for video detectors tested against real-world content versus the academic datasets they were trained on. A classifier certified a year ago but not updated since is already degraded against the current generation of tooling. Detection that holds in production is detection maintained continuously, not purchased once and assumed to be current.

Injection-Attack Detection Is a Separate Compliance Requirement

Presentation Attack Detection under ISO/IEC 30107-3 was built to stop an attacker holding a photograph in front of a camera. It does not stop an injection attack, where a virtual camera driver pipes synthetic video directly into the application’s media stream, bypassing the physical sensor entirely. CEN/TS 18099, published by the European Committee for Standardization in late 2024, is the first technical specification written specifically to evaluate injection-attack detection. NIST SP 800-63-4 now requires both PAD and injection-attack detection as separate normative controls. A single ISO/IEC 30107-3 certificate does not satisfy this requirement.

Help Desk Verification That AI Cannot Guess

Knowledge-based questions are insufficient. An attacker with LinkedIn access, a company website, and two minutes of publicly available audio can answer most of them. Require ticket-based workflows, manager approval for executive account changes, and step-up authentication for anything sensitive. Ask verification questions that require real-time, unscripted knowledge like what the caller had for dinner or what was discussed in yesterday’s meeting. These are not questions a voice clone can answer from a prepared script.

Executive Conduct Is a Security Control

Last-minute urgent payment requests that bypass the process are a social engineering template, not a leadership prerogative. Your finance team needs explicit backing to push back on an executive instruction that skips verification, including when the instruction appears to come from the CEO. That backing has to be established before the call arrives, not negotiated during it.

Train for the Actual Threat

A finance team that has experienced a convincing synthetic video call in a controlled environment responds differently in a live incident than a team that reads a policy document. Train your employees on what a deepfake call feels like and the specific step they take when one appears. The WPP and Ferrari cases both demonstrate that one practiced, instinctive challenge is sufficient to interrupt the attack. That instinct does not come from a security bulletin. It comes from having run the drill.

When It Happens: Speed Is the Control

If a deepfake attempt occurs, whether it is successful or not, escalate it immediately to legal and security. Preserve call logs, recordings, and transaction records before they are overwritten. Contact your financial institution as fast as possible, because wire recall windows are narrow. The organizations that recover funds are the ones that move in hours, not days.

Trust Is Now a Technical Architecture Decision

The organizations that study deepfake scam examples like Arup and draw the right conclusion are not asking how to prevent this from happening to them. That framing keeps the conversation defensive, siloed inside the security budget, and permanently reactive to an attack surface that changes faster than policy cycles.

What Your Board Should Be Asking and Probably Isn’t

Five questions. If your leadership team cannot answer all five, your authorization architecture has a gap synthetic media can move through.

Does your organization have a standing policy that voice-only confirmation is insufficient to authorize a financial transfer? If that policy is not written, not trained, and not enforced, you do not have it.
When an executive issues an urgent, out-of-band instruction, what is the verification step? Who calls back, on what number, through what channel, and who owns that procedure when the executive in question is the one being impersonated?
Your help desk is the most consistently exploited entry point in enterprise social engineering. Are the controls your team follows built for 2024 threat actors or for the attack surface that existed five years ago?
When a transfer request arrives under urgency and confidentiality framing, what is the override mechanism that stops it moving forward without independent verification?
Has your organization run a live voice or video impersonation drill against this specific scenario with your finance team?

Here’s what organizations must get right.

Deepfake fraud attacks do not compromise your systems. They compromise your people using AI-assisted realism to fool their senses.
If your authorization controls treat a familiar voice as sufficient confirmation, your organization is exposed. That assumption is the attack surface.
The organizations that manage this risk are not running the most sophisticated AI detection stack. They have clear verification procedures, enforced callback protocols, and executive teams that understand the threat well enough not to be the weakest link in it.

Diopter Recommends

In 2026, the security question is no longer just what got into your systems. It is who your people trusted and whether your organization gave them any way to verify it.

For your clients and counterparties, verifiable trust is becoming the primary selection criterion. The Arup case did not just expose a gap in one firm’s defences; it made every enterprise that cannot demonstrate detection-layer coverage a harder sell to any sophisticated partner asking the right questions.

Diopter builds deepfake detection as a layered architecture: artifact forensics, media authentication, and injection-aware biometric verification, maintained continuously against the detection half-life that defeats static tools.

Find out which attack patterns your current stack actually covers. If you do not know whether your current controls would have flagged the Arup call, that is the right question to start with.

Browse our solutions, or talk to us for a tailored assessment of your stack.

Walk the Arup attack arc with Diopter.

In 30 minutes, we replay the real incident, show the signals Diopter would score, and map the verdict your team could act on before a transfer clears.

FAQs

How convincing are current voice and video deepfakes?

Convincing enough that trained executives at Arup, WPP, and Ferrari were deceived or nearly deceived by them. The WPP and Ferrari attacks failed not because the synthetic media was detectable but because targets applied spontaneous, unscripted verification challenges the attackers had no prepared answer for. The media itself was not the weak point.

What is the single highest-impact control an organization can implement today?

Mandatory out-of-band verification. Any financial authorization originating from a remote interaction must be confirmed through pre-established channels, using contact details your organization already holds. This is the control Arup implemented one month after its incident. It would have stopped all 15 wire transfers before the first one cleared.

Does multi-factor authentication protect against deepfake threats?

No. MFA stops an attacker who has stolen a credential and is trying to use it. In a deepfake fraud scenario, no credential is stolen. The attacker convinces an authorized employee to take an action they believe is legitimate. MFA has no mechanism to read that transaction.

Are there legal obligations around deepfake attacks on employees or customers?

The Take It Down Act (2025) criminalizes non-consensual sharing of AI-generated intimate images at the federal level but does not cover audio deepfakes. At least 26 states have laws addressing AI-generated intimate depictions. Missouri’s House Bill 1887, passed in April 2026, makes it a felony to share AI-generated depictions used to harass or harm, with enhanced penalties where minors are involved. Multi-state operators face conflicting jurisdictional requirements that federal preemption has not yet resolved.

What should employees do if they suspect a deepfake call in progress?

Apply one unscripted verification challenge before taking any action. Do not terminate the call immediately; that tips off the attacker. Escalate to security after the call ends. Preserve every available record: call logs, screen recordings, and transaction data. If a transfer has already been initiated, contact your financial institution immediately, since wire recall windows are narrow.

Diopter AI Team

Threat Intelligence

The Diopter AI Team publishes research and analysis on deepfake fraud, synthetic media detection, and AI-enabled social engineering. The team works directly with security, fraud, and IT organizations to map real-world attack arcs.