Voice phishing is AI fraud in real time

The writer is an AI researcher at Bramble Intelligence and worked on the State of AI Report 2025.

Until recently, building an artificial intelligence system that could hold a convincing phone conversation was a laborious task. You had to combine separate tools for speech recognition, language processing and speech synthesis, all linked through fragile telephony software.

This is no longer true. The arrival of real-time, speech-native AI models such as OpenAI’s Realtime API, launched last year, means a system that once required multiple components can now be created in minutes.

Publicly available code can connect these models to a phone line. The AI model listens, “thinks” and responds in an instant. The result is a synthetic voice that can converse fluently, improvise naturally and sustain a dialogue in a way that feels human.

In the past year we have moved from the theoretical possibility of wide-scale AI-enabled voice phishing (or vishing) scams to the reality. Last year, UK engineering company Arup was defrauded of $25mn in a deepfake scam, while a vishing attack on Cisco succeeded in extracting information from a cloud-based customer relationship management system it used.

What once demanded expert knowledge is now available, pre-packaged, for anyone to exploit. Low-latency voice-native models have removed the final technical barriers to real-time AI voice fraud.

In testing, it took me only a few lines of instruction to make such a system act like an HR manager calling about the payroll or a fraud officer warning of suspicious activity. Because AI can reason and change strategy in real time, its manipulation is adaptive.

The technology itself has legitimate uses, such as healthcare follow-ups, customer service or language tutoring. But the same accessibility that enables innovation also enables harm. A single operator could in theory launch hundreds of thousands of fraudulent calls a day, each one tailored to its target.

This threat is compounded by the increasing realism and low costs of platforms like ElevenLabs or Cartesia, which can facilitate voice cloning with very short audio samples.

In the case of public figures, it is possible, and relatively easy, to gather hours of audio and produce a compelling approximation of their voice without their knowledge. Public officials have already been impersonated in such attacks, according to the FBI. It has warned the public not to assume that messages claiming to be from a senior US official are authentic.

MIT’s Risk Repository, a database of over 1,600 AI risks, shows that in the past five years, the proportion of AI incidents associated with fraud has increased from around 9 per cent to around 48 per cent.

The scale of this cyber crime means voice-verification systems that identify customers by their speech patterns are now a liability. Sensitive requests and high-value transactions should require multi-factor verification that does not depend on how someone sounds.

For the rest of us, the lesson is simple: the voice on the other end of the line is no longer evidence of who is speaking. Just as we have learnt to treat emails with caution, we must now learn to doubt a human-sounding voice. In time, we may need to create vocal watermarks or digital signatures that verify speech as genuine.

Debates around AI are sometimes framed in existential terms. But it is the smaller risks that will reach us first.

Fraud and impersonation corrode trust in everyday communication. These supposedly mundane crimes are the front line of the AI transition. The same ingenuity that created the tools must be applied to securing them.

The real disruption of generative AI (the quiet, invisible kind) has already arrived. It will not announce itself with superhuman intelligence but with a phone call.
