OpenAI on Friday revealed a voice-cloning tool that it plans to keep tightly controlled until safeguards are in place to prevent audio fakes designed to deceive listeners.
According to an OpenAI blog post detailing the results of a small-scale test, a model named “Voice Engine” can replicate someone’s speech based on a 15-second audio sample.
The San Francisco-based company acknowledged the stakes: “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” OpenAI stated.
To address these concerns, OpenAI is collaborating with partners across government, media, entertainment, education, and civil society to gather feedback and ensure responsible development.
Given the potential for misuse of synthetic voice technology, OpenAI emphasized a cautious and informed approach to a broader release.
This cautious approach follows a recent incident where a political consultant admitted to orchestrating a robocall impersonating US President Joe Biden during the New Hampshire primary.
Experts worry about the proliferation of AI-powered deepfake disinformation in key elections worldwide, including the 2024 White House race.
OpenAI said that partners testing Voice Engine have agreed to rules requiring the explicit consent of anyone whose voice is replicated, and that audiences must be told when they are hearing AI-generated voices.
To mitigate misuse, OpenAI has also implemented safety measures, including watermarking that traces the origin of any audio generated by Voice Engine and proactive monitoring of how the tool is used.
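OpenAI has not disclosed how its watermarking works. Purely as an illustration of the general idea, the sketch below implements a classic keyed spread-spectrum watermark: an inaudible pseudorandom signal derived from a secret key is mixed into the audio at generation time and later detected by correlation. All names and parameter values here are hypothetical and are not drawn from OpenAI’s system.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.005) -> np.ndarray:
    """Mix a low-amplitude pseudorandom +/-1 sequence, derived from `key`, into mono audio in [-1, 1]."""
    rng = np.random.default_rng(key)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)  # keyed, noise-like carrier
    return np.clip(audio + strength * mark, -1.0, 1.0)

def detect_watermark(audio: np.ndarray, key: int, threshold: float = 0.002) -> bool:
    """Correlate the audio with the keyed sequence; a high score means the mark is present."""
    rng = np.random.default_rng(key)
    mark = rng.choice([-1.0, 1.0], size=audio.shape)
    return float(np.mean(audio * mark)) > threshold

# Demo on one second of a 440 Hz tone sampled at 24 kHz.
clip = 0.1 * np.sin(2 * np.pi * 440 * np.arange(24_000) / 24_000)
print(detect_watermark(embed_watermark(clip, key=42), key=42))  # True: marked audio correlates with the key
print(detect_watermark(clip, key=42))                           # False: clean audio does not
```

A production watermark would additionally need to survive compression, re-recording, and deliberate removal attempts, which is what makes tracing AI-generated audio a hard research problem.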