OpenAI is providing limited access to a text-to-speech generation platform it developed called Speech Engine, which can create a synthetic voice from a 15-second clip of someone’s voice. The AI-generated voice can read text prompts on command in the same language as the speaker or in several other languages. “These small-scale deployments help inform our approach, safeguards, and thinking about how speech engines can be applied across industries,” OpenAI said in its blog post.
Companies with access include edtech company Age of Learning, visual storytelling platform HeyGen, frontline health software maker Dimagi, AI communications app creator Livox and health system Lifespan.
In these examples released by OpenAI, you can hear that Age of Learning has been using the technology to generate pre-written voiceover content and read “live, personalized responses” written by GPT-4 to students.
First, the English reference audio:
Here are three AI-generated audio clips based on that sample:
OpenAI said it began developing Speech Engine in late 2022, and the technology already powers the preset voices for the text-to-speech API and ChatGPT’s read-aloud feature. In an interview with TechCrunch, Jeff Harris, a member of the OpenAI Speech Engine product team, said the model was trained on a “combination of licensed and publicly available data.” OpenAI told the publication that the model will only be available to about 10 developers.
AI text-to-audio generation is a growing field of generative AI. While most work focuses on instrumental or natural sounds, less has focused on speech generation, in part because of the issues cited by OpenAI. Companies in this space include Podcastle and ElevenLabs, which offer AI voice cloning technology and tools that the Vergecast explored last year.
According to OpenAI, its partners agree to abide by its usage policy, which states that they will not use voice generation to impersonate a person or organization without consent. It also requires partners to obtain “explicit and informed consent” from the original speakers, to not build ways for individual users to create their own voices, and to disclose to listeners that the voices are generated by artificial intelligence. OpenAI also adds watermarks to audio clips to track their origins and proactively monitors how the audio is used.
OpenAI has proposed a number of measures it believes could limit the risks of such tools, including phasing out voice-based authentication for access to bank accounts, policies to protect the use of people’s voices in AI, greater education about AI deepfakes, and the development of systems for tracking AI-generated content.