AI Voice

Voice Cloning

Last updated May 12, 2026

The process of training an AI model on recorded samples of a target voice (from 30 seconds to several hours of audio) so that it can generate new speech in that voice from any text input.

Full definition

Voice cloning uses neural networks to learn the acoustic and prosodic characteristics of a target voice from audio samples, then generates new speech in that voice from arbitrary text input. As of 2026, state-of-the-art systems (ElevenLabs Multilingual v2, Descript Overdub) produce clones that fool listeners in blind tests more than 80% of the time on conversational content. Two main tiers exist: Instant Voice Clone (30 seconds of source audio, ready in minutes) for surgical corrections and short content, and Professional Voice Clone (3+ hours of clean studio recordings) for audiobook-grade output. All major platforms require consent verification: cloning another person's voice without permission is prohibited and increasingly prosecutable under deepfake legislation.
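
The tier choice described above can be sketched as a small decision helper. This is only an illustration: the function name, tier labels, and the consent flag are hypothetical and do not correspond to any platform's actual API, and the thresholds are simply the 30-second and 3-hour figures from the definition, which real services may set differently.

```python
from typing import Optional

# Thresholds taken from the definition above (assumption: platforms may differ).
INSTANT_MIN_SECONDS = 30
PROFESSIONAL_MIN_SECONDS = 3 * 60 * 60  # 3 hours


def suggest_clone_tier(source_seconds: float, consent_verified: bool) -> Optional[str]:
    """Suggest a cloning tier, or None if cloning should not proceed.

    Hypothetical helper: returns "professional", "instant", or None.
    """
    # No verified consent from the voice owner means no clone at all.
    if not consent_verified:
        return None
    # Enough clean source audio for audiobook-grade output.
    if source_seconds >= PROFESSIONAL_MIN_SECONDS:
        return "professional"
    # Enough for short content and surgical corrections.
    if source_seconds >= INSTANT_MIN_SECONDS:
        return "instant"
    # Less than 30 seconds of source audio: too little to clone.
    return None
```

For example, 45 seconds of consented audio would map to the instant tier, while four hours of studio recordings would map to the professional tier.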

Examples

· A podcaster using ElevenLabs to clone their voice once, then generating Spanish-language episodes from translated scripts.
· An audiobook narrator using Professional Voice Clone to produce 200 hours of content without re-recording.
· A creator using Descript Overdub to fix a misspoken guest name in an already-published episode.

See also