AI Voice

SSML (Speech Synthesis Markup Language)

Last updated May 12, 2026

An XML-based markup language for fine-grained control of text-to-speech (TTS) output: pauses, emphasis, pitch, rate, and phoneme-level pronunciation.

Full definition

Speech Synthesis Markup Language (SSML) is an XML-based W3C standard for controlling fine-grained aspects of text-to-speech (TTS) generation. SSML lets developers and content creators specify pauses (<break>), emphasis (<emphasis>), pitch and rate (<prosody>), exact phonemic pronunciation (<phoneme>), and the voice used for a passage (<voice>). The major TTS providers (ElevenLabs, Murf, PlayHT, Amazon Polly, Google Cloud TTS) each support a subset of SSML rather than the full standard, and the supported tags differ between providers, so markup that works with one service may be ignored or rejected by another. Tools like Murf abstract SSML behind visual controls (pause sliders, emphasis buttons); tools like ElevenLabs and PlayHT accept SSML tags directly in their APIs for developer use.
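
Because each provider accepts only part of the standard, it can help to check a document's tags before sending it. The following is a minimal sketch using Python's standard XML parser. The SUPPORTED_TAGS set and the function name unsupported_tags are made up for illustration and do not describe any real provider; consult the provider's documentation for its actual list.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list for illustration only; not any real provider's list.
SUPPORTED_TAGS = {"speak", "break", "phoneme"}


def unsupported_tags(ssml, supported=SUPPORTED_TAGS):
    """Return the sorted names of elements in `ssml` that are not in `supported`."""
    # fromstring raises xml.etree.ElementTree.ParseError on malformed XML.
    root = ET.fromstring(ssml)
    # iter() walks the root element and all of its descendants.
    used = {element.tag for element in root.iter()}
    return sorted(used - set(supported))


ssml = (
    "<speak>Welcome to <emphasis level='strong'>Acme</emphasis>."
    "<break time='500ms'/>"
    "<prosody rate='slow' pitch='+2st'>Thanks for listening.</prosody>"
    "</speak>"
)

print(unsupported_tags(ssml))  # ['emphasis', 'prosody']
```

A check like this only catches unknown element names; a provider may also restrict attributes or value ranges on tags it does accept.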

Examples

· Adding <break time='500ms'/> to force a 500 ms pause before a key phrase.
· Wrapping a product name in <emphasis level='strong'>…</emphasis> to stress it in a generated voiceover.
· Specifying <phoneme alphabet='ipa' ph='dɪˈskrɪpt'>Descript</phoneme> to override the default pronunciation of a name.
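
The three examples above can be combined into one SSML document. The sketch below builds it with Python's standard library so that text and attribute values are escaped automatically. The function name build_ssml, its parameters, and the sample sentence are assumptions made for illustration, and whether a given provider honors every tag (or a <phoneme> nested inside <emphasis>) depends on that provider.

```python
import xml.etree.ElementTree as ET


def build_ssml(product, ipa, tagline, pause_ms=500):
    """Build an SSML string: emphasized product name, a pause, then a tagline."""
    speak = ET.Element("speak")
    speak.text = "Introducing "

    # Stress the product name and pin its pronunciation with IPA.
    emphasis = ET.SubElement(speak, "emphasis", level="strong")
    phoneme = ET.SubElement(emphasis, "phoneme", alphabet="ipa", ph=ipa)
    phoneme.text = product
    emphasis.tail = "."

    # Insert a pause before the key phrase that follows.
    pause = ET.SubElement(speak, "break", time=f"{pause_ms}ms")
    pause.tail = tagline

    # encoding="unicode" returns a str and keeps IPA characters readable.
    return ET.tostring(speak, encoding="unicode")


# Sample text is hypothetical; only the tags come from the examples above.
print(build_ssml("Descript", "dɪˈskrɪpt", "Now with AI voices."))
```

Building the markup through an XML library rather than string concatenation means characters such as & or < in the script text cannot break the document.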
