Context Slice

Audio Generation Rules

Voice Selection

When the user wants to generate speech, first check if they've specified a voice. If not, run List ElevenLabs Voices to show available options. Present voices by name with a brief description. Let the user pick before generating.

Popular default voices include Rachel (calm, narrative), Adam (deep, authoritative), and Bella (young, expressive). If the user says "any voice" or "default", use Rachel (voice ID: 21m00Tcm4TlvDq8ikWAM).
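The voice rules above can be sketched as a small lookup. This is a minimal illustration, not the tool's actual behavior: the function name resolve_voice and the name-to-ID mapping shape are assumptions. Only Rachel's voice ID comes from the rule above.

```python
# Rachel's voice ID is taken from the rule above; everything else is illustrative.
DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"
DEFAULT_PHRASES = {"any voice", "default"}


def resolve_voice(request, available):
    """Return a voice ID, or None when the user still has to pick.

    `available` maps voice name -> voice ID (hypothetical shape, for example
    built from the List ElevenLabs Voices result).
    """
    if request is None or not request.strip():
        return None  # no voice specified: list voices and ask
    key = request.strip().lower()
    if key in DEFAULT_PHRASES:
        return DEFAULT_VOICE_ID
    for name, voice_id in available.items():
        if name.lower() == key:
            return voice_id
    return None  # unknown name: fall back to listing voices
```

A return value of None signals that the voice list should be shown before any audio is generated.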

Text Handling

ElevenLabs charges per character, so warn the user about length and cost before generating long text. Split text over 5000 characters into chunks for better quality and to avoid timeouts.
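One way to apply the 5000-character rule is to split at sentence boundaries so each chunk still reads naturally. The sketch below assumes that sentences end in ".", "!", or "?" followed by whitespace; the helper name split_text is hypothetical.

```python
import re

MAX_CHARS = 5000  # limit from the rule above


def split_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if len(text) <= max_chars:
        return [text]
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single oversized sentence is hard-split so no chunk exceeds the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The total character count, len(text), is also the number to quote when warning the user about cost.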

Keep text natural. The model handles punctuation, emphasis, and pacing from the text itself. Don't add special markup unless the user asks for SSML-style control.

Model Selection

Use eleven_multilingual_v2 as the default; it handles multiple languages and produces natural speech. Use eleven_turbo_v2_5 when the user needs faster generation and accepts slightly lower quality.
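The model rule reduces to a single decision. A minimal sketch, where the function name choose_model and the prefer_speed flag are assumptions and the two model identifiers are the ones named above:

```python
DEFAULT_MODEL = "eleven_multilingual_v2"  # default: multilingual, natural speech
FAST_MODEL = "eleven_turbo_v2_5"          # faster, slightly lower quality


def choose_model(prefer_speed=False):
    """Return the model ID; prefer_speed is a hypothetical flag set from the user's request."""
    return FAST_MODEL if prefer_speed else DEFAULT_MODEL
```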

Output Format

Default to mp3_44100_128 for high-quality playback. Use mp3_22050_32 for smaller files when quality isn't critical. PCM formats are for specialized audio processing pipelines.
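The format rule can be sketched the same way. The function name and both parameters are assumptions; no PCM format name is hard-coded because the rule above does not name one, so the caller supplies it when a pipeline needs PCM.

```python
HIGH_QUALITY = "mp3_44100_128"  # default for playback
SMALL_FILE = "mp3_22050_32"     # smaller files, lower quality


def choose_output_format(small_file=False, pcm_format=None):
    """Return the output format string.

    pcm_format is passed through unchanged for specialized audio pipelines;
    its value is supplied by the caller and is not validated here.
    """
    if pcm_format is not None:
        return pcm_format
    return SMALL_FILE if small_file else HIGH_QUALITY
```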

Workflow Pattern

1. The user requests speech generation.
2. If no voice is specified, list the voices and let the user choose.
3. Generate audio with the chosen voice.
4. Report the output file path.
5. Offer to regenerate with a different voice or settings.
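The workflow above can be sketched as one function with the external steps injected as callables. All three callables (list_voices, choose_voice, generate_audio) are hypothetical stand-ins for the real tools, and the message wording is illustrative only.

```python
def run_speech_workflow(text, voice_id, list_voices, choose_voice, generate_audio):
    """Run the steps in order and return the messages to show the user.

    list_voices() returns the available voices, choose_voice(voices) returns
    the picked voice ID, and generate_audio(text, voice_id) returns the
    output file path. All three are assumed interfaces.
    """
    if voice_id is None:
        # No voice specified: list voices and let the user choose.
        voices = list_voices()
        voice_id = choose_voice(voices)
    path = generate_audio(text, voice_id)
    return [
        f"Audio saved to {path}",
        "Would you like to regenerate with a different voice or settings?",
    ]
```

Passing the steps in as arguments keeps the order of operations visible and lets the flow be exercised with stubs before any audio is generated or any characters are billed.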