Audio Generation Rules
Voice Selection
When the user wants to generate speech, first check if they've specified a voice. If not, run List ElevenLabs Voices to show available options. Present voices by name with a brief description. Let the user pick before generating.
Popular default voices include Rachel (calm, narrative), Adam (deep, authoritative), and Bella (young, expressive). If the user says "any voice" or "default", use Rachel (voice ID: 21m00Tcm4TlvDq8ikWAM).
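The fallback rule above can be sketched as a small lookup. This is a minimal illustration, not the tool's actual logic: the function name resolve_voice and the name-to-ID dictionary shape are assumptions, and only the Rachel voice ID comes from the rules above.

```python
from typing import Optional

# Voice ID for Rachel, taken from the rule above.
DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"

def resolve_voice(requested: Optional[str], voices: dict[str, str]) -> Optional[str]:
    """Return a voice ID, or None when the user still has to pick one.

    `voices` maps voice name -> voice ID. This shape is a hypothetical
    stand-in for whatever the voice listing step returns.
    """
    if requested is None:
        return None  # no voice specified: list voices and let the user choose
    key = requested.strip().lower()
    if key in ("any voice", "default"):
        return DEFAULT_VOICE_ID
    for name, voice_id in voices.items():
        if name.lower() == key:
            return voice_id
    return None  # unknown name: fall back to listing voices
```

Returning None rather than silently defaulting keeps the "let the user pick" step explicit for unrecognized names.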
Text Handling
ElevenLabs charges per character, so warn the user about the character count before generating long text. Split text over 5000 characters into chunks for better quality and to avoid timeouts.
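One way to apply the 5000-character rule is to split at sentence boundaries so each chunk still reads naturally. The sketch below is an assumption about how chunking could be done; split_text is a hypothetical helper, and only the 5000 threshold comes from the rule above.

```python
import re

MAX_CHARS = 5000  # threshold from the rule above

def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Checking len(text) before calling split_text gives the character count to report to the user, since billing is per character.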
Keep text natural. The model handles punctuation, emphasis, and pacing from the text itself. Don't add special markup unless the user asks for SSML-style control.
Model Selection
Use eleven_multilingual_v2 as the default: it handles multiple languages and produces natural speech. Use eleven_turbo_v2_5 when the user needs faster generation and accepts slightly lower quality.
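The model rule reduces to a single switch. In this sketch the two model IDs come from the rule above, while choose_model and its prefer_speed flag are hypothetical names used only for illustration.

```python
DEFAULT_MODEL = "eleven_multilingual_v2"  # natural, multilingual speech
FAST_MODEL = "eleven_turbo_v2_5"          # faster, slightly lower quality

def choose_model(prefer_speed: bool = False) -> str:
    """Pick the fast model only when the user explicitly asks for speed."""
    return FAST_MODEL if prefer_speed else DEFAULT_MODEL
```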
Output Format
Default to mp3_44100_128 for high-quality playback. Use mp3_22050_32 for smaller files when quality isn't critical. Reserve PCM formats for specialized audio processing pipelines.
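The MP3 format names encode codec, sample rate in Hz, and bitrate in kbps, which makes the trade-off easy to inspect. The helpers below are a hypothetical sketch: choose_output_format and parse_mp3_format are illustrative names, and the parser assumes the three-part naming seen in the two MP3 formats above.

```python
HIGH_QUALITY_FORMAT = "mp3_44100_128"  # default for playback
SMALL_FILE_FORMAT = "mp3_22050_32"     # smaller files, lower quality

def choose_output_format(small_file: bool = False) -> str:
    """Return the smaller format only when file size matters more than quality."""
    return SMALL_FILE_FORMAT if small_file else HIGH_QUALITY_FORMAT

def parse_mp3_format(fmt: str) -> tuple[str, int, int]:
    """Split a name like 'mp3_44100_128' into (codec, sample_rate_hz, bitrate_kbps)."""
    codec, sample_rate, bitrate = fmt.split("_")
    return codec, int(sample_rate), int(bitrate)
```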
Workflow Pattern
- User requests speech generation
- If no voice specified, list voices and let user choose
- Generate audio with chosen voice
- Report the output file path
- Offer to regenerate with different voice or settings
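The steps above can be sketched as one function. Every callable passed in is a hypothetical stand-in (for the voice listing step, the user's choice, and the generation step), so this shows only the control flow, not a real ElevenLabs call.

```python
from typing import Callable, Optional

def run_speech_workflow(
    text: str,
    voice_id: Optional[str],
    list_voices: Callable[[], dict[str, str]],
    choose_voice: Callable[[dict[str, str]], str],
    generate: Callable[[str, str], str],
) -> str:
    """Run the workflow and return the message to show the user.

    list_voices  -> returns {voice name: voice ID} (assumed shape)
    choose_voice -> presents the voices and returns the chosen voice ID
    generate     -> takes (text, voice_id) and returns the output file path
    """
    if voice_id is None:
        # No voice specified: list voices and let the user choose.
        voice_id = choose_voice(list_voices())
    output_path = generate(text, voice_id)
    # Report the path, then offer another pass.
    return (
        f"Audio saved to {output_path}. "
        "Regenerate with a different voice or settings?"
    )
```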