Step-by-step workflow
1
Install Kokoro TTS
Runs on CPU — no GPU needed. Works on Windows, Mac, Linux.
pip install kokoro soundfile
# On Linux: sudo apt-get install espeak-ng
# On Mac: brew install espeak-ng
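Before moving on, it can help to confirm that both pieces are in place. The sketch below is one way to check, using only the Python standard library. `check_kokoro_setup` is a hypothetical helper name, and it assumes the phonemizer binary is exposed on your PATH as `espeak-ng` (or `espeak` on some installs).

```python
import importlib.util
import shutil


def check_kokoro_setup():
    """Report whether the Python packages and an espeak binary can be found."""
    return {
        # find_spec returns None when a top-level package is not installed
        "kokoro": importlib.util.find_spec("kokoro") is not None,
        "soundfile": importlib.util.find_spec("soundfile") is not None,
        # Assumption: the binary is named espeak-ng or espeak on PATH
        "espeak": any(shutil.which(name) for name in ("espeak-ng", "espeak")),
    }


if __name__ == "__main__":
    for name, found in check_kokoro_setup().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING points back to the matching install command above.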
2
Generate your first audio
Run this Python script.
from kokoro import KPipeline
import soundfile as sf

pipeline = KPipeline(lang_code='a')  # 'a' = American English
# Some available voices: af_heart, af_bella, af_sarah, am_adam, am_michael
text = "Hello from AItheGuru. Today we are exploring open source AI tools."
# The pipeline yields one (graphemes, phonemes, audio) tuple per text chunk
for i, (gs, ps, audio) in enumerate(pipeline(text, voice='af_heart', speed=1.0)):
    sf.write(f'output_{i}.wav', audio, 24000)  # 24000 = sample rate in Hz
print("Audio saved!")
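Longer text may come back as several chunks, which is why the loop writes output_0.wav, output_1.wav, and so on. To sanity-check what was written, a sketch like this reads each file back with the standard-library `wave` module and reports its length. `wav_duration_seconds` is a hypothetical helper, and it assumes the files are plain PCM WAV, which should be what soundfile writes for a `.wav` name by default.

```python
import glob
import wave


def wav_duration_seconds(path):
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


if __name__ == "__main__":
    # List every chunk produced by the script above, in name order
    for path in sorted(glob.glob("output_*.wav")):
        print(f"{path}: {wav_duration_seconds(path):.2f} s")
```

A duration of zero, or a missing file, usually means the generation loop never yielded audio for that chunk.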
3
Hindi voice generation
Kokoro supports Hindi — great for Indian content creators.
# Reuses the KPipeline and sf imports from step 2
pipeline_hindi = KPipeline(lang_code='h')  # 'h' = Hindi
hindi_text = "नमस्ते, मैं आपका AI सहायक हूं। आज हम कुछ नया सीखेंगे।"
for i, (gs, ps, audio) in enumerate(pipeline_hindi(hindi_text, voice='hf_alpha', speed=0.95)):
    sf.write(f'hindi_{i}.wav', audio, 24000)
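The voice names used in this guide appear to follow a pattern: the first letter matches the `lang_code` (`a` for `af_heart`, `h` for `hf_alpha`). Pairing a voice with the wrong pipeline is an easy mistake, so a small guard can catch it before any audio is generated. This is a sketch based on that observed naming pattern only; `voice_matches_lang` is a hypothetical helper, not part of Kokoro.

```python
def voice_matches_lang(lang_code, voice):
    """Check that a voice name's first letter matches the pipeline's lang_code.

    Assumption: voices are named <lang letter><f|m>_<name>, as in the
    examples in this guide (af_heart, am_adam, hf_alpha).
    """
    prefix, _, name = voice.partition("_")
    return len(prefix) == 2 and bool(name) and prefix[0] == lang_code


if __name__ == "__main__":
    print(voice_matches_lang("h", "hf_alpha"))  # True
    print(voice_matches_lang("a", "hf_alpha"))  # False
```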
4
Batch process long scripts
For full YouTube videos or podcast episodes.
import numpy as np

# Reuses `pipeline` and `sf` from step 2
with open('script.txt', 'r', encoding='utf-8') as f:
    script = f.read()

# Blank lines separate paragraphs; skip empty ones
paragraphs = [p.strip() for p in script.split('\n\n') if p.strip()]
all_audio = []
for para in paragraphs:
    for _, _, audio in pipeline(para, voice='af_heart', speed=1.0):
        all_audio.append(audio)
# Join every chunk into a single file
sf.write('episode.wav', np.concatenate(all_audio), 24000)
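Splitting on `'\n\n'` misses paragraph breaks where the "blank" line contains stray spaces or tabs, which can happen in scripts pasted from a word processor. A slightly more tolerant splitter, sketched here with the standard-library `re` module, could be swapped in for the list comprehension above. `split_paragraphs` is a hypothetical helper name.

```python
import re


def split_paragraphs(script):
    """Split text into paragraphs on blank lines, tolerating stray whitespace."""
    # \n\s*\n matches a newline, optional whitespace (including more
    # newlines), then another newline, so "\n  \n" counts as a break
    parts = re.split(r"\n\s*\n", script)
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    sample = "Intro line.\n \nSecond paragraph,\nstill second.\n\n\nThird."
    print(split_paragraphs(sample))
```

Single newlines inside a paragraph are left untouched, so hard-wrapped text stays together as one paragraph.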
Pro tips
→
af_heart = most natural English voice
→
Runs at up to 90x realtime on CPU, so 1 min of audio takes roughly 1 sec; actual speed depends on your hardware
→
Convert to MP3: ffmpeg -i output_0.wav -codec:a libmp3lame -q:a 2 output_0.mp3
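The "realtime" figure in the tips above is seconds of audio produced divided by seconds spent generating it. To measure it on your own machine rather than rely on a quoted number, time the generation loop and feed the total sample count into a helper like this one. `realtime_factor` is a hypothetical name; the 24000 default is the sample rate used throughout this guide.

```python
def realtime_factor(num_samples, elapsed_seconds, sample_rate=24000):
    """Return seconds of audio generated per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    audio_seconds = num_samples / sample_rate
    return audio_seconds / elapsed_seconds


if __name__ == "__main__":
    # One minute of 24 kHz audio generated in 2 seconds of compute
    print(realtime_factor(60 * 24000, 2.0))  # 30.0
```

In the batch script, `num_samples` would be `sum(len(a) for a in all_audio)` and `elapsed_seconds` the difference between two `time.perf_counter()` readings taken around the loop.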
Why this matters for India
Hindi TTS (lang_code="h") makes this the best free option for Hindi YouTubers, podcasters, and reel makers