diarize¶

Speaker diarization for Python — answers "who spoke when?" in any audio file.

Runs on CPU. No GPU, no API keys, no account signup. Apache 2.0 licensed.

Install¶

pip install diarize

Requires Python 3.9+. All models download automatically on first use. diarize installs a compatible torch/torchaudio range automatically.

from diarize import diarize

result = diarize("meeting.wav")
for seg in result.segments:
    print(f"  [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")

How diarize compares¶

	diarize	pyannote (free)	pyannote (commercial)
License	Apache 2.0	CC-BY-4.0	Commercial
GPU required	No	No (7x slower on CPU)	No
HuggingFace account	No	Yes	Yes
Auto speaker count	Yes	Yes	Yes
DER (VoxConverse dev)	~4.8%	~11.2%	~8.5%
CPU speed (RTF)	0.12	0.86	---

DER and speed numbers for pyannote are from their benchmark page. The diarize number is from the VoxConverse dev evaluation described in Benchmarks.

Next Steps¶

How It Works --- pipeline architecture and algorithms
Benchmarks --- VoxConverse, AMI, speed comparison, limitations
API Reference --- full auto-generated API documentation

License¶

Apache 2.0 License. All dependencies are permissively licensed (MIT, Apache 2.0, BSD).