Quick Start
Installation
Requires Python 3.9+. All models (Silero VAD, WeSpeaker) are downloaded automatically on first use.
Basic Usage
from diarize import diarize
result = diarize("meeting.wav")
print(f"Found {result.num_speakers} speakers")
for seg in result.segments:
    print(f" [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")
Specifying Speaker Count
If you know the number of speakers, skip auto-detection:
Or constrain the range:
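Both cases can be sketched as follows. The keyword names `num_speakers`, `min_speakers`, and `max_speakers` are assumptions that this page does not confirm, so check the API reference before relying on them. The helper `speaker_options` is hypothetical and only validates the values before they are passed to `diarize()`.

```python
from typing import Optional


def speaker_options(
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> dict:
    """Build keyword arguments for a diarize() call (hypothetical helper).

    The keyword names are assumed, not taken from the library's documentation.
    """
    for name, value in (
        ("num_speakers", num_speakers),
        ("min_speakers", min_speakers),
        ("max_speakers", max_speakers),
    ):
        if value is not None and value < 1:
            raise ValueError(f"{name} must be at least 1")

    if num_speakers is not None:
        # An exact count and a range contradict each other.
        if min_speakers is not None or max_speakers is not None:
            raise ValueError("Pass either an exact count or a range, not both")
        return {"num_speakers": num_speakers}

    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers must not exceed max_speakers")

    options = {}
    if min_speakers is not None:
        options["min_speakers"] = min_speakers
    if max_speakers is not None:
        options["max_speakers"] = max_speakers
    return options


exact = speaker_options(num_speakers=3)
bounded = speaker_options(min_speakers=2, max_speakers=5)

# Assumed usage, with the library installed:
# result = diarize("meeting.wav", **exact)
# result = diarize("meeting.wav", **bounded)
```

Validating up front keeps a contradictory request (an exact count plus a range) from reaching the clustering step.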
Exporting Results
RTTM format
# Write to file
result.to_rttm("output.rttm")
# Get as string
rttm_string = result.to_rttm()
print(rttm_string)
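To read RTTM text back into Python, a small parser is usually enough. The sketch below assumes the standard ten-field `SPEAKER` line layout (type, file ID, channel, onset, duration, then placeholders around the speaker name). The `Turn` type, the `parse_rttm` helper, and the sample lines are illustrative and are not output from the library.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One speaker turn read from an RTTM line (illustrative type)."""
    file_id: str
    start: float
    duration: float
    speaker: str


def parse_rttm(text: str) -> List[Turn]:
    """Parse SPEAKER lines from RTTM text, skipping anything else."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Standard layout: SPEAKER file chan onset dur <NA> <NA> name <NA> <NA>
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(
            Turn(
                file_id=fields[1],
                start=float(fields[3]),
                duration=float(fields[4]),
                speaker=fields[7],
            )
        )
    return turns


# Hand-written sample lines, not produced by diarize.
sample = (
    "SPEAKER meeting 1 0.000 3.200 <NA> <NA> SPEAKER_00 <NA> <NA>\n"
    "SPEAKER meeting 1 3.200 1.500 <NA> <NA> SPEAKER_01 <NA> <NA>\n"
)
turns = parse_rttm(sample)
for turn in turns:
    print(f"{turn.speaker}: starts at {turn.start:.1f}s, lasts {turn.duration:.1f}s")
```

With a real result, `parse_rttm(result.to_rttm())` would apply the same parsing to the string form shown above.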
JSON-friendly dicts
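A minimal sketch of building JSON-friendly dicts by hand, assuming each segment exposes `speaker`, `start`, and `end` as in the other examples on this page. The `Segment` class below is a stand-in for the library's own type, and `segments_to_dicts` is a hypothetical helper, not part of the library.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Stand-in for the library's segment type (hypothetical)."""
    speaker: str
    start: float
    end: float


def segments_to_dicts(segments: Iterable) -> List[dict]:
    """Convert segment-like objects into plain dicts of built-in types."""
    return [
        {"speaker": seg.speaker, "start": seg.start, "end": seg.end}
        for seg in segments
    ]


# Hand-written sample data, not produced by diarize.
segments = [
    Segment("SPEAKER_00", 0.0, 3.2),
    Segment("SPEAKER_01", 3.2, 4.7),
]

rows = segments_to_dicts(segments)
payload = json.dumps(rows, indent=2)
print(payload)
```

Because the helper reads only attributes, it should work with any object that has those three fields, for example `segments_to_dicts(result.segments)`.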
Pydantic serialization
data = result.model_dump()
# Full result as a dict, including segments, speakers, estimation_details
Iteration
DiarizeResult is iterable:
for seg in result:
    print(seg.speaker, seg.start, seg.end, seg.duration)
print(f"Total segments: {len(result)}")
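Iteration makes simple aggregates easy, such as total speaking time per speaker. The sketch below uses a stand-in `Segment` class with the same `speaker` and `duration` attributes shown above; the `talk_time` helper and the sample data are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Segment:
    """Stand-in for the library's segment type (hypothetical)."""
    speaker: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def talk_time(segments: Iterable) -> Dict[str, float]:
    """Sum segment durations per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.duration
    return dict(totals)


# Hand-written sample data, not produced by diarize.
segments = [
    Segment("SPEAKER_00", 0.0, 3.0),
    Segment("SPEAKER_01", 3.0, 4.5),
    Segment("SPEAKER_00", 4.5, 6.0),
]

totals = talk_time(segments)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Since the result itself is iterable, `talk_time(result)` should give the same kind of summary for a real recording.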
Speaker Estimation Details
When auto-detection is used, diagnostic information is available:
result = diarize("meeting.wav")
if result.estimation_details:
    details = result.estimation_details
    print(f"Method: {details.method}")
    print(f"Estimated speakers: {details.best_k}")
    print(f"PCA dimensions: {details.pca_dim}")
    print(f"BIC scores: {details.k_bics}")
Logging
diarize uses Python's standard logging module:
import logging
logging.basicConfig(level=logging.INFO)
# diarize() now logs progress messages at INFO level
result = diarize("meeting.wav")
Set level=logging.DEBUG for detailed diagnostics (BIC scores per k, etc.).
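Setting the root level to DEBUG also turns on debug output from every other library. A narrower option is to raise verbosity for this package alone. The sketch below assumes the library logs under a logger named `"diarize"`, which is the usual convention for modules that call `logging.getLogger(__name__)` but is not confirmed by this page.

```python
import logging

# Keep other libraries quiet: the root logger only shows warnings and above.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

# Assumed logger name; adjust it if the library logs under a different one.
diarize_logger = logging.getLogger("diarize")
diarize_logger.setLevel(logging.DEBUG)
```

Child loggers such as `diarize.some_module` inherit this level unless they set their own, so one call covers the whole package.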