Suites
A suite is a benchmark: a manifest of clips or stimuli, a set of conditions
or perturbations, and a per-suite scoring rule. Every suite is identified by a
stable suite_id (e.g. ab/sound-id) and ships with a fixed revision. Both
are recorded in every run JSON.
Suites are grouped into three families that test different things and ship their own adapter contracts.
Task suites
Behavioral evaluations on labeled clips. The model is a transcriber, identifier, or detector.
| Suite | Task | Default model | Headline |
|---|---|---|---|
| `ab/asr-robust` | Speech recognition under noise / bandlimit / reverb | `whisper-tiny` | `weighted_mean_wer` |
| `ab/asr-hallucination` | Non-speech ASR hallucinations with validated findings | `whisper-tiny` | `weighted_hallucination_rate` |
| `ab/sound-id` | Sound-event identification on labeled mixtures | `heuristic-v0` | `components_understood / components_present` |
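As a rough illustration of what a headline such as `weighted_mean_wer` aggregates, the sketch below computes a standard word error rate (word-level edit distance divided by reference length) and a weighted mean over per-condition scores. The function names, condition names, and weights are assumptions made for this example; the suite's actual text normalization and weighting may differ.

```python
from typing import Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: output on an empty reference is fully wrong.
        return 1.0 if hyp else 0.0
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def weighted_mean(scores: Dict[str, Tuple[float, float]]) -> float:
    """Combine per-condition (score, weight) pairs into a single number."""
    total_weight = sum(weight for _, weight in scores.values())
    return sum(score * weight for score, weight in scores.values()) / total_weight


# Hypothetical conditions and weights, for illustration only.
per_condition = {
    "clean": (word_error_rate("the cat sat", "the cat sat"), 1.0),
    "noise_10db": (word_error_rate("the cat sat", "the bat sat down"), 2.0),
}
headline = weighted_mean(per_condition)
```

Here the noisy hypothesis has one substitution and one insertion against a three-word reference, so its WER is 2/3, and the heavier weight on that condition pulls the headline toward it.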
Signal suites
Reference-aware fidelity / psychoacoustic / phase checks. The model is an
`AudioProcessor` that takes `(audio, sample_rate)` and returns processed audio.
| Suite | What it stresses | Headline |
|---|---|---|
| `ab/fidelity-roundtrip` | SI-SDR, MR-STFT, true peak, LUFS drift across stimuli × conditions | `weighted_si_sdr_db`, `max_true_peak_dbtp` |
| `ab/psychoacoustic-masking` | Audibility: keep audible tones audible, leave masked tones masked | `masking_respect_score` |
| `ab/phase-coherence` | Polarity, inter-channel correlation, M/S round-trip, sub-sample delay | `phase_coherence_score`, `mean_polarity_score` |
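To make the processor contract concrete, here is a minimal sketch of something that takes `(audio, sample_rate)` and returns processed audio, together with a textbook SI-SDR computation of the kind a fidelity check could apply. The `Protocol` definition, the list-of-floats audio representation, and the helper names are assumptions for illustration; the real adapter contract may use arrays and a different interface.

```python
import math
from typing import List, Protocol, Sequence


class AudioProcessor(Protocol):
    """Assumed shape of the contract: audio in, processed audio out."""

    def __call__(self, audio: Sequence[float], sample_rate: int) -> List[float]: ...


def half_gain(audio: Sequence[float], sample_rate: int) -> List[float]:
    """Toy processor: attenuate by about 6 dB; the sample rate is unused here."""
    return [0.5 * sample for sample in audio]


def hard_clip(audio: Sequence[float], sample_rate: int) -> List[float]:
    """Toy processor that distorts: clip every sample to [-0.5, 0.5]."""
    return [max(-0.5, min(0.5, sample)) for sample in audio]


def si_sdr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (mono, equal length)."""
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0.0:
        raise ValueError("reference signal is silent")
    dot = sum(r * e for r, e in zip(reference, estimate))
    scale = dot / ref_energy  # gain that projects the estimate onto the reference
    target = [scale * r for r in reference]
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    if noise_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / noise_energy)


sample_rate = 16_000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
gain_score = si_sdr_db(tone, half_gain(tone, sample_rate))
clip_score = si_sdr_db(tone, hard_clip(tone, sample_rate))
```

Because the metric is scale-invariant, a pure gain change scores extremely high, while clipping adds distortion and yields a finite, lower value.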
Temporal task suites
Frame-level evaluations. The model emits time-stamped events or speaker turns.
| Suite | Task | Headline |
|---|---|---|
| `ab/sed-urban` | Sound event detection on labeled urban-noise soundscapes | `event_f1_iou50`, `segment_f1_1s` |
| `ab/diarization-cw` | Speaker diarization with DER + Hungarian alignment + 0.25 s collar | `der`, `mean_speaker_count_error` |
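One plausible reading of a metric named like `event_f1_iou50` is an event-level F1 in which a predicted event counts as correct when it shares a label with a reference event and their time intervals overlap with IoU of at least 0.5. The sketch below implements that reading with greedy matching; the tuple layout, function names, and matching strategy are assumptions, and the suite's own scoring rule may differ.

```python
from typing import List, Tuple

Event = Tuple[str, float, float]  # (label, onset_s, offset_s)


def interval_iou(a: Event, b: Event) -> float:
    """Intersection over union of two time intervals."""
    intersection = max(0.0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def event_f1(reference: List[Event], predicted: List[Event],
             iou_threshold: float = 0.5) -> float:
    """Event-level F1 with greedy one-to-one matching on label and IoU."""
    if not reference and not predicted:
        return 1.0
    unmatched = list(reference)
    true_positives = 0
    for pred in predicted:
        # Pick the best-overlapping reference event with the same label that is still free.
        candidates = [ref for ref in unmatched if ref[0] == pred[0]]
        best = max(candidates, key=lambda ref: interval_iou(ref, pred), default=None)
        if best is not None and interval_iou(best, pred) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and timings, for illustration only.
reference = [("siren", 0.0, 4.0), ("car_horn", 5.0, 6.0)]
predicted = [("siren", 0.5, 4.0), ("car_horn", 5.8, 7.0), ("dog_bark", 8.0, 9.0)]
score = event_f1(reference, predicted)
```

In this example only the siren matches (IoU 0.875); the car horn overlaps too little (IoU 0.1) and the dog bark has no reference, so precision is 1/3, recall is 1/2, and F1 is 0.4.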
Discover what's available
```bash
audiobench list                               # every suite + status
audiobench info ab/fidelity-roundtrip         # stimuli, conditions, adapters
audiobench list-models --suite ab/sed-urban   # bundled SED adapters
```
Reproducibility model (shared across all suites)
- Every fixture is procedurally rendered or hashed from a known source. The
  manifest digest is part of every `run_hash`.
- The CLI seed is folded into the run hash. Two runs with the same suite + revision + manifest + model + seed produce the same hash.
- `audiobench compare` and `audiobench gate` work uniformly across suites: same artifact schema, same exit-code semantics, same JUnit output.
See Reproducibility guarantees for the hash schema and replay procedure.
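The idea that identical inputs yield an identical hash can be sketched as follows. This is not the project's hash schema: the field names, the canonical-JSON encoding, the choice of SHA-256, and the function name `sketch_run_hash` are all assumptions made for illustration.

```python
import hashlib
import json


def sketch_run_hash(suite_id: str, revision: str, manifest_digest: str,
                    model_name: str, seed: int) -> str:
    """Hash the inputs that define a run; field names are hypothetical."""
    payload = {
        "suite_id": suite_id,
        "revision": revision,
        "manifest_digest": manifest_digest,
        "model": model_name,
        "seed": seed,
    }
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder manifest digest and revision, for illustration only.
first = sketch_run_hash("ab/sound-id", "1", "abc123", "heuristic-v0", seed=0)
second = sketch_run_hash("ab/sound-id", "1", "abc123", "heuristic-v0", seed=0)
reseeded = sketch_run_hash("ab/sound-id", "1", "abc123", "heuristic-v0", seed=1)
```

With this construction `first` and `second` are equal, while changing only the seed produces a different digest, which mirrors the property described in the list above.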
Adding a new suite
A suite is a Python module under `src/audiobench/suites/` that exposes:
- a `SUITE_ID` and `SUITE_REVISION` string,
- a `load_manifest() -> dict` for `audiobench info`,
- a `run_suite(*, model_name, seed, ...) -> dict` runner that returns the canonical artifact shape (`suite`, `model`, `headline`, `per_*`, `run_hash`, `manifest_hash`).
Register it in audiobench/suites/__init__.py, add adapter-contract / report
/ gate / matrix / compare wiring for the new headline, and write tests
following tests/test_signal_suites.py or tests/test_temporal_suites.py as
templates.
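A skeleton of such a module might look like the sketch below. Everything specific in it is hypothetical: the suite id `ab/example-tone`, the manifest fields, the `per_condition` key standing in for `per_*`, the placeholder scores, and the hashing helper. The extra `run_suite` parameters elided in the signature above are accepted as `**options` and left unused rather than guessed at.

```python
"""Hypothetical suite module, e.g. a new file under src/audiobench/suites/."""
import hashlib
import json

SUITE_ID = "ab/example-tone"  # hypothetical id, not a bundled suite
SUITE_REVISION = "1"          # hypothetical revision string


def _digest(payload: dict) -> str:
    """SHA-256 over canonical JSON (an assumed hashing scheme)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_manifest() -> dict:
    """Describe the stimuli and conditions; the fields are illustrative."""
    return {
        "suite_id": SUITE_ID,
        "revision": SUITE_REVISION,
        "stimuli": ["tone_440hz", "tone_1khz"],
        "conditions": ["clean", "noise_10db"],
    }


def run_suite(*, model_name: str, seed: int, **options) -> dict:
    """Return a dict with the artifact keys listed in the docs above."""
    manifest = load_manifest()
    manifest_hash = _digest(manifest)
    # Placeholder scoring: a real suite would run the model on every
    # stimulus under every condition and score the outputs.
    per_condition = {condition: 0.0 for condition in manifest["conditions"]}
    headline = {"example_score": sum(per_condition.values()) / len(per_condition)}
    run_hash = _digest({
        "suite_id": SUITE_ID,
        "revision": SUITE_REVISION,
        "manifest_hash": manifest_hash,
        "model": model_name,
        "seed": seed,
    })
    return {
        "suite": SUITE_ID,
        "model": model_name,
        "headline": headline,
        "per_condition": per_condition,
        "run_hash": run_hash,
        "manifest_hash": manifest_hash,
    }


artifact = run_suite(model_name="heuristic-v0", seed=0)
```

Keeping `run_suite` keyword-only, as in the documented signature, means callers cannot accidentally swap the model name and seed.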