audiobench

A reproducible CLI benchmark for audio ML models.

A single clean-set metric hides failure modes. audiobench reports performance across realistic perturbations and mixtures — so you find out where a model actually breaks, not just how it scores on the easy slice.


What's in the MVP

  • ab/asr-robust


    Speech recognition under noise, bandlimiting, and reverb. Per-condition WER plus a weighted mean. Default model: Whisper.

    Suite reference

  • ab/asr-hallucination


    Non-speech hallucination benchmark with ranked findings, bootstrap CIs, and holdout validation status.

    Suite reference

  • ab/sound-id


    Sound-event identification on labeled mixtures. Reports recall, precision, F1, and false-positive rate per mixture size. Default model: a bundled CPU heuristic.

    Suite reference

  • Model adapters


    Bundled heuristics, LAION-CLAP zero-shot, and Qwen2-Audio-7B-Instruct (local GPU or remote endpoint).

    Models

  • Reproducibility built in


    Manifest, mixture, probe, and prompt seeds are pinned. Every run writes a JSON artifact with a run_hash.

    Reproducibility guarantees
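
Because every run artifact carries a run_hash, checking that two runs reproduced each other can be a one-liner. A minimal sketch, assuming only that the artifact is a JSON object with a top-level "run_hash" field (the exact schema lives in the reproducibility docs):

```python
import json


def run_hash(path: str) -> str:
    """Read the run_hash out of a run artifact.

    Assumes the artifact is a JSON object with a top-level
    "run_hash" field; the full schema is documented separately.
    """
    with open(path) as f:
        return json.load(f)["run_hash"]


# Two runs with identical pinned seeds and inputs should agree:
# assert run_hash("results/a.json") == run_hash("results/b.json")
```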


In one command

pip install -e .
audiobench run ab/sound-id --model heuristic-v0

That gets you a full ab/sound-id run on the bundled demo pack: no downloads, no GPU required. From there:

audiobench run ab/sound-id --profile demo-fast --model heuristic-v0   --output results/demo-heuristic.json
audiobench run ab/sound-id --profile demo-fast --model heuristic-weak --output results/demo-weak.json
audiobench compare results/demo-heuristic.json results/demo-weak.json

The compare command dispatches on the suite id baked into each run JSON, so the same call works for ab/asr-robust (lower WER wins) and ab/sound-id (higher recall and lower false-positive rate win).
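
That dispatch pattern is easy to picture. A hedged sketch, not the CLI's actual implementation: the field names (suite, model, summary) and the single-metric rules per suite are illustrative assumptions, and the real comparator weighs multiple metrics.

```python
# Hypothetical per-suite "better" rules keyed by suite id.
# Illustrative only: the real CLI compares multiple metrics per suite.
BETTER = {
    "ab/asr-robust": lambda a, b: a["wer"] < b["wer"],        # lower WER wins
    "ab/sound-id":   lambda a, b: a["recall"] > b["recall"],  # higher recall wins
}


def winner(run_a: dict, run_b: dict) -> str:
    """Pick a winner by dispatching on the suite id stored in each run JSON."""
    suite = run_a["suite"]
    if run_b["suite"] != suite:
        raise ValueError("cannot compare runs from different suites")
    better = BETTER[suite]
    return run_a["model"] if better(run_a["summary"], run_b["summary"]) else run_b["model"]
```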


Benchmark your own model

If your goal is to evaluate your model, start from this flow:

  1. Implement the adapter protocol (answer(...) for ab/sound-id, transcribe(...) for ab/asr-robust).
  2. Register it in-repo, or expose it as a Python entry point.
  3. Run with your adapter id.
audiobench list-models
audiobench run ab/sound-id --model my-sound-model
audiobench run ab/asr-robust --model my-asr-model

The complete adapter and plugin setup lives in Bring your own model.
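
As a starting point, a hypothetical ab/sound-id adapter might look like the stub below. The class name, attribute, and answer(...) parameters are assumptions for illustration; the authoritative protocol signature is in the Bring your own model docs.

```python
class MySoundModel:
    """Sketch of an ab/sound-id adapter (hypothetical shape).

    Assumes the answer(...) protocol takes an audio clip plus a
    question string and returns a text answer; check the adapter
    docs for the real signature before registering.
    """

    id = "my-sound-model"  # the adapter id you pass to --model

    def answer(self, audio, question: str) -> str:
        # Replace with real inference; this stub always answers "no".
        return "no"
```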


Where to go next