audiobench

A reproducible CLI benchmark for audio ML models.

A single clean-set metric hides failure modes. audiobench reports performance across realistic perturbations and mixtures, so you find out where a model actually breaks, not just how it scores on the easy slice.


What's in the MVP

ab/asr-robust

Speech recognition under noise, bandlimiting, and reverb. Per-condition WER plus a weighted mean. Default model: Whisper.

Suite reference

ab/asr-hallucination

Non-speech hallucination benchmark with ranked findings, bootstrap CIs, and holdout validation status.

Suite reference

ab/sound-id

Sound-event identification on labeled mixtures. Reports recall, precision, F1, and false-positive rate per mixture size. Default model: a bundled CPU heuristic.

Suite reference

Model adapters

Bundled heuristics, LAION-CLAP zero-shot, and Qwen2-Audio-7B-Instruct (local GPU or remote endpoint).

Models

Reproducibility built in

Manifest, mixture, probe, and prompt seeds are pinned. Every run writes a JSON artifact with a run_hash.

Reproducibility guarantees
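
To illustrate why pinned seeds and a run_hash make runs comparable, here is a minimal Python sketch of one way such a hash could be derived. It is an assumption for illustration, not audiobench's actual scheme: the function name compute_run_hash, the field names (suite, model, seeds), and the choice of SHA-256 over canonical JSON are all hypothetical. The Reproducibility guarantees page is the authority on what the real hash covers.

```python
import hashlib
import json


def compute_run_hash(config: dict) -> str:
    """Hash a run configuration deterministically (illustrative only)."""
    # sort_keys plus fixed separators make the serialization independent
    # of dict insertion order, so equal configs always hash the same.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical field names; the real artifact schema may differ.
run_a = {
    "suite": "ab/sound-id",
    "model": "heuristic-v0",
    "seeds": {"manifest": 0, "mixture": 1, "probe": 2, "prompt": 3},
}
# Same content, different key order.
run_b = {
    "model": "heuristic-v0",
    "suite": "ab/sound-id",
    "seeds": {"prompt": 3, "probe": 2, "mixture": 1, "manifest": 0},
}
# One seed changed.
run_c = {**run_a, "seeds": {**run_a["seeds"], "mixture": 99}}

same = compute_run_hash(run_a) == compute_run_hash(run_b)
changed = compute_run_hash(run_a) != compute_run_hash(run_c)
print(same, changed)  # True True
```

Under this kind of scheme, two runs with matching hashes were produced from the same pinned inputs, and changing any seed yields a different hash.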

In one command

pip install -e .
audiobench run ab/sound-id --model heuristic-v0

That gets you a full ab/sound-id run on the bundled demo pack, no downloads, no GPU. From there:

audiobench run ab/sound-id --profile demo-fast --model heuristic-v0   --output results/demo-heuristic.json
audiobench run ab/sound-id --profile demo-fast --model heuristic-weak --output results/demo-weak.json
audiobench compare results/demo-heuristic.json results/demo-weak.json

The compare command dispatches on the suite id baked into each run JSON, so the same call works for ab/asr-robust (lower-WER-wins) and ab/sound-id (higher-recall-wins, lower-FPR-wins).
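
The dispatch idea can be sketched in a few lines of Python. This is a hedged illustration of the pattern, not the audiobench implementation: the JSON keys (suite, metrics), the metric names (wer, recall, fpr), the function compare_runs, and the numbers in the example are all assumptions made up for this sketch.

```python
# Direction of improvement per metric, keyed by suite id.
# Metric names are assumptions chosen for this sketch.
METRIC_DIRECTIONS = {
    "ab/asr-robust": {"wer": "lower"},
    "ab/sound-id": {"recall": "higher", "fpr": "lower"},
}


def compare_runs(run_a: dict, run_b: dict) -> dict:
    """Return, per metric, which run wins: 'a', 'b', or 'tie'."""
    suite = run_a["suite"]
    if suite != run_b["suite"]:
        raise ValueError("runs come from different suites")
    if suite not in METRIC_DIRECTIONS:
        raise ValueError(f"no comparison rules for suite {suite!r}")

    verdict = {}
    for metric, direction in METRIC_DIRECTIONS[suite].items():
        a = run_a["metrics"][metric]
        b = run_b["metrics"][metric]
        if a == b:
            verdict[metric] = "tie"
        elif (a < b) == (direction == "lower"):
            # a is smaller and lower wins, or a is larger and higher wins.
            verdict[metric] = "a"
        else:
            verdict[metric] = "b"
    return verdict


# Made-up numbers, purely to exercise the function.
strong = {"suite": "ab/sound-id", "metrics": {"recall": 0.8, "fpr": 0.1}}
weak = {"suite": "ab/sound-id", "metrics": {"recall": 0.5, "fpr": 0.3}}
print(compare_runs(strong, weak))  # {'recall': 'a', 'fpr': 'a'}
```

Keeping the direction table next to the suite id means the caller never has to say whether a metric is better when higher or lower; the run file carries enough information to decide.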

Benchmark your own model

To evaluate your own model, follow this flow:

1. Implement the adapter protocol (answer(...) for ab/sound-id, transcribe(...) for ab/asr-robust).
2. Register it in-repo, or expose it as a Python entry point.
3. Run with your adapter id.

audiobench list-models
audiobench run ab/sound-id --model my-sound-model
audiobench run ab/asr-robust --model my-asr-model

The complete adapter and plugin setup lives in Bring your own model.
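
As a rough picture of what step 1 might look like, here is a Python sketch of two toy adapters. Only the method names answer and transcribe come from the text above; the parameters (audio_path, candidate_labels), the return types, and the class names are assumptions for illustration. The real signatures are defined in Bring your own model.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class SoundIdAdapter(Protocol):
    # Hypothetical signature: the real answer(...) parameters may differ.
    def answer(self, audio_path: str, candidate_labels: Sequence[str]) -> list[str]: ...


@runtime_checkable
class AsrAdapter(Protocol):
    # Hypothetical signature: the real transcribe(...) parameters may differ.
    def transcribe(self, audio_path: str) -> str: ...


class FirstLabelSoundModel:
    """Toy sound-id adapter: always picks the first candidate label."""

    def answer(self, audio_path: str, candidate_labels: Sequence[str]) -> list[str]:
        return list(candidate_labels[:1])


class EmptyTranscriptAsrModel:
    """Toy ASR adapter: always returns an empty transcript."""

    def transcribe(self, audio_path: str) -> str:
        return ""


sound_model = FirstLabelSoundModel()
asr_model = EmptyTranscriptAsrModel()

# Structural typing: neither class inherits from the protocols,
# but each provides the required method.
print(isinstance(sound_model, SoundIdAdapter))  # True
print(isinstance(asr_model, AsrAdapter))        # True
print(sound_model.answer("clip.wav", ["dog_bark", "siren"]))  # ['dog_bark']
```

Trivial baselines like these are useful as a smoke test: if the benchmark runs end to end with a toy adapter, any later failure is likely in the model wrapper and not in the wiring.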

Where to go next

New here? Start with the quickstart.
Running on a real dataset? See packs and bring-your-own-data.
Trying Qwen2-Audio? The qwen2-audio guide has a Modal recipe and a free Colab fallback for laptops without a GPU.
Adding a model? Models overview covers the adapter protocol.
Publishing scores? Hugging Face leaderboard integration shows the Space + audiobench push flow.