Phonon mega benchmark¶
This is the initial mega benchmark curated by Phonon, the company behind audiobench. It runs most laptop-friendly model × suite combinations, documents what was excluded, and flags controversial pairings.
Run locally¶
pip install -e .
# Optional extras for sound-id adapters:
# pip install -e ".[clap]"
chmod +x scripts/run_phonon_mega.sh
./scripts/run_phonon_mega.sh
This uses examples/matrices/phonon-mega.yaml and writes artifacts under results/phonon-mega/.
For heavier cells (Whisper medium/large), see phonon-mega-extended.yaml.
Qwen2-Audio (remote Modal)¶
Qwen2-Audio is not in the main laptop matrix because that matrix assumes no local GPU. Run it separately against the Modal endpoint:
export AUDIOBENCH_QWEN_ENDPOINT=https://thenirock--audiobench-qwen2-qwenserver-web.modal.run
chmod +x scripts/run_phonon_mega_qwen.sh
./scripts/run_phonon_mega_qwen.sh
Pre-warm the endpoint once before a full run (cold start can exceed the 120s per-probe timeout). See qwen2-audio.
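The pre-warm step can be sketched as a small retry loop. This is a minimal sketch, not part of audiobench: the helper names `prewarm` and `_http_get` are hypothetical, and it assumes that a plain GET on the base URL is enough to wake the container and that any HTTP status means the server is up. The real server may expect a specific route, so check the qwen2-audio page before relying on it.

```python
import time
import urllib.error
import urllib.request


def _http_get(url, timeout_s):
    """Issue one GET and return the HTTP status code.

    Assumption: any HTTP status, even 404, means the container answered.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def prewarm(url, fetch=_http_get, attempts=5, timeout_s=300.0,
            wait_s=10.0, sleep=time.sleep):
    """Call `fetch` until it succeeds; return the number of attempts used.

    `timeout_s` is deliberately longer than the 120 s per-probe timeout,
    so a slow cold start does not count as a failure here.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, timeout_s)
            return attempt
        except OSError as exc:  # URLError and TimeoutError are OSError subclasses
            last_error = exc
            if attempt < attempts:
                sleep(wait_s)
    raise RuntimeError(
        f"endpoint did not respond after {attempts} attempts"
    ) from last_error
```

A typical call would be `prewarm(os.environ["AUDIOBENCH_QWEN_ENDPOINT"])`, run once before `./scripts/run_phonon_mega_qwen.sh`.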
Publish to the leaderboard¶
Tag uploads explicitly as Phonon-authored:
hf auth login
export AUDIOBENCH_LEADERBOARD_DATASET=THENIROCK/audiobench-leaderboard-submissions
export AUDIOBENCH_AUTHOR=Phonon
for f in results/phonon-mega/*.json; do
  audiobench push "$f" --author Phonon --tags phonon-mega-v1
done
--author sets authored_by on the submission (distinct from submitted_by, which remains your Hugging Face account). See Hugging Face leaderboard.
You can also use ./scripts/push_phonon_mega.sh to publish with the shared dataset defaults.
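To preview what the push loop will do before uploading anything, the commands can be built first and inspected. This is a minimal sketch: `build_push_commands` is a hypothetical helper, not part of audiobench, and it reuses only the `audiobench push` flags shown above.

```python
from pathlib import Path


def build_push_commands(results_dir, author="Phonon", tag="phonon-mega-v1"):
    """Return one `audiobench push` argument list per JSON artifact.

    Files are sorted by name so the upload order is reproducible.
    """
    files = sorted(Path(results_dir).glob("*.json"))
    return [
        ["audiobench", "push", str(path), "--author", author, "--tags", tag]
        for path in files
    ]
```

Each returned list can be printed for a dry run or passed to `subprocess.run(cmd, check=True)` to perform the upload.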
What is included¶
| Suite | Models |
|---|---|
| ab/sound-id | heuristic-v0, heuristic-weak, clap-base, qwen2-audio-7b (remote, separate script) |
| ab/asr-robust | whisper-tiny, whisper-base, whisper-small |
| ab/asr-hallucination | whisper-tiny, whisper-base, whisper-small |
ASR cells use limit: 20 clips to keep laptop runtimes reasonable. Sound-id cells use --profile demo-fast.
Inapplicable (protocol mismatch)¶
These combinations are not in the matrix because adapters use different protocols:
- Whisper (transcribe) cannot run ab/sound-id (yes/no probes).
- Sound-id adapters (answer) cannot run ab/asr-robust or ab/asr-hallucination.
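The protocol rule can be expressed as a lookup: a cell is applicable only when the adapter and the suite speak the same protocol. The sketch below is illustrative and is not how audiobench itself is implemented. The protocol names (`transcribe`, `answer`) and the model and suite names come from this page; the dictionaries and function names are hypothetical.

```python
# Protocol each adapter speaks, as described on this page.
ADAPTER_PROTOCOL = {
    "heuristic-v0": "answer",
    "heuristic-weak": "answer",
    "clap-base": "answer",
    "qwen2-audio-7b": "answer",
    "whisper-tiny": "transcribe",
    "whisper-base": "transcribe",
    "whisper-small": "transcribe",
}

# Protocol each suite expects.
SUITE_PROTOCOL = {
    "ab/sound-id": "answer",
    "ab/asr-robust": "transcribe",
    "ab/asr-hallucination": "transcribe",
}


def is_applicable(model, suite):
    """True when the adapter and the suite use the same protocol."""
    return ADAPTER_PROTOCOL[model] == SUITE_PROTOCOL[suite]


def applicable_cells():
    """List every (suite, model) pair whose protocols match."""
    return [
        (suite, model)
        for suite in SUITE_PROTOCOL
        for model in ADAPTER_PROTOCOL
        if is_applicable(model, suite)
    ]
```

Enumerating the pairs this way yields four sound-id cells and three cells for each ASR suite.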
Excluded for compute¶
| Item | Reason |
|---|---|
| qwen2-audio-7b (local) | ~16 GB VRAM; use ./scripts/run_phonon_mega_qwen.sh with Modal instead |
| whisper-medium, whisper-large | Slow on laptop CPU; in phonon-mega-extended.yaml |
Controversial cells (included with disclosure)¶
First-party baselines (heuristic-v0, heuristic-weak). Phonon authored both the suite harness and these adapters. They are included as reproducible CPU baselines, not as independent competitors.
heuristic-weak. This adapter is deliberately weak (a higher margin threshold plus injected jitter). It is included only as a diagnostic floor for audiobench compare.
Whisper on ab/asr-hallucination. This pairing highlights a known class of non-speech hallucination behavior. Results are reported for transparency, not as a vendor attack.
Demo pack overlap. heuristic-v0 / heuristic-weak are tuned against the same demo pack the suite ships. Treat CLAP and Whisper scores as the more externally meaningful comparisons until external packs land.
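To make "higher margin threshold + injected jitter" concrete, here is a toy yes/no decision rule. It is purely illustrative and is not the actual heuristic-weak code: the `decide` function, the score scale and the numbers in the usage note are all assumptions.

```python
import random


def decide(score, margin, jitter=0.0, rng=None):
    """Answer "yes" only when the (optionally noised) score clears the margin.

    A larger `margin` makes the rule more reluctant to say "yes";
    `jitter` adds uniform noise in [-jitter, +jitter] before the comparison.
    """
    rng = rng or random.Random(0)
    noisy = score + rng.uniform(-jitter, jitter) if jitter else score
    return "yes" if noisy > margin else "no"
```

With a hypothetical score of 0.2, `decide(0.2, margin=0.1)` answers "yes", while `decide(0.2, margin=0.3)` answers "no". Adding jitter then makes borderline answers unstable, which is the kind of degradation a diagnostic floor is meant to show.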
Extend the matrix¶
Add a cell to examples/matrices/phonon-mega.yaml:
Then re-run ./scripts/run_phonon_mega.sh. For new adapters, see Bring your own model.
Public rankings page¶
The Leaderboard page shows the top five per suite (live from the HF submissions dataset, filtered to authored_by: Phonon). Link visitors there for a quick summary; link the HF Space for the full table.
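The "top five per suite, filtered to authored_by: Phonon" view can be sketched as a filter followed by a per-suite sort. This is a minimal sketch over plain dictionaries. Only `authored_by` is a field named on this page; the `suite`, `model` and `score` keys, and the assumption that a higher score is better, are hypothetical.

```python
from collections import defaultdict


def top_per_suite(submissions, author="Phonon", n=5):
    """Group one author's submissions by suite and keep the n best of each.

    Assumes each row has `authored_by`, `suite` and `score` keys and that
    higher scores rank first.
    """
    by_suite = defaultdict(list)
    for row in submissions:
        if row.get("authored_by") == author:
            by_suite[row["suite"]].append(row)
    return {
        suite: sorted(rows, key=lambda r: r["score"], reverse=True)[:n]
        for suite, rows in by_suite.items()
    }
```

Rows by other authors are dropped before ranking, so they never take up one of the five slots.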