Model Evaluation

Training Info

This page is the hub for model-vs-model comparison reports. Use it to evaluate whether a new training run actually improves scanner transcription quality before promoting it.

Validation Set Only

WER

Word Error Rate. Word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the edited ground truth. Lower is better.

CER

Character Error Rate. The same computation at the character level. Useful when numbers, unit IDs, and short tokens matter. Lower is better.
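Both metrics reduce to a Levenshtein edit distance over a token sequence: words for WER, characters for CER. A minimal sketch (hypothetical helper names; the dashboard's actual implementation is not shown here, and production pipelines often use a library such as jiwer):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(
                d[j] + 1,         # deletion
                d[j - 1] + 1,     # insertion
                prev + (r != h),  # substitution (free if tokens match)
            )
    return d[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that because insertions count as edits, either rate can exceed 1.0 when the hypothesis is much longer than the ground truth.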

Delta

Difference between the two models' metrics, computed as Model B minus Model A. Because lower is better, a negative delta for Model B means B improved over A.

Per-Call Wins

Counts the calls on which each model scores better, so an aggregate improvement can be traced to consistent per-call gains rather than a few outliers skewing the average.
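The delta and per-call-wins views above can be sketched as one aggregation over per-call metric lists. This is an illustrative assumption about the report's logic, not its actual code; the function name and inputs are hypothetical:

```python
def compare_models(wer_a, wer_b):
    """Aggregate delta (B - A) plus call-by-call win counts.

    wer_a, wer_b: per-call WER values for Model A and Model B,
    aligned by call index (hypothetical input format).
    """
    delta = sum(wer_b) / len(wer_b) - sum(wer_a) / len(wer_a)
    wins_b = sum(b < a for a, b in zip(wer_a, wer_b))  # B strictly better
    wins_a = sum(a < b for a, b in zip(wer_a, wer_b))  # A strictly better
    ties = len(wer_a) - wins_a - wins_b
    return {"delta": delta, "wins_a": wins_a, "wins_b": wins_b, "ties": ties}
```

A model can win the aggregate delta while losing most individual calls, which is exactly the discrepancy the per-call view is meant to surface.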

Training Output Reports

Select a report to view its summary and call grid below.

Report: whisper_medium_v100_scan_vs_whisper_medium_v101_scan_20260419_111337_training_result_set.html

← Scroll horizontally inside the report to see all model columns →