Add LEXam public leaderboard converter #160

Open
JoelNiklaus wants to merge 2 commits into evaleval:main from JoelNiklaus:feature/lexam-converter

JoelNiklaus commented Jun 8, 2026

Add LEXam public leaderboard converter

Problem

Every Eval Ever can convert local eval logs and some public leaderboards (e.g. AlpacaEval), but it does not yet support LEXam, a legal reasoning benchmark with a public leaderboard and a Hugging Face dataset.

Solution

Add a lexam converter that:

  • Fetches the public leaderboard HTML from the LEXam project website repo
  • Parses open-question judge scores and MCQ accuracy for each model
  • Emits schema-valid Every Eval Ever JSON with source_metadata.source_type = documentation
  • Links underlying benchmark data to LEXam-Benchmark/LEXam on Hugging Face
  • Records LLM-judge metadata for open-question scoring
  • Writes output under the stable data/lexam/{developer}/{model} layout
  • Uses explicit model identity mappings and fails loudly on newly unmapped leaderboard entries
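The last two bullets (stable output layout, explicit mappings that fail loudly) could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the names `MODEL_IDENTITIES`, `strip_medals`, `resolve_identity`, and `output_dir` are hypothetical, the mapping entry is a placeholder, and the set of medal characters is an assumption about how the leaderboard decorates top entries.

```python
from pathlib import Path

# Hypothetical mapping: leaderboard display name -> (developer, model).
# The entry below is a placeholder, not a real LEXam leaderboard row.
MODEL_IDENTITIES = {
    "Example-Model-7B": ("example-org", "example-model-7b"),
}

# Assumption: top-ranked entries carry one of these medal emoji.
MEDALS = ("🥇", "🥈", "🥉")


def strip_medals(name: str) -> str:
    """Remove medal emoji and surrounding whitespace from a display name."""
    for medal in MEDALS:
        name = name.replace(medal, "")
    return name.strip()


def resolve_identity(display_name: str) -> tuple[str, str]:
    """Look up (developer, model); raise instead of guessing on unknown names."""
    clean = strip_medals(display_name)
    if clean not in MODEL_IDENTITIES:
        raise KeyError(f"Unmapped LEXam leaderboard entry: {clean!r}")
    return MODEL_IDENTITIES[clean]


def output_dir(root: str, display_name: str) -> Path:
    """Build the data/lexam/{developer}/{model} directory for one entry."""
    developer, model = resolve_identity(display_name)
    return Path(root) / "lexam" / developer / model
```

Raising on an unknown name means a new leaderboard row has to be added to the mapping deliberately, rather than being written under a guessed developer folder.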

CLI usage:

uv run every_eval_ever convert lexam --output_dir data

Testing

  • Added unit tests with a frozen HTML fixture covering parsing, medal stripping, metric combination, schema validation, missing-section errors, and unknown model mappings
  • Ran live conversion against the current LEXam leaderboard (36 models)
  • Validated all generated JSON files with every_eval_ever validate
  • Ran uv run pytest tests/test_lexam_adapter.py -v
  • Ran uv run ruff check on changed files
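As a rough picture of what the fixture-based parsing and missing-section checks could look like, here is a stdlib-only sketch. It is not taken from tests/test_lexam_adapter.py: the table id `mcq`, the column names, the fixture row, and the helper names `parse_section` and `_TableRows` are all invented for illustration, and the real leaderboard markup may be structured differently.

```python
from html.parser import HTMLParser


class _TableRows(HTMLParser):
    """Collect cell text for each row of the <table> with a given id."""

    def __init__(self, table_id: str) -> None:
        super().__init__()
        self.table_id = table_id
        self.found = False
        self.in_table = False
        self.rows: list[list[str]] = []
        self._cell: list[str] | None = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.found = self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])
        elif self.in_table and tag in ("td", "th") and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_section(html: str, table_id: str) -> list[dict[str, str]]:
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = _TableRows(table_id)
    parser.feed(html)
    parser.close()
    if not parser.found or not parser.rows:
        raise ValueError(f"leaderboard section {table_id!r} not found")
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Invented fixture; values are illustrative, not real LEXam scores.
FIXTURE = """
<table id="mcq"><tr><th>Model</th><th>Accuracy</th></tr>
<tr><td>model-a</td><td>61.5</td></tr></table>
"""
```

With this fixture, `parse_section(FIXTURE, "mcq")` yields a single row dict, while asking for a table id that is absent raises `ValueError`, which is the kind of behavior a missing-section test would assert.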

Fetch the LEXam website leaderboard and convert model scores into the
Every Eval Ever schema, including open-question judge scores and MCQ
accuracy with Hugging Face dataset metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>
JoelNiklaus (Author) commented

Not sure whether you want references to the eval converter in the README, or only the ones specific to general evaluation frameworks?

Use explicit LEXam model identity mappings, keep output under the stable lexam benchmark folder, and make relationship overrides consistent with the default source metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread tests/data/lexam/leaderboard.html

3 participants