Add LEXam public leaderboard converter by JoelNiklaus · Pull Request #160 · evaleval/every_eval_ever

JoelNiklaus · 2026-06-08T07:58:04Z

Add LEXam public leaderboard converter

Problem

Every Eval Ever supports converting local eval logs and some public leaderboards (e.g. AlpacaEval), but it does not yet support LEXam, a legal reasoning benchmark with a public leaderboard and a Hugging Face dataset.

Solution

Add a lexam converter that:

- Fetches the public leaderboard HTML from the LEXam project website repo
- Parses open-question judge scores and MCQ accuracy for each model
- Emits schema-valid Every Eval Ever JSON with source_metadata.source_type = documentation
- Links underlying benchmark data to LEXam-Benchmark/LEXam on Hugging Face
- Records LLM-judge metadata for open-question scoring
- Writes output under the stable data/lexam/{developer}/{model} layout
- Uses explicit model identity mappings and fails loudly on newly unmapped leaderboard entries
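
To illustrate the last two points (explicit identity mappings and the data/lexam/{developer}/{model} layout), here is a minimal sketch. It is not the converter's actual code: `MODEL_IDENTITIES`, `UnmappedModelError`, `resolve_identity`, and `output_dir_for` are hypothetical names, and the mapping entries are placeholders rather than real leaderboard rows.

```python
from pathlib import Path

# Hypothetical mapping from leaderboard display names to (developer, model).
# These entries are placeholders, not the converter's real table.
MODEL_IDENTITIES: dict[str, tuple[str, str]] = {
    "Example Model A": ("example-org", "example-model-a"),
    "Example Model B": ("other-org", "example-model-b"),
}


class UnmappedModelError(KeyError):
    """Raised when a leaderboard row has no explicit identity mapping."""


def resolve_identity(display_name: str) -> tuple[str, str]:
    """Return (developer, model) or fail loudly for an unknown entry."""
    try:
        return MODEL_IDENTITIES[display_name]
    except KeyError:
        # Refuse to guess: a new leaderboard row must be mapped by hand.
        raise UnmappedModelError(
            f"No identity mapping for leaderboard entry {display_name!r}"
        ) from None


def output_dir_for(output_root: Path, display_name: str) -> Path:
    """Build the <output_root>/lexam/<developer>/<model> directory path."""
    developer, model = resolve_identity(display_name)
    return output_root / "lexam" / developer / model
```

Raising instead of falling back to a guessed developer name means a new leaderboard entry stops the conversion until someone adds the mapping, which keeps the output paths stable across runs.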

CLI usage:

uv run every_eval_ever convert lexam --output_dir data

Testing

- Added unit tests with a frozen HTML fixture covering parsing, medal stripping, metric combination, schema validation, missing-section errors, and unknown model mappings
- Ran live conversion against the current LEXam leaderboard (36 models)
- Validated all generated JSON files with every_eval_ever validate
- uv run pytest tests/test_lexam_adapter.py -v
- uv run ruff check on changed files
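
As a rough illustration of two of the behaviours the tests cover (medal stripping and metric combination), the sketch below shows one way they could work. It assumes the leaderboard marks top entries with medal emoji in the model name and that the two score tables are keyed by model name. `strip_medals`, `combine_metrics`, and the metric keys are hypothetical and not taken from the PR.

```python
# Assumption: medals appear as these emoji inside the model-name cell.
MEDALS = ("🥇", "🥈", "🥉")


def strip_medals(name: str) -> str:
    """Remove medal emoji and surrounding whitespace from a model name."""
    for medal in MEDALS:
        name = name.replace(medal, "")
    return name.strip()


def combine_metrics(
    open_scores: dict[str, float],
    mcq_scores: dict[str, float],
) -> dict[str, dict[str, float]]:
    """Merge the two per-model score tables into one record per model."""
    combined: dict[str, dict[str, float]] = {}
    tables = (
        ("open_question_judge_score", open_scores),
        ("mcq_accuracy", mcq_scores),
    )
    for metric, table in tables:
        for raw_name, value in table.items():
            # Normalise the name first so both tables land on the same key.
            combined.setdefault(strip_medals(raw_name), {})[metric] = value
    return combined
```

A model that appears in only one table simply ends up with a single metric in its record. Whether the real converter treats that case as an error is not stated in the PR.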

JoelNiklaus · 2026-06-08T08:01:53Z

Not sure whether you want references to the eval converter in the README, or only the ones specific to general evaluation frameworks?

Add LEXam public leaderboard converter

716fc22

Fetch the LEXam website leaderboard and convert model scores into the Every Eval Ever schema, including open-question judge scores and MCQ accuracy with Hugging Face dataset metadata. Co-authored-by: Cursor <cursoragent@cursor.com>

Tighten LEXam converter metadata

a6451ef

Use explicit LEXam model identity mappings, keep output under the stable lexam benchmark folder, and make relationship overrides consistent with the default source metadata. Co-authored-by: Cursor <cursoragent@cursor.com>

evijit requested review from borgr, janbatzner and nelaturuharsha June 12, 2026 04:35

DeepLumiere reviewed Jun 13, 2026

Comment thread tests/data/lexam/leaderboard.html

JoelNiklaus wants to merge 2 commits into evaleval:main from JoelNiklaus:feature/lexam-converter

3 participants
