Describe the bug
The olympiad_bench task hit a schema error during dataset generation/saving because the 'specific' field of the Doc object was assigned an empty dictionary ({}). Arrow infers a struct type with no child fields for an empty dict, and Parquet cannot store such a struct, so writing the evaluation details fails with the ArrowNotImplementedError shown below (at least in some environments).
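Arrow maps a Python dict to a struct type whose child fields are the dict's keys, so an empty {} becomes a struct with no children, which the Parquet writer rejects. The sketch below is a hypothetical pre-save check (`find_empty_dicts` is not part of lighteval) that walks a details record, assuming it is built from plain dicts and lists, and reports the path of every empty dict before it reaches Arrow.

```python
from typing import Any


def find_empty_dicts(value: Any, path: str = "") -> list[str]:
    """Return the dotted paths of every empty dict nested inside `value`.

    Hypothetical helper: each path found here would become an Arrow struct
    with no child fields, which cannot be written to Parquet.
    """
    found: list[str] = []
    if isinstance(value, dict):
        if not value:
            # An empty dict maps to a struct type with no children.
            found.append(path or "<root>")
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            found.extend(find_empty_dicts(child, child_path))
    elif isinstance(value, (list, tuple)):
        # Recurse into sequences so nested records are checked too.
        for index, item in enumerate(value):
            found.extend(find_empty_dicts(item, f"{path}[{index}]"))
    return found


# Usage: a row shaped like the failing one (field values are illustrative).
row = {"query": "Solve ...", "specific": {}, "choices": ["42"]}
print(find_empty_dicts(row))  # ['specific']
```

Running a check like this on the rows before saving would flag the offending field by name instead of surfacing the failure deep inside pyarrow.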
To Reproduce
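# pipeline_params, evaluation_tracker and model_config are set up earlier in the notebook (not shown)
task = "olympiad_bench|0"
pipeline = Pipeline(
    tasks=task,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)
pipeline.evaluate()
pipeline.save_and_push_results()  # fails here, while saving the details
pipeline.show_results()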
ArrowNotImplementedError Traceback (most recent call last)
Cell In[3], line 37
29 pipeline = Pipeline(
30 tasks=task,
31 pipeline_parameters=pipeline_params,
32 evaluation_tracker=evaluation_tracker,
33 model_config=model_config,
34 )
36 pipeline.evaluate()
---> 37 pipeline.save_and_push_results()
38 pipeline.show_results()
File ~/Code/Evalhub/backend/lighteval/src/lighteval/pipeline.py:429, in Pipeline.save_and_push_results(self)
427 logger.info("--- SAVING AND PUSHING RESULTS ---")
428 if self.is_main_process():
--> 429 self.evaluation_tracker.save()
File ~/Code/Evalhub/backend/lighteval/src/lighteval/logging/evaluation_tracker.py:273, in EvaluationTracker.save(self)
270 self.save_results(date_id, results_dict)
272 if self.should_save_details:
--> 273 self.save_details(date_id, details_datasets)
275 if self.should_push_to_hub:
276 self.push_to_hub(
277 date_id=date_id,
...
File ~/Code/Evalhub/backend/.venv/lib/python3.13/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File ~/Code/Evalhub/backend/.venv/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowNotImplementedError: Cannot write struct type 'specific' with no child field to Parquet. Consider adding a dummy child field.
Expected behavior
The 'specific' field should be None (or omitted) when no additional metadata is provided, so that the evaluation details match the expected schema and can be written to Parquet.
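A minimal sketch of the expected behavior, assuming a simplified stand-in for lighteval's Doc (the `Doc` dataclass and `build_doc` function below are hypothetical, not the library's actual definitions): an empty or missing metadata dict is normalized to None before the document is constructed, so no empty struct reaches the Arrow schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical, simplified stand-in for lighteval's Doc object."""

    query: str
    specific: Optional[dict[str, Any]] = None


def build_doc(query: str, metadata: Optional[dict[str, Any]] = None) -> Doc:
    # Hypothetical prompt helper: an empty or missing metadata dict becomes None,
    # so the 'specific' field never holds {} (which Arrow would turn into an empty struct).
    return Doc(query=query, specific=metadata or None)


print(build_doc("Solve ...").specific)  # None
print(build_doc("Solve ...", {}).specific)  # None
print(build_doc("Solve ...", {"source": "example"}).specific)  # {'source': 'example'}
```

The `metadata or None` idiom collapses both a missing and an empty dict to None while passing non-empty metadata through unchanged.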
Version info
- OS: macOS
- Python: 3.13
- Lighteval version: main (local development)