Hello,
I encountered several issues while trying to reproduce the results:
- Evaluation Dataset
The dataset required for evaluation is not provided.
The evaluation path is hard-coded to a location on the author's local machine, so evaluation cannot be run after training.
Could you please provide the dataset or a valid path?
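For reference, even a simple command-line argument or environment variable would unblock evaluation. This is only an illustrative sketch; the argument name and variable are hypothetical, not taken from the repository:

```python
# Illustrative only: --eval_data_path / EVAL_DATA_PATH are hypothetical names,
# not the repository's actual options.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument(
    "--eval_data_path",
    default=os.environ.get("EVAL_DATA_PATH"),
    help="Path to the evaluation dataset (instead of the hard-coded local path).",
)
args = parser.parse_args()
if args.eval_data_path is None:
    raise ValueError("Please provide --eval_data_path or set EVAL_DATA_PATH.")
```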
- Training Instability
When running the experiments, the training loss consistently drops to 0.
In GRPO, the Advantage term always becomes 0 (see huggingface/open-r1#239 (comment)), but training can still proceed because the KL divergence term is non-zero.
However, in Visual RFT, the KL divergence coefficient is set to 0, which makes the training dynamics unclear.
I am unsure if this is intended or a mistake.
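To make the concern concrete, here is a minimal sketch (not the repo's actual code) of the simplified GRPO per-token loss; the tensor shapes and epsilon value are assumptions for illustration. It shows that when every completion in a group receives the same reward (so the normalized advantages are 0) and the KL coefficient beta is 0, the loss and its gradient are exactly 0:

```python
# Minimal sketch of a simplified GRPO per-token loss, for illustration only.
import torch

def grpo_loss(per_token_logps, old_per_token_logps, ref_per_token_logps,
              advantages, beta=0.0, eps=0.2):
    # Importance ratio between the current and old policy.
    ratio = torch.exp(per_token_logps - old_per_token_logps)
    # Clipped policy-gradient term, weighted by the group-relative advantage.
    pg1 = ratio * advantages
    pg2 = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    pg_loss = -torch.min(pg1, pg2)
    # k3 estimator of the KL to the reference policy, scaled by beta.
    kl = (torch.exp(ref_per_token_logps - per_token_logps)
          - (ref_per_token_logps - per_token_logps) - 1)
    return (pg_loss + beta * kl).mean()

# All-zero advantages (identical rewards within the group) and beta = 0
# give a loss of 0, which matches the behavior observed in training.
logps = torch.zeros(4, 8)
zero_adv = torch.zeros(4, 8)
print(grpo_loss(logps, logps, logps, zero_adv, beta=0.0))  # prints a zero loss
```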
Could you please share the missing hyperparameters or supplementary material that includes detailed training logs? This would help in reproducing and validating the reported results.
Thank you!