
Evaluation Dataset Missing, Training Instability, and Unclear Hyperparameter Settings #239

@WinterPenguin2


Hello,

I encountered several issues while trying to reproduce the results:

  1. Evaluation Dataset
    The dataset required for evaluation is not provided.
    The evaluation path is hard-coded to the author’s local desktop, making evaluation impossible after training.
    Could you please provide the dataset or a valid path?

  2. Training Instability

When running the experiments, the training loss consistently drops to 0.
In GRPO, the advantage term always becomes 0 (huggingface/open-r1#239 (comment)), but training is still possible because the KL divergence term is non-zero.

However, in Visual RFT the KL divergence coefficient is set to 0, which makes the training dynamics unclear (see the sketch below). I am unsure whether this is intended or a mistake.
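
For concreteness, here is a minimal sketch of the behavior I mean, written as illustrative PyTorch code; the function and variable names are my own assumptions, not the repository's actual implementation:

```python
import torch

def grpo_loss_sketch(rewards, policy_logps, old_logps, ref_logps, beta=0.0):
    # rewards:   (num_prompts, G) -- G sampled completions per prompt
    # *_logps:   (num_prompts, G) -- log-probs of each completion (simplified to one value each)
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    advantages = (rewards - mean) / (std + 1e-4)  # identical rewards in a group -> advantages == 0

    # Policy-gradient part of the GRPO objective (clipping omitted for brevity)
    ratio = torch.exp(policy_logps - old_logps)
    pg_loss = -(ratio * advantages).mean()

    # KL penalty against the reference policy (k3-style estimator)
    kl = (torch.exp(ref_logps - policy_logps) - (ref_logps - policy_logps) - 1).mean()

    # If all rewards in each group are equal, pg_loss == 0;
    # with beta == 0 the KL term is also dropped, so the total loss is exactly 0.
    return pg_loss + beta * kl
```

With identical rewards in every group and `beta = 0`, this returns exactly 0, which matches the loss behavior described above.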

Could you please share the missing hyperparameters or supplementary material that includes detailed training logs? This would help in reproducing and validating the reported results.

Thank you!
