Launch training jobs #6

@FlorianBertonBrightClue

Description

There seems to be an issue when a training job is launched for the first time.

In base_training_job.py, line 203 checks whether the checkpoint subfolder exists and creates it if it does not. However, this directory is a child of log_folder/training_job_name, so creating it also creates that parent folder.

Then, at line 217, you check whether the log folder log_folder/training_job_name exists, in order to decide whether the training job should initialize the folder and its parameters or resume from a checkpoint.

The issue is that this folder is guaranteed to exist, because it was just created together with the checkpoint subfolder (line 203). At this point the boolean __found_job_folder is True, which implies that a ".yml" file should be present, and that is not the case on a first launch.
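To make the ordering problem concrete, here is a minimal sketch that reproduces it outside the library. It is not the code from base_training_job.py: the function names and the "checkpoints" subfolder name are hypothetical, and only the order of the two operations mirrors what is described above.

```python
import os
import tempfile


def found_job_folder_create_first(log_folder, job_name):
    """Mimics the described order: create the checkpoint subfolder, then check."""
    job_folder = os.path.join(log_folder, job_name)
    # "checkpoints" is an assumed name for the checkpoint subfolder.
    checkpoint_folder = os.path.join(job_folder, "checkpoints")
    # Creating the child folder also creates its parent, the job folder.
    os.makedirs(checkpoint_folder, exist_ok=True)
    # The job folder now always exists, even on a first launch.
    return os.path.isdir(job_folder)


def found_job_folder_check_first(log_folder, job_name):
    """Possible reordering: record whether the job folder exists before creating anything."""
    job_folder = os.path.join(log_folder, job_name)
    checkpoint_folder = os.path.join(job_folder, "checkpoints")
    found = os.path.isdir(job_folder)
    os.makedirs(checkpoint_folder, exist_ok=True)
    return found


with tempfile.TemporaryDirectory() as log_folder:
    create_first_result = found_job_folder_create_first(log_folder, "job_a")
    check_first_result = found_job_folder_check_first(log_folder, "job_b")
    check_first_relaunch = found_job_folder_check_first(log_folder, "job_b")

print(create_first_result, check_first_result, check_first_relaunch)
```

With the create-first order, the job is reported as already existing on its very first launch. With the check done first, the first launch is reported as new and only the relaunch is reported as existing.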

As a result, when we enter __initialize_training_job(), we try to load the parameters (line 747) instead of saving them, and an error is raised in __load_training_parameters().
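One possible way to make the decision more robust, sketched here as a suggestion rather than as the library's actual logic, is to base it on whether a parameters file is present instead of whether the folder exists. The helper name, the "checkpoints" subfolder and the "parameters.yml" file name below are all assumptions made for illustration.

```python
import glob
import os
import tempfile


def should_load_parameters(job_folder):
    """Return True only if the job folder contains at least one *.yml file."""
    return bool(glob.glob(os.path.join(job_folder, "*.yml")))


with tempfile.TemporaryDirectory() as log_folder:
    job_folder = os.path.join(log_folder, "my_job")
    # First launch: the folder exists (created with the checkpoint subfolder),
    # but no parameters have been saved yet.
    os.makedirs(os.path.join(job_folder, "checkpoints"))
    first_launch = should_load_parameters(job_folder)

    # Later launch: a parameters file has been saved in the job folder.
    with open(os.path.join(job_folder, "parameters.yml"), "w") as f:
        f.write("learning_rate: 0.001\n")
    later_launch = should_load_parameters(job_folder)

print(first_launch, later_launch)
```

With this check, an empty job folder is treated as a fresh job, so the parameters would be saved rather than loaded.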
