fix: raise RuntimeError when failed to generate an answer #94

yhrkw · 2025-08-26T03:35:26Z

Several models fail during generation and raise a RuntimeError. This appears to happen with both pre-trained models and API-based models.

We will record each generation's success/failure status in trial_*.json and exclude failed generations from score calculations.
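A minimal sketch of this plan, assuming each trial is stored as a dict with a hypothetical `success` flag next to the generated `answer`. The real trial_*.json schema in pfgen_eval.py may differ, and `run_trial`, `mean_score`, and `flaky_model` are illustrative names, not functions from the repository.

```python
import json
import math
from typing import Callable


def run_trial(generate: Callable[[str], str], question: str) -> dict:
    """Run one generation and record whether it succeeded (hypothetical schema)."""
    try:
        return {"question": question, "answer": generate(question), "success": True}
    except RuntimeError as e:
        # Keep the failed trial in the trial file instead of aborting the whole run.
        return {"question": question, "answer": None, "success": False, "error": str(e)}


def mean_score(trials: list, score_answer: Callable[[str], float]) -> float:
    """Average the score over successful trials only; failed trials are skipped."""
    scores = [score_answer(t["answer"]) for t in trials if t["success"]]
    # NaN signals "nothing to score" rather than a misleading 0.0.
    return sum(scores) / len(scores) if scores else math.nan


def flaky_model(question: str) -> str:
    # Stand-in model that fails on some inputs, for illustration only.
    if "fail" in question:
        raise RuntimeError("failed to generate an answer")
    return question.upper()


trials = [run_trial(flaky_model, q) for q in ["abc", "fail me", "de"]]
print(json.dumps(trials, indent=2, ensure_ascii=False))
print(mean_score(trials, lambda a: float(len(a))))  # (3 + 2) / 2 = 2.5
```

The failed trial still appears in the dumped JSON, so it can be inspected later, but it contributes nothing to the average.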

yhrkw · 2025-08-26T03:43:15Z

pfgen_eval.py

            json.dump(data, f, indent=2, ensure_ascii=False)

-        result["num_trials"] = min([len(x) for x in data.values()])
+        result["num_trials"] = max([len(x) for x in data.values()])


Putting num_trials in config.json seems appropriate.
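The diff above swaps `min` for `max` when computing `num_trials`. The two only differ when questions end up with different numbers of recorded trials. The toy example below assumes, as the `len(x) for x in data.values()` expression suggests, that `data` maps each question to its list of trials; the keys and values are made up.

```python
# Hypothetical per-question trial lists: "q2" has one trial fewer than the others.
data = {
    "q1": ["a1", "a2", "a3"],
    "q2": ["b1", "b2"],
    "q3": ["c1", "c2", "c3"],
}

counts = [len(x) for x in data.values()]
num_trials_min = min(counts)  # 2: the fewest trials recorded for any question
num_trials_max = max(counts)  # 3: the most trials recorded for any question
print(num_trials_min, num_trials_max)  # 2 3
```

With `max`, a single question with fewer recorded trials no longer lowers the reported `num_trials`.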

fix: raise RuntimeError when failed to generate an answer

7fd851e

yhrkw commented Aug 26, 2025

feat: add --num-retries and --ignore-failure options

4ef5798
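The commit only names the two new options; their exact behavior is not described in this thread. A minimal sketch of one plausible wiring with `argparse`, assuming `--num-retries` is the number of extra attempts after a failed generation and `--ignore-failure` returns a failure marker instead of raising RuntimeError. `parse_args`, `generate_with_retries`, and the defaults are hypothetical.

```python
import argparse
from typing import Callable, Optional


def parse_args(argv: Optional[list] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Assumed semantics: extra attempts after the first failed generation.
    parser.add_argument("--num-retries", type=int, default=0)
    # Assumed semantics: record the failure instead of raising RuntimeError.
    parser.add_argument("--ignore-failure", action="store_true")
    return parser.parse_args(argv)


def generate_with_retries(
    generate: Callable[[str], Optional[str]],
    prompt: str,
    num_retries: int,
    ignore_failure: bool,
) -> Optional[str]:
    """Try up to 1 + num_retries times; None marks a failed generation."""
    for _ in range(1 + num_retries):
        answer = generate(prompt)
        if answer:  # treat None or an empty string as a failed generation
            return answer
    if ignore_failure:
        return None  # caller records this trial as failed
    raise RuntimeError(f"failed to generate an answer for {prompt!r}")


args = parse_args(["--num-retries", "2", "--ignore-failure"])
attempts = iter([None, "", "ok"])  # fails twice, then succeeds
print(generate_with_retries(lambda p: next(attempts), "q", args.num_retries, args.ignore_failure))  # ok
```

Keeping the raise as the default preserves a loud failure unless the user explicitly opts into tolerating it.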

imos approved these changes Sep 2, 2025

fix: post process and lint error

6fb652e

yhrkw force-pushed the unraise-not-answered branch from 7ac75fd to 6fb652e on September 4, 2025 08:16

yhrkw requested a review from imos September 4, 2025 08:18

imos approved these changes Sep 4, 2025

imos merged commit c3bd1cb into pfnet-research:main Sep 4, 2025
1 check passed

yhrkw deleted the unraise-not-answered branch September 17, 2025 00:10

2 participants