Replies: 3 comments
-
You need to make sure to return a “combined_score” in your evaluator, which will be used as the fitness score. Otherwise it uses an average of all the metrics returned. There may be a condition in the evaluator where it is not returning a combined_score field.
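A minimal sketch of such an evaluator, assuming the usual evaluate(program_path) -> dict-of-metrics shape used by OpenEvolve evaluators; run_program_and_score and the error/timeout metric names here are placeholders, not part of any required API:

```python
# Hypothetical evaluator sketch: every code path returns a combined_score.
def evaluate(program_path):
    try:
        score = run_program_and_score(program_path)  # placeholder helper
        return {
            "error": 0.0,
            "timeout": 0.0,
            "combined_score": score,  # fitness used for evolution guidance
        }
    except Exception:
        # Still return combined_score so the "average of all metrics" fallback never kicks in.
        return {"error": 1.0, "timeout": 0.0, "combined_score": -1e9}
```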
-
Don't all paths in my code return a “combined_score”? I must be missing something.
-
I think the problem is the timeout from openevolve itself. That sets combined_score to 0.5, and because I am maximizing a negative score, the 0.5 openevolve assigns is larger than any value the evolved code can achieve. I can get around it by setting the timeout inside evaluate to be less than the timeout in config.yaml.
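A rough sketch of that workaround, assuming the config.yaml evaluator timeout is 120s (as the log suggests) and that the evolved program can be run as a script; parse_score, the 60s inner timeout, and the -1e9 penalty are placeholders:

```python
import subprocess
import sys

INNER_TIMEOUT = 60  # seconds; keep this below the evaluator timeout in config.yaml

def evaluate(program_path):
    try:
        result = subprocess.run(
            [sys.executable, program_path],
            capture_output=True, text=True, timeout=INNER_TIMEOUT,
        )
        score = parse_score(result.stdout)  # placeholder: extract the (negative) score
        return {"error": 0.0, "timeout": 0.0, "combined_score": score}
    except subprocess.TimeoutExpired:
        # Penalize timeouts in the evaluator itself, so the framework's 0.5 default
        # never gets a chance to outrank the real (negative) scores.
        return {"error": 0.0, "timeout": 1.0, "combined_score": -1e9}
```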
-
After running for a few minutes I see:
2025-11-24 16:09:43,290 - INFO - Sampled model: openai/gpt-oss-120b
⚠️ No 'combined_score' metric found in evaluation results. Using average of all numeric metrics (0.5000) for evolution guidance. For better evolution results, please modify your evaluator to return a 'combined_score' metric that properly weights different aspects of program performance.
2025-11-24 16:09:43,294 - WARNING - Iteration 82 error: Generated code exceeds maximum length (21825 > 20000)
2025-11-24 16:09:43,637 - WARNING - Evaluation timed out after 120s
2025-11-24 16:09:43,646 - INFO - Sampled model: openai/gpt-oss-120b
2025-11-24 16:09:43,651 - INFO - New MAP-Elites cell occupied in island 1: {'complexity': 7, 'diversity': 6}
2025-11-24 16:09:43,651 - INFO - Population size (71) exceeds limit (70), removing 1 programs
2025-11-24 16:09:43,652 - INFO - Population size after cleanup: 70
2025-11-24 16:09:43,652 - INFO - New best program da964b6f-ef89-4656-98d5-5a538784ee41 replaces c40678cd-cdb8-4990-a522-af194ee8b106
2025-11-24 16:09:43,652 - INFO - Iteration 73: Program da964b6f-ef89-4656-98d5-5a538784ee41 (parent: 21bc16c1-d859-4db0-a716-d4dd346d22bf) completed in 134.98s
2025-11-24 16:09:43,652 - INFO - Metrics: error=0.0000, timeout=1.0000
2025-11-24 16:09:43,652 - WARNING -
2025-11-24 16:09:43,652 - INFO - 🌟 New best solution found at iteration 73: da964b6f-ef89-4656-98d5-5a538784ee41
2025-11-24 16:09:43,653 - INFO - Checkpoint interval reached at iteration 73
2025-11-24 16:09:43,663 - INFO - Island Status:
2025-11-24 16:09:43,663 - INFO - Island 0: 21 programs, best=-58.0000, avg=-3398.4286, diversity=1570.22, gen=16 (best: 7ec4faea-9b2b-4d5c-ba10-bd41ff5ee922)
2025-11-24 16:09:43,664 - INFO - Island 1: 13 programs, best=0.5000, avg=-301.4231, diversity=746.05, gen=16 (best: da964b6f-ef89-4656-98d5-5a538784ee41)
My evaluator.py looks like:
How does it ever return a value of 0.5? It seems this might be the average of the error=0.0000 and timeout=1.0000 metrics, but I am not sure how to stop it happening.
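For reference, the fallback described in the warning above would produce exactly that value from the metrics in the log; a minimal illustration (not OpenEvolve's actual code):

```python
# Fallback averaging when no "combined_score" is returned.
metrics = {"error": 0.0, "timeout": 1.0}        # no "combined_score" present
fitness = sum(metrics.values()) / len(metrics)  # (0.0 + 1.0) / 2 = 0.5
print(fitness)                                  # 0.5
```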