@heyjustinai
Member

What does this PR do?

Introduces Prompt Duel Optimizer (PDO), a label-free prompt optimization strategy based on our research paper (arXiv:2510.13907). PDO uses dueling bandits and Thompson sampling to optimize prompts without requiring labeled validation data.

Key Features:

  • Label-free optimization using an LLM judge for pairwise comparisons
  • Double Thompson Sampling for efficient prompt selection
  • Top-performer guided mutation for prompt evolution
  • Outperforms baselines on BIG-bench Hard and MS MARCO
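
To make the pairwise-selection idea above concrete, here is a minimal sketch of Thompson-sampling-based duel selection. It is not the code in src/prompt_ops/core/pdo/, and it simplifies Double Thompson Sampling by omitting the confidence-bound pruning step. The names `select_pair` and `run_duels` are hypothetical, and the LLM judge is replaced by a simulated coin flip over hidden prompt strengths, which is purely an assumption for illustration. Mutation of top performers is not shown.

```python
import random


def select_pair(wins, rng):
    """Pick two distinct prompt indices to duel (requires at least 2 prompts).

    wins[i][j] counts how often prompt i beat prompt j so far.
    """
    k = len(wins)
    # Sample a plausible win probability for every pair from Beta(wins + 1, losses + 1).
    theta = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            theta[i][j] = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            theta[j][i] = 1.0 - theta[i][j]
    # First prompt: beats the most rivals under the sampled preferences (Copeland score).
    copeland = [sum(theta[i][j] > 0.5 for j in range(k) if j != i) for i in range(k)]
    best = max(copeland)
    first = rng.choice([i for i in range(k) if copeland[i] == best])
    # Second prompt: resample each rival's chance against `first`, keep the strongest challenger.
    challenger = {
        j: rng.betavariate(wins[j][first] + 1, wins[first][j] + 1)
        for j in range(k)
        if j != first
    }
    second = max(challenger, key=challenger.get)
    return first, second


def run_duels(strengths, rounds=30, seed=0):
    """Run duels with a simulated judge and return the resulting win matrix."""
    rng = random.Random(seed)
    k = len(strengths)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        a, b = select_pair(wins, rng)
        # Stand-in for the LLM judge (assumption): a Bradley-Terry coin flip on hidden strengths.
        p_a = strengths[a] / (strengths[a] + strengths[b])
        if rng.random() < p_a:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


if __name__ == "__main__":
    for row in run_duels([3.0, 1.0, 1.0]):
        print(row)
```

Each round records exactly one win, so the entries of the returned matrix sum to the number of rounds. In a real run, the coin flip would be replaced by a call to the judge model that compares the two prompts' outputs on an unlabeled input.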

What's Included:

  • Core PDO implementation in src/prompt_ops/core/pdo/
  • Example config: configs/pdo-example.yaml
  • Use case: use-cases/web-of-lies-pdo/ (logical reasoning benchmark)
  • Updated README and documentation

Test Plan

cd use-cases/web-of-lies-pdo
prompt-ops migrate --config config.yaml

Full test: Run with the default 30 rounds for complete benchmark results.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 23, 2025
@heyjustinai heyjustinai merged commit b937782 into main Oct 23, 2025
9 checks passed