
Commit 80588e0

john-b-yang authored and klieret committed
Update README
1 parent 5750318 commit 80588e0

7 files changed, +22 -9 lines changed

README.md

Lines changed: 22 additions & 9 deletions
@@ -20,20 +20,26 @@
 
 ## 👋 Overview
 
-CodeClash is a benchmark for evaluating AI systems on goal-oriented software engineering.
+CodeClash is a benchmark for evaluating AI systems on **goal-oriented software engineering**.
 
-Today's AI coding evaluations are largely *task*-oriented (e.g., HumanEval, SWE-bench).
-Models are given explicit instructions and evaluated on their ability to write correct implementations.
+Today's AI coding evals are *task*-oriented (e.g., HumanEval, SWE-bench).
+Models are given explicit instructions.
+We then verify implementations with unit tests.
 
-But software is fundamentally driven by goals ("improve user retention", "reduce costs", "increase revenue").
-To enable *goal*-oriented SWE evaluation of Language Models (LMs) as SWE-agents, we introduce CodeClash!
+But building software is fundamentally driven by goals ("improve user retention", "reduce costs", "increase revenue").
+Reaching our goals is a self-directed, iterative, and often competitive process.
+To capture this dynamism of real software development, we introduce CodeClash!
+
+Check out our [arXiv paper](https://arxiv.org/abs/2511.00839) and [website](https://codeclash.ai/) for the full details!
+
+## ⚔️ How It Works
 
 <p align="center">
 <img src="docs/assets/flowchart.jpg" style="width: 70%" />
 </p>
 
-In CodeClash, 2+ LM agents compete in a code arena.
-Across a multi-round tournament, agents iteratively improve a codebase to win a high level objective (e.g., accumulate resources, survive the longest, etc).
+In CodeClash, 2+ LM agents compete in a **code arena** over the course of a multi-round tournament.
+For the duration of the tournament, each agent is iteratively improving their own codebase to win a high-level, competitive objective (e.g., accumulate resources, survive the longest, etc).
 Each round consists of two phases:
 
 * Edit phase: LM agents make whatever changes they want to their codebase.
@@ -54,11 +60,18 @@ $ python main.py configs/test/battlesnake.yaml
 ```
 
 Once this works, you should be set up to run a real tournament!
-To pit Claude Sonnet 4.5 against o3 in the BattleSnake arena, run:
+To run *Claude Sonnet 4.5* against *o3* in a *BattleSnake* tournament with *5 rounds* and *1000 competition simulations* per round, run:
 ```bash
-$ python main.py configs/examples/
+$ python main.py configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml
 ```
 
+And that's it, you're good to go!
+
+Where to next?
+- Check out our [docs](https://codeclash.ai/docs/) for more details on running different arenas, configuring tournaments, etc.
+- Explore our [contribution guide](CONTRIBUTING.md) to see what we're excited about!
+- Have a big idea? Let's hear it! Open an issue, and let's turn it into an [insight](https://codeclash.ai/insights/)!
+
 ## 💫 Contributions
 We're actively working on several follow ups!
 Check out the [Contributing Guide](CONTRIBUTING.md) for more.
configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r15__s1000.yaml renamed to configs/examples/BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml

File renamed without changes.

configs/examples/CoreWar__claude-sonnet-4-5-20250929__o3__r15__s1000.yaml renamed to configs/examples/CoreWar__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml

File renamed without changes.
File renamed without changes.

configs/examples/HuskyBench__claude-sonnet-4-5-20250929__o3__r15__s100.yaml renamed to configs/examples/HuskyBench__claude-sonnet-4-5-20250929__o3__r5__s100.yaml

File renamed without changes.

configs/examples/RoboCode__claude-sonnet-4-5-20250929__o3__r15__s250.yaml renamed to configs/examples/RoboCode__claude-sonnet-4-5-20250929__o3__r5__s250.yaml

File renamed without changes.

configs/examples/RobotRumble__claude-sonnet-4-5-20250929__o3__r15__s250.yaml renamed to configs/examples/RobotRumble__claude-sonnet-4-5-20250929__o3__r5__s250.yaml

File renamed without changes.
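
The renamed example configs appear to follow the pattern `<Arena>__<player>__<player>__r<rounds>__s<simulations>.yaml`, which matches the README's description of the BattleSnake example (5 rounds, 1000 simulations per round). A minimal sketch of reading those fields back out of a file name, assuming that inferred pattern holds; `parse_config_name` and `TournamentConfigName` are hypothetical helpers for illustration, not part of CodeClash:

```python
import re
from typing import NamedTuple, Tuple


class TournamentConfigName(NamedTuple):
    """Fields recovered from an example config file name (hypothetical type)."""
    arena: str
    players: Tuple[str, ...]
    rounds: int
    simulations: int


# Assumed naming pattern, inferred only from the renamed files in this commit:
#   <Arena>__<player>__...__r<rounds>__s<simulations>.yaml
_NAME_PATTERN = re.compile(
    r"^(?P<arena>[^_]+)__(?P<players>.+)__r(?P<rounds>\d+)__s(?P<sims>\d+)\.yaml$"
)


def parse_config_name(filename: str) -> TournamentConfigName:
    """Split an example config file name into arena, players, rounds, simulations."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized config file name: {filename!r}")
    return TournamentConfigName(
        arena=match["arena"],
        # Player identifiers are separated by double underscores.
        players=tuple(match["players"].split("__")),
        rounds=int(match["rounds"]),
        simulations=int(match["sims"]),
    )


if __name__ == "__main__":
    print(parse_config_name(
        "BattleSnake__claude-sonnet-4-5-20250929__o3__r5__s1000.yaml"
    ))
```

Under this reading, the renames in this commit change only the `r15` field to `r5`; the per-arena simulation counts (`s1000`, `s100`, `s250`) are untouched.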
