
Commit 7ffb912

Katherine Huang committed: readme update for running multiple eval runs

1 parent 3c66a49

File tree: 1 file changed

README.md (13 additions, 1 deletion)
@@ -794,10 +794,22 @@ To run evaluation on another image, a new json file should be created using the
 }
 ```
 In `config-eval.yml`, the following should be configured:
-- `file_path`: Evaluation dataset specifying image metadata and workflow with CVEs
+- `file_path`: Evaluation dataset specifying image and workflow (test set of CVEs)
 - `kwargs: workflow_id`: The specific workflow to run
 - `output_dir`: Evaluation output file path
 
+### Running Multiple Evaluations
+Use the `--reps` flag to run evaluation multiple times on the same image and workflow.
+```
+nat eval --config_file=configs/config-eval.yml --reps=3
+```
+**WIP / Known issues**
+- The nginx llm_cache needs to be disabled for repeated runs to produce different results. I was commenting out lines in `nginx_cache.conf` and `nginx/templates/routes/*`, but I need a better way to toggle it off for eval; one option is an eval-specific .env file that overrides the model provider base URLs in docker-compose so requests bypass nginx.
+- When I disable llm_cache, I get a number of "Too many requests" errors even when using the nvdev endpoint.
+
+
+The output file will show individual accuracy scores for each run, and the average accuracy across all runs.
+
 ## Troubleshooting
 
 Several common issues can arise when running the workflow. Here are some common issues and their solutions.
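
For reference, the three `config-eval.yml` keys called out in the hunk above might be laid out roughly as follows. This is a minimal sketch, not the toolkit's confirmed schema: the `eval`/`general`/`dataset` nesting and all values are assumptions; only `file_path`, `workflow_id`, and `output_dir` come from the README text.

```yaml
# Hypothetical sketch of config-eval.yml (nesting and values are assumptions;
# only file_path, workflow_id, and output_dir come from the README above).
eval:
  general:
    # Where evaluation results are written
    output_dir: ./eval_output
    dataset:
      # JSON dataset describing the image and the test set of CVEs
      file_path: data/eval_dataset.json
      kwargs:
        # The specific workflow to run during evaluation
        workflow_id: vulnerability_analysis
```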
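The first WIP item proposes bypassing the nginx llm_cache by overriding the model provider base URLs at compose time. A hedged sketch of that idea, with hypothetical service, file, and variable names not taken from this repo:

```yaml
# docker-compose.override.yml (hypothetical): route the workflow container's
# LLM traffic straight to the provider so repeated eval runs are not served
# identical cached completions. All names below are assumptions.
services:
  workflow:
    env_file:
      - .env.eval  # eval-only overrides of the base-URL variables
    environment:
      # Normally this would point at the nginx llm_cache (e.g. http://nginx:8080);
      # the eval override sends requests directly to the provider instead.
      NIM_BASE_URL: ${NIM_BASE_URL:-https://integrate.api.nvidia.com/v1}
```

An override file like this could be applied only for eval runs, e.g. `docker compose -f docker-compose.yml -f docker-compose.override.yml --env-file .env.eval up`, leaving the cache in place for normal development.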
