
Commit 7ffb912

Katherine Huang committed: readme update for running multiple eval runs

1 parent 3c66a49

File tree: 1 file changed

README.md (13 additions, 1 deletion)
@@ -794,10 +794,22 @@ To run evaluation on another image, a new json file should be created using the
 }
 ```
 In `config-eval.yml`, the following should be configured:
-- `file_path`: Evaluation dataset specifying image metadata and workflow with CVEs
+- `file_path`: Evaluation dataset specifying image and workflow (test set of CVEs)
 - `kwargs: workflow_id`: The specific workflow to run
 - `output_dir`: Evaluation output file path
 
+### Running Multiple Evaluations
+Use the `--reps` flag to run evaluation multiple times on the same image and workflow.
+```
+nat eval --config_file=configs/config-eval.yml --reps=3
+```
+**WIP / Known issues**
+- The nginx llm_cache needs to be disabled for repeated runs to produce different results. I was commenting out lines in `nginx_cache.conf` and `nginx/templates/routes/*`, but I need a better way to toggle it off for eval; one option is an eval-specific .env file that overrides the model provider base URLs in docker-compose so requests bypass nginx.
+- When I disable llm_cache, I get a number of "Too many requests" errors even when using the nvdev endpoint.
+
+
+The output file will show individual accuracy scores for each run, and the average accuracy across all runs.
+
 ## Troubleshooting
 
 Several common issues can arise when running the workflow. Here are some common issues and their solutions.
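
For reference, the three `config-eval.yml` keys called out in the hunk above might be laid out roughly as follows. This is a minimal sketch, not the toolkit's confirmed schema: the `eval`/`general`/`dataset` nesting and all values are assumptions; only `file_path`, `workflow_id`, and `output_dir` come from the README text.

```yaml
# Hypothetical sketch of config-eval.yml (nesting and values are assumptions;
# only file_path, workflow_id, and output_dir come from the README above).
eval:
  general:
    # Where evaluation results are written
    output_dir: ./eval_output
    dataset:
      # JSON dataset describing the image and the test set of CVEs
      file_path: data/eval_dataset.json
      kwargs:
        # The specific workflow to run during evaluation
        workflow_id: vulnerability_analysis
```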
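The first WIP item proposes bypassing the nginx llm_cache by overriding the model provider base URLs at compose time. A hedged sketch of that idea, with hypothetical service, file, and variable names not taken from this repo:

```yaml
# docker-compose.override.yml (hypothetical): route the workflow container's
# LLM traffic straight to the provider so repeated eval runs are not served
# identical cached completions. All names below are assumptions.
services:
  workflow:
    env_file:
      - .env.eval  # eval-only overrides of the base-URL variables
    environment:
      # Normally this would point at the nginx llm_cache (e.g. http://nginx:8080);
      # the eval override sends requests directly to the provider instead.
      NIM_BASE_URL: ${NIM_BASE_URL:-https://integrate.api.nvidia.com/v1}
```

An override file like this could be applied only for eval runs, e.g. `docker compose -f docker-compose.yml -f docker-compose.override.yml --env-file .env.eval up`, leaving the cache in place for normal development.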
