
Commit a3a117f

Author: Katherine Huang
Commit message: add to readme how to run multiple reps of eval
1 parent 7ffb912 commit a3a117f

File tree

1 file changed: +26 −5 lines

README.md

Lines changed: 26 additions & 5 deletions

@@ -799,16 +799,37 @@ In `config-eval.yml`, the following should be configured:

- `output_dir`: Evaluation output file path
### Running Multiple Evaluations
Use the `--reps` flag to run the evaluation multiple times on the same image and workflow. Note that **caching must be disabled first** (see [Disable Caching](#disable-caching)).
```
nat eval --config_file=configs/config-eval.yml --reps=3
```
The output file will show individual accuracy scores for each run, as well as the average accuracy across all runs.
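The exact output format depends on the evaluator, but the reported average is simply the mean of the per-run scores. As a rough sketch, assuming a hypothetical JSON output with a per-rep `accuracy` field (the real file produced by `nat eval` may use different field names):

```
import json
from statistics import mean

# Hypothetical shape of the eval output file; field names are
# illustrative, not the toolkit's actual schema.
output = {"reps": [{"accuracy": 0.82}, {"accuracy": 0.78}, {"accuracy": 0.80}]}

scores = [rep["accuracy"] for rep in output["reps"]]
average = mean(scores)
print(json.dumps({"per_rep": scores, "average": round(average, 4)}))
```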
#### Disable Caching
LLM and embedder caching must be disabled when running multiple evaluations; otherwise every run returns the same cached results.
##### Option 1 - Global Override
Bypass all LLM and embedder caching by setting the `NVIDIA_API_BASE` environment variable so that requests go directly to the model endpoint instead of through the nginx cache.
```
NVIDIA_API_BASE=https://integrate.api.nvidia.com/v1 nat eval --config_file=configs/config-eval.yml --reps=3
```
##### Option 2 - Disable Caching on Specific Calls
You may want to disable caching for only specific parts of the pipeline. For example, an in-depth experiment may require disabling caching on specific LLMs, or you may want to disable all LLM call caching while keeping embedder caching so the embedder does not have to regenerate embeddings for the static image source on every evaluation run.
You can apply these targeted overrides in `config-eval.yml` by setting the `base_url` field of the target component to `"https://integrate.api.nvidia.com/v1"`. You can also keep multiple config files to track different evaluation experiments.
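As an illustration, such an override might look like the following `config-eval.yml` fragment. The `llms.checklist_llm.base_url` path is taken from the `--override` example in this section; the surrounding key layout is an assumption, so adapt it to your actual config:

```
llms:
  checklist_llm:
    # Point this LLM directly at the model endpoint, bypassing the nginx cache.
    # Key nesting shown here is illustrative; match your real config layout.
    base_url: "https://integrate.api.nvidia.com/v1"
```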
For quick testing, you can also change configuration at runtime using the `--override` flag, which takes a dot-notation path and a value. The example below disables caching on `checklist_llm`.
```
nat eval --config_file=configs/config-eval.yml --reps=3 --override llms.checklist_llm.base_url https://integrate.api.nvidia.com/v1
```
## Troubleshooting
