
Commit a3a117f

Author: Katherine Huang
Commit message: add to readme how to run multiple reps of eval
1 parent 7ffb912 commit a3a117f

File tree

1 file changed: +26 −5 lines

README.md

Lines changed: 26 additions & 5 deletions

@@ -799,16 +799,37 @@ In `config-eval.yml`, the following should be configured:

- `output_dir`: Evaluation output file path
### Running Multiple Evaluations
Use the `--reps` flag to run the evaluation multiple times on the same image and workflow. Note that **caching must be disabled first** (see [Disable Caching](#disable-caching)).
```
nat eval --config_file=configs/config-eval.yml --reps=3
```
The output file will show individual accuracy scores for each run, as well as the average accuracy across all runs.
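The exact output format depends on the evaluator, but the reported average is simply the mean of the per-run scores. As a rough sketch, assuming a hypothetical JSON output with a per-rep `accuracy` field (the real file produced by `nat eval` may use different field names):

```
import json
from statistics import mean

# Hypothetical shape of the eval output file; field names are
# illustrative, not the toolkit's actual schema.
output = {"reps": [{"accuracy": 0.82}, {"accuracy": 0.78}, {"accuracy": 0.80}]}

scores = [rep["accuracy"] for rep in output["reps"]]
average = mean(scores)
print(json.dumps({"per_rep": scores, "average": round(average, 4)}))
```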
#### Disable Caching
LLM and embedder caching must be disabled when running multiple evaluations; otherwise every run returns the same cached results.
##### Option 1 - Global Override
Bypass all LLM and embedder caching by setting the `NVIDIA_API_BASE` environment variable so that requests go directly to the model endpoint instead of through the nginx cache.
```
NVIDIA_API_BASE=https://integrate.api.nvidia.com/v1 nat eval --config_file=configs/config-eval.yml --reps=3
```
##### Option 2 - Disable Caching on Specific Calls
You may want to disable caching for only specific parts of the pipeline. For example, an in-depth experiment may require disabling caching on specific LLMs, or you may want to disable all LLM call caching while keeping embedder caching so the embedder does not have to regenerate embeddings for the static image source on every evaluation run.
You can apply these targeted overrides in `config-eval.yml` by setting the `base_url` field of the target component to `"https://integrate.api.nvidia.com/v1"`. You can also keep multiple config files to track different evaluation experiments.
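As an illustration, such an override might look like the following `config-eval.yml` fragment. The `llms.checklist_llm.base_url` path is taken from the `--override` example in this section; the surrounding key layout is an assumption, so adapt it to your actual config:

```
llms:
  checklist_llm:
    # Point this LLM directly at the model endpoint, bypassing the nginx cache.
    # Key nesting shown here is illustrative; match your real config layout.
    base_url: "https://integrate.api.nvidia.com/v1"
```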
For quick testing, you can also change configuration at runtime using the `--override` flag, which takes a dot-notation path and a value. The example below disables caching on `checklist_llm`.
```
nat eval --config_file=configs/config-eval.yml --reps=3 --override llms.checklist_llm.base_url https://integrate.api.nvidia.com/v1
```
## Troubleshooting
