README.md
26 additions & 5 deletions
@@ -799,16 +799,37 @@ In `config-eval.yml`, the following should be configured:
- `output_dir`: Evaluation output file path
### Running Multiple Evaluations
-
Use the `--reps` flag to run evaluation multiple times on the same image and workflow.
+
+
Use the `--reps` flag to run the evaluation multiple times on the same image and workflow. Note that **caching must be disabled first** (see [Disable Caching](#disable-caching)).
- nginx llm_cache needs to be disabled for runs to have different results. I was commenting out lines in `nginx_cache.conf` and `nginx/templates/routes/*` but I need to find a better way to toggle it off for eval; I think I can make an eval .env file to override the model provider base URLs in docker-compose so they bypass nginx.
-
- when I disable llm_cache, I get a bunch of "Too many requests" errors even when using nvdev endpoint
+
The output file shows the accuracy score for each individual run, as well as the average accuracy across all runs.
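
As a rough illustration of how the per-run scores relate to the reported average, here is a minimal Python sketch. The score values are made up for illustration and the actual output file format is not shown here; the sketch assumes the average is a plain arithmetic mean over the runs.

```python
from statistics import mean

# Hypothetical per-run accuracy scores (illustrative values only),
# e.g. as might be produced by three repetitions of the same evaluation.
run_accuracies = [0.80, 0.70, 0.90]

# Report each run individually.
for run_index, accuracy in enumerate(run_accuracies, start=1):
    print(f"run {run_index}: accuracy = {accuracy:.2f}")

# Assumed aggregation: arithmetic mean across all runs.
average_accuracy = mean(run_accuracies)
print(f"average accuracy = {average_accuracy:.2f}")
```

If caching is left enabled, the per-run scores would be expected to be identical, so the average would carry no extra information.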
+
+
#### Disable Caching
+
LLM and embedder caching must be disabled when running multiple evaluations; otherwise, every run returns the same cached results.
+
+
##### Option 1 - Global Override
+
Globally bypass all LLM and embedder caching by setting the `NVIDIA_API_BASE` variable so that requests go directly to the model endpoint instead of through the nginx cache.
##### Option 2 - Disable Caching on Specific Calls
+
You may want to disable caching for only specific parts of the pipeline. For example, more in-depth experiments may require disabling caching on specific LLMs. Alternatively, you may want to disable caching for all LLM calls but keep embedder caching, so that the embedder does not have to regenerate embeddings for the static image source on every evaluation run.
+
+
You can apply targeted caching overrides in `config-eval.yml` by setting the `base_url` of the target component to `"https://integrate.api.nvidia.com/v1"`. Note that you can create multiple config files to keep track of different evaluation experiments.
+
+
+
For quick testing, you can also change the configuration at runtime using the `--override` flag, which uses dot notation. Below is an example that disables caching on the `checklist_llm`.
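
Independently of the command-line usage, the following Python sketch illustrates what a dot-notation key means: each dot-separated segment selects one level of a nested config. This is not the evaluation tool's implementation. The helper `apply_override`, the `llms.checklist_llm.base_url` path, and the placeholder cached URL are all assumptions made for illustration, and the real layout of `config-eval.yml` may differ.

```python
from typing import Any


def apply_override(config: dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested config value addressed by a dot-separated key, in place.

    Hypothetical helper for illustration; not part of any real CLI.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Descend one level per segment, creating missing sections as needed.
        node = node.setdefault(key, {})
    node[leaf] = value


# Assumed config layout; the actual nesting in config-eval.yml may differ.
config = {
    "llms": {
        # Placeholder URL standing in for a cached (nginx) route.
        "checklist_llm": {"base_url": "http://nginx-cache.example/v1"},
    },
}

# Point only the checklist LLM directly at the model endpoint.
apply_override(
    config,
    "llms.checklist_llm.base_url",
    "https://integrate.api.nvidia.com/v1",
)
print(config["llms"]["checklist_llm"]["base_url"])
```

Because only the addressed leaf is replaced, other components in the config keep their existing (cached) `base_url` values.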