[stale] [poc] Add sdk config caching #2909

jp-agenta · 2025-11-11T17:34:03Z

[POC] Feature/sdk config caching

CLAassistant · 2025-11-11T17:34:11Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

GitHub CI seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copilot

Pull Request Overview

This PR implements a proof-of-concept for SDK config caching and refactors evaluation-related models and services. The changes primarily focus on renaming evaluation entities from "result/metrics/queue" to "step/metric", removing legacy applications functionality, simplifying authentication logic, and cleaning up migration files.

Key changes:

Renamed evaluation entities: EvaluationResult → EvaluationStep, EvaluationMetrics → EvaluationMetric, removed EvaluationQueue
Removed legacy applications router, models, and utils
Simplified blocked email/domain checking by removing PostHog feature flag integration
Removed numerous database migration files and test files
Updated service imports and refactored EE-specific database managers

Reviewed Changes

Copilot reviewed 116 out of 2336 changed files in this pull request and generated 6 comments.

File	Description
api/oss/src/apis/fastapi/evaluations/models.py	Renamed evaluation entities and simplified query models
api/oss/src/apis/fastapi/applications/*	Removed legacy applications functionality
api/oss/src/apis/fastapi/annotations/*	Added annotation utilities and expanded router with new endpoints
api/oss/src/__init__.py	Simplified email/domain blocking logic, removed PostHog integration
api/oss/docker/Dockerfile.*	Cleaned up Docker build files
api/oss/databases/postgres/migrations/utils.py	Changed async database functions to synchronous
api/oss/databases/postgres/migrations/core/versions/*	Removed multiple migration files
api/entrypoint.py	Reorganized imports and router initialization
api/ee/tests/manual/evaluations/sdk/*	Removed SDK test files
api/ee/src/services/*	Updated service imports and parameter names

Copilot · 2025-11-11T17:35:31Z

api/oss/src/apis/fastapi/evaluations/models.py

    runs: List[EvaluationRunEdit]


 class EvaluationRunQueryRequest(BaseModel):


[nitpick] The run field is now required (not Optional), but include_archived defaults to False. Consider if include_archived should also be required for consistency, or document why run must always be provided while include_archived has a default.

Suggested change

-class EvaluationRunQueryRequest(BaseModel):
+class EvaluationRunQueryRequest(BaseModel):
+    """
+    Request model for querying evaluation runs.
+    - `run` is required because it specifies the query parameters for the evaluation run.
+    - `include_archived` is optional and defaults to False, so archived runs are excluded unless explicitly requested.
+    """

Copilot · 2025-11-11T17:35:31Z

api/oss/databases/postgres/migrations/tracing/versions/fd77265d65dc_fix_spans.py

+    span_type_enum = sa.Enum(SpanType, name="tracetype")
+    trace_type_enum = sa.Enum(TraceType, name="spantype")


The enum names are swapped: span_type_enum has name="tracetype" and trace_type_enum has name="spantype". This should be corrected to span_type_enum with name="spantype" and trace_type_enum with name="tracetype".

Suggested change

-    span_type_enum = sa.Enum(SpanType, name="tracetype")
-    trace_type_enum = sa.Enum(TraceType, name="spantype")
+    span_type_enum = sa.Enum(SpanType, name="spantype")
+    trace_type_enum = sa.Enum(TraceType, name="tracetype")

Copilot · 2025-11-11T17:35:32Z

api/oss/databases/postgres/migrations/utils.py

    or env.POSTGRES_URI_CORE
    or env.POSTGRES_URI_TRACING
-    or "postgresql+asyncpg://username:password@localhost:5432/agenta_oss"
+    or "postgresql://username:password@localhost:5432/agenta_oss"


[nitpick] The default connection string changed from postgresql+asyncpg://... to postgresql://..., which matches the now-synchronous functions. Verify that all callers can handle a synchronous connection.
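
The relationship between the two URL forms can be made explicit by deriving the synchronous URL from the asynchronous one, dropping the driver suffix from the scheme. This is a minimal sketch, assuming SQLAlchemy-style `dialect+driver://` URLs; `to_sync_uri` is a hypothetical helper and is not part of this PR.

```python
def to_sync_uri(uri: str) -> str:
    """Drop the '+driver' part of a SQLAlchemy-style URL scheme.

    Hypothetical helper for illustration only; not part of this PR.
    """
    scheme, sep, rest = uri.partition("://")
    if not sep:
        # No explicit scheme separator: return the value unchanged.
        return uri
    # "postgresql+asyncpg" -> "postgresql"; a plain "postgresql" is left as-is.
    dialect = scheme.split("+", 1)[0]
    return f"{dialect}://{rest}"


async_uri = "postgresql+asyncpg://username:password@localhost:5432/agenta_oss"
sync_uri = to_sync_uri(async_uri)
```

A normalization step like this would only matter if a configured value such as `env.POSTGRES_URI_CORE` could still carry an async driver suffix; whether that is the case here is exactly what the comment asks to verify.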

Copilot · 2025-11-11T17:35:32Z

api/ee/src/services/organization_service.py

-            "Click the link below to accept the invitation:</p><br>"
-            f'<a href="{invite_link}">Accept Invitation</a>'
-        ),
+        call_to_action=f'Click the link below to accept the invitation:</p><br><a href="{env.AGENTA_WEB_URL}/auth?token={token}&email={email}&org_id={organization.id}&workspace_id={workspace.id}&project_id={project_id}">Accept Invitation</a>',


The URL parameters are no longer URL-encoded (removed quote()). This could lead to issues if any of the values contain special characters like &, =, or spaces. Consider re-adding URL encoding via urllib.parse.quote() for each parameter value.
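
A minimal sketch of the encoding this comment asks for. It uses `urllib.parse.urlencode`, which percent-encodes every value, instead of separate `quote()` calls. The function name `build_invite_link` and the sample values are hypothetical; only the query parameter names are taken from the diff above.

```python
from urllib.parse import urlencode


def build_invite_link(web_url: str, **params: str) -> str:
    """Build the invitation URL with every query value percent-encoded."""
    # urlencode escapes characters such as '&', '=', '+', '@' and spaces,
    # so a value can no longer break out of its own parameter.
    return f"{web_url}/auth?{urlencode(params)}"


invite_link = build_invite_link(
    "https://agenta.example",  # stand-in for env.AGENTA_WEB_URL
    token="abc&def=1",
    email="user+test@example.com",
    org_id="org-1",
    workspace_id="ws-1",
    project_id="proj-1",
)
```

Because the link ends up inside an HTML `href` attribute, passing the result through `html.escape` as well would be a reasonable extra step.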

Copilot · 2025-11-11T17:35:32Z

api/ee/src/services/workspace_manager.py

-                status_code=409,
-                detail="User is already a member of the workspace",
-            )
+            raise Exception("User is already a member of the workspace")


Changed from HTTPException(status_code=409, ...) to a generic Exception. This breaks the API contract by not returning a proper HTTP 409 Conflict status. Should use HTTPException for consistency with FastAPI error handling.

Copilot · 2025-11-11T17:35:32Z

api/ee/src/services/commoners.py

        create_org_payload = CreateOrganization(
            name=user_dict["username"],
-            description="Default Organization",
+            description="My Default Organization",


[nitpick] Changed from "Default Organization" to "My Default Organization". This string appears to be user-facing and the change may not be intentional. Verify this change aligns with the product requirements.

Suggested change

-            description="My Default Organization",
+            description="Default Organization",

web/ee/src/lib/hooks/useEvaluationRunData/index.ts

+        const runRes = await axios.get(
+            `/preview/evaluations/runs/${evaluationTableId}?project_id=${projectId}`,
+        )


General approach:
Validate or sanitize the user-controlled value used as evaluationTableId before interpolating it into the axios HTTP GET path, ensuring it cannot contain dangerous path traversal or special characters, and (ideally) restricting it to a set of allowed patterns (e.g., known IDs, or at least only alphanumeric + dashes/underscores).

Best fix:

In useEvaluationRunData, before using evaluationTableId to construct a request URL, ensure it is validated.

Add a validation function (e.g., isValidId) that only allows IDs matching a safe pattern (such as /^[a-zA-Z0-9_-]+$/). This should be enforced before making the request, and if invalid, log and return early.

Add this function in web/ee/src/lib/hooks/useEvaluationRunData/index.ts, and apply it in the code path for preview run where axios.get is called.

Optionally, apply the same pattern to legacy fetch hooks.

Required changes:

Add validation function (isValidId).

Check evaluationTableId using isValidId before sending the axios request.

If invalid, log error (or optionally throw, or safely return null/reject).

web/ee/src/services/evaluations/api/index.ts

+export const fetchEvaluation = async (evaluationId: string) => {
+    const {projectId} = getCurrentProject()
+
+    const response = await axios.get(`/evaluations/${evaluationId}?project_id=${projectId}`)


To remediate SSRF, we must validate and restrict the user-controlled evaluationId before using it as a path parameter in an outgoing HTTP request.
The most robust pattern is to accept only a fixed string format (such as UUID ^[a-fA-F0-9\-]{36}$), and reject any evaluationId that does not match.
Therefore, in fetchEvaluation, add a check before calling axios:

If evaluationId does not match UUID regex, throw an error or return (optionally, sanitize further at usage).

This pattern should be repeated for any other API consumption that interpolates evaluationId.

Required changes:

In web/ee/src/services/evaluations/api/index.ts, modify the fetchEvaluation and fetchAllEvaluationScenarios functions so that they validate evaluationId against strict UUID format before interpolation.

Add a helper function for UUID validation (to be used for this and any similar interpolation).

Optionally, perform similar restrictions in other places where evaluationId is used as a resource identifier (as seen, e.g., in fetchEvaluationStatus or multi-ID usages), but the highest risk is in string interpolation as a path parameter.

web/ee/src/services/evaluations/api/index.ts

+    const {projectId} = getCurrentProject()
+
+    const [{data: evaluationScenarios}, evaluation] = await Promise.all([
+        axios.get(`/evaluations/${evaluationId}/evaluation_scenarios?project_id=${projectId}`),


To fix the problem, we need to ensure that only valid (expected) evaluation IDs make their way into the axios request URL. This means validating evaluationId before using it to construct the API path, mitigating the risk that a malicious/injected value could result in an unexpected network request. Since all frontend code is shown, the best place to do this is to check that the evaluationId matches the expected format. If IDs are simple UUIDs or hex strings, we can enforce such a pattern.

Specifically:

In fetchAllEvaluationScenarios, before using evaluationId in the URL, check it against a regular expression (e.g., UUID or hex string pattern).

In fetchAllComparisonResults, filter the evaluationIds array to retain only those that are valid.

If any invalid ID is encountered, the function should throw or return an error, or ignore such IDs.

For maximum safety and ease, provide a helper (e.g., isValidId) and use it wherever IDs are passed to these API calls.

We do NOT change core logic—just ensure no invalid ID gets sent to the vulnerable endpoint.

web/ee/src/services/evaluations/workerUtils.ts

+        const res = await fetch(
+            `${apiUrl}/preview/evaluations/scenarios/${scenarioId}?project_id=${projectId}`,
+            {
+                headers: {Authorization: `Bearer ${jwt}`},
+            },
+        )


To fix SSRF risk here, we need to ensure that the scenarioId used in constructing outgoing request URLs is properly validated before it enters the sensitive path segment.
The best way in this context is to strictly validate scenarioId against an expected format or an allowlist. If your scenario IDs are UUIDs, validate that the input is indeed a valid UUID before using it in URLs. If not, validate against a whitelist or regular expression so that path traversal (../) or similar attacks are not possible.

Where to change:
Edit web/ee/src/services/evaluations/workerUtils.ts at the start of updateScenarioStatusRemote to validate scenarioId.

If not valid, throw early and do not proceed.

If using UUIDs, use a standard regex or a library for validation.

Add a UUID validation helper (or regex) at the top of the file.
For maximum safety, use a well-known existing npm library like validator's isUUID method if feasible.

Required changes:

Add the helper or import for UUID validation.

At the start of updateScenarioStatusRemote, validate scenarioId. Throw an error or return early if not valid.

Optionally, repeat similar checks for other sensitive path values (projectId if relevant, though the main SSRF vector is hosts/path segments).

This fix can be extended to other locations in this file where similar interpolation occurs, but CodeQL has highlighted scenarioId specifically.

web/ee/src/services/human-evaluations/api/index.ts

+        return await axios
+            .get(`${getAgentaApiUrl()}/human-evaluations/${evaluationId}?project_id=${projectId}`)


web/ee/src/services/human-evaluations/api/index.ts

+                return fromEvaluationResponseToEvaluation(responseData.data)
+            })
+    } catch (error) {
+        console.error(`Error fetching evaluation ${evaluationId}:`, error)


To fix the vulnerability, do not interpolate (template) the untrusted evaluationId into the format string argument of console.error. Instead, use a literal string format with placeholders, ensuring the untrusted value becomes a regular value parameter rather than part of the format string. For example, change:

console.error(`Error fetching evaluation ${evaluationId}:`, error)

to:

console.error("Error fetching evaluation %s:", evaluationId, error)

This change should be made to web/ee/src/services/human-evaluations/api/index.ts, specifically line 93. No changes are needed elsewhere. No new imports or dependencies are required.

web/ee/src/services/human-evaluations/api/index.ts

+    return await axios
+        .get(
+            `${getAgentaApiUrl()}/human-evaluations/${evaluationTableId}/evaluation_scenarios?project_id=${projectId}`,
+        )


To fix this problem, we should validate or sanitize the untrusted input in evaluationTableId before interpolation into the URL. The most robust approach is to allow only expected patterns (for example, UUIDs, MongoDB ObjectIDs, or simple alphanumeric IDs, depending on real use). Generic input should be rejected or replaced with a harmless value.

Implementation details:

Add a simple validation function to check that evaluationTableId matches a safe regular expression (e.g., /^[a-fA-F0-9-]+$/ for a hex string or UUID, or /^[a-fA-F0-9]{24}$/ for an ObjectId).

Throw or return an error if the check fails (optimally).

Only proceed with the axios call if the ID is valid.

Since we cannot add utilities outside the snippets shown, define the validation function within web/ee/src/services/human-evaluations/api/index.ts.

Use the validation function at the point of use in fetchAllLoadEvaluationsScenarios.

The rest of the code can remain unchanged, ensuring existing functionality is preserved.

web/ee/src/services/human-evaluations/api/index.ts

+    const response = await axios.put(
+        `${getAgentaApiUrl()}/human-evaluations/${evaluationTableId}/evaluation_scenario/${evaluationScenarioId}/${evaluationType}?project_id=${projectId}`,
+        data,
+    )


To fix this issue, we should validate that evaluationScenarioId and any other path parameters used to construct API URLs are safe and well-formed before they are interpolated into the outgoing URL. The most robust and minimal-impact fix is to ensure these IDs match a strict format (e.g., UUID, or a limited safe character set). We should implement a simple validation function (e.g., isValidId) that checks whether the ID matches a known regular expression for valid resource IDs (e.g., /^[a-zA-Z0-9-_]+$/ for slugs, or the canonical UUID regex if UUIDs are expected). If the validation fails, we should throw an error or avoid making the request.

Edits should be made inside web/ee/src/services/human-evaluations/api/index.ts within the updateEvaluationScenario function. Additionally, we should define the isValidId helper function in the same file. Imports are not needed unless an external validator is used (for simplicity, use a local regex). Insert the validation before constructing the API request URL; if the ID is not valid, throw an error.

feature/sdk-config-caching

293c3fa

Copilot AI review requested due to automatic review settings November 11, 2025 17:34

Copilot AI reviewed Nov 11, 2025

github-advanced-security bot found potential problems Nov 11, 2025

junaway changed the title ~~[POC] Feature/sdk config caching~~ [stale] [poc] Add sdk config caching Nov 14, 2025

@@ -21,6 +21,11 @@
             import {evaluationRunStateAtom, loadingStateAtom, evalAtomStore} from "./assets/atoms"
             import {buildRunIndex} from "./assets/helpers/buildRunIndex"
+            // Allow alphanumeric, underscore and dash only for IDs
+            export function isValidId(id: string | null | undefined): boolean {
+                return typeof id === "string" && /^[a-zA-Z0-9_-]+$/.test(id)
+            }
             const fetchLegacyScenariosData = async (
                 evaluationId: string,
                 evaluationObj: Evaluation,
@@ -68,6 +73,14 @@
                 // New fetcher for preview runs that fetches and enriches with testsetData
                 const fetchAndEnrichPreviewRun = useCallback(async () => {
+                    if (!isValidId(evaluationTableId)) {
+                        console.error("[useEvaluationRunData] Invalid evaluationTableId:", evaluationTableId);
+                        evalAtomStore().set(loadingStateAtom, (draft) => {
+                            draft.isLoadingEvaluation = false
+                            draft.activeStep = null
+                        })
+                        return null
+                    }
                     evalAtomStore().set(loadingStateAtom, (draft) => {
                         draft.isLoadingEvaluation = true
                         draft.activeStep = "eval-run"

@@ -3,6 +3,11 @@
             import {getCurrentProject} from "@/oss/contexts/project.context"
             import axios from "@/oss/lib/api/assets/axiosConfig"
+            // SSRF protection: UUID validation utility
+            function isValidUUID(str: string): boolean {
+                return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(str);
+            }
             import {getTagColors} from "@/oss/lib/helpers/colors"
             import {calcEvalDuration} from "@/oss/lib/helpers/evaluate"
             import {isDemo, stringToNumberInRange} from "@/oss/lib/helpers/utils"
@@ -145,6 +150,9 @@
             }
             export const fetchEvaluation = async (evaluationId: string) => {
+                if (!isValidUUID(evaluationId)) {
+                    throw new Error("Invalid evaluationId format."); // SSRF protection
+                }
                 const {projectId} = getCurrentProject()
                 const response = await axios.get(`/evaluations/${evaluationId}?project_id=${projectId}`)
@@ -196,6 +204,9 @@
             // Evaluation Scenarios
             export const fetchAllEvaluationScenarios = async (evaluationId: string) => {
+                if (!isValidUUID(evaluationId)) {
+                    throw new Error("Invalid evaluationId format."); // SSRF protection
+                }
                 const {projectId} = getCurrentProject()
                 const [{data: evaluationScenarios}, evaluation] = await Promise.all([

@@ -26,6 +26,14 @@
             import similarityImg from "@/oss/media/transparency.png"
             import {fetchTestset} from "@/oss/services/testsets/api"
+            // Accepts a canonical UUID or a 24/32-character hex string (ObjectId-like)
+            function isValidEvaluationId(id: string): boolean {
+                // 24-hex or 32-hex string, or a canonical 8-4-4-4-12 UUID
+                // Adjust pattern as needed for your system's IDs
+                return /^[a-fA-F0-9]{24}$/.test(id) || /^[a-fA-F0-9]{32}$/.test(id) || /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/.test(id)
+            }
             //Prefix convention:
             //  - fetch: GET single entity from server
             //  - fetchAll: GET all entities from server
@@ -196,6 +204,9 @@
             // Evaluation Scenarios
             export const fetchAllEvaluationScenarios = async (evaluationId: string) => {
+                if (!isValidEvaluationId(evaluationId)) {
+                    throw new Error("Invalid evaluationId format")
+                }
                 const {projectId} = getCurrentProject()
                 const [{data: evaluationScenarios}, evaluation] = await Promise.all([
@@ -224,7 +235,11 @@
             // Comparison
             export const fetchAllComparisonResults = async (evaluationIds: string[]) => {
-                const scenarioGroups = await Promise.all(evaluationIds.map(fetchAllEvaluationScenarios))
+                const validIds = evaluationIds.filter(isValidEvaluationId)
+                if (validIds.length === 0) {
+                    throw new Error("No valid evaluationIds provided")
+                }
+                const scenarioGroups = await Promise.all(validIds.map(fetchAllEvaluationScenarios))
                 const testset: TestSet = await fetchTestset(scenarioGroups[0][0].evaluation?.testset?.id)
                 const inputsNameSet = new Set<string>()

@@ -2,6 +2,7 @@
             import {EvaluationStatus} from "@/oss/lib/Types"
             import {BaseResponse} from "@/oss/lib/Types"
+            import validator from "validator"
             /**
              * Update scenario status from a WebWorker / non-axios context.
@@ -13,6 +14,10 @@
                 status: EvaluationStatus,
                 projectId: string,
             ): Promise<void> {
+                // Validate scenarioId is a UUID before proceeding
+                if (!validator.isUUID(scenarioId)) {
+                    throw new Error("Invalid scenarioId format");
+                }
                 try {
                     // 1. fetch full scenario (backend requires full object on PATCH)
                     const res = await fetch(

@@ -69,7 +69,8 @@
                     "typescript": "5.8.3",
                     "use-animation-frame": "^0.2.1",
                     "usehooks-ts": "^3.1.0",
-                    "uuid": "^11.1.0"
+                    "uuid": "^11.1.0",
+                    "validator": "^13.15.23"
                 },
                 "devDependencies": {
                     "@agenta/web-tests": "workspace:../tests",

Package	Version	Security advisories
validator (npm)	13.15.23	None

@@ -106,12 +106,25 @@
                 return response.data
             }
+            // Validation helper: only allow MongoDB ObjectID (24 hex chars) or UUID (standard).
+            function isValidEvaluationId(id: string | undefined): boolean {
+                if (!id) return false;
+                // Accept ObjectId or UUID-like patterns. Adjust as appropriate for your app.
+                const objectIdRe = /^[a-f\d]{24}$/i;
+                const uuidRe = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;
+                return objectIdRe.test(id) || uuidRe.test(id);
+            }
             export const fetchAllLoadEvaluationsScenarios = async (
                 evaluationTableId: string,
                 evaluation: Evaluation,
             ) => {
                 const {projectId} = getCurrentProject()
+                if (!isValidEvaluationId(evaluationTableId)) {
+                    throw new Error("Invalid evaluationTableId format");
+                }
                 return await axios
                     .get(
                         `${getAgentaApiUrl()}/human-evaluations/${evaluationTableId}/evaluation_scenarios?project_id=${projectId}`,

@@ -183,6 +183,14 @@
                 return response.data
             }
+            function isValidId(id: string): boolean {
+                // Accepts UUIDs or alphanumeric/underscore/hyphen-based slugs (adjust regex as needed)
+                // UUID: /^[0-9a-fA-F\-]{36}$/
+                // Slug: /^[a-zA-Z0-9\-_]+$/
+                const uuidRegex = /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/
+                const slugRegex = /^[a-zA-Z0-9\-_]+$/
+                return uuidRegex.test(id) || slugRegex.test(id)
+            }
             export const updateEvaluationScenario = async (
                 evaluationTableId: string,
                 evaluationScenarioId: string,
@@ -191,6 +199,14 @@
             ) => {
                 const {projectId} = getCurrentProject()
+                // Validate IDs before using them in path
+                if (!isValidId(evaluationTableId)) {
+                    throw new Error('Invalid evaluationTableId')
+                }
+                if (!isValidId(evaluationScenarioId)) {
+                    throw new Error('Invalid evaluationScenarioId')
+                }
                 const response = await axios.put(
                     `${getAgentaApiUrl()}/human-evaluations/${evaluationTableId}/evaluation_scenario/${evaluationScenarioId}/${evaluationType}?project_id=${projectId}`,
                     data,


3 participants