test(agents): added agent conformance spec and js harness #5256
pavelgj wants to merge 40 commits into
Upgrade postcss from 8.4.31 to 8.5.12 across all workspaces and add new markdown/MDX-related dependencies (remark, rehype, unified, etc.) to the lockfile.
…lify workspaceAgent
- Rename `simple-agent` to `custom-agent` and update all imports, routes, and log messages accordingly
- Refactor `workspace-builder.ts` to use `defineAgent` instead of `defineCustomAgent`, leveraging the standard agent API for model calls, tool dispatch, streaming, and message management
- Extract `emitArtifact` tool to module scope using `ai.defineTool`
Add new type exports for `agent` and `agents-conformance` modules to the common types barrel file.
…ance tests
Introduce Phase 2 of agent conformance testing with `defineCustomAgent` and four new custom agents (blocking, failing, withArtifacts, withCustomState). Add support for detach/background execution, abort, artifact streaming/deduplication, and custom state persistence.
- Add `defineCustomAgent` API in session-flow.ts for agents with fixed deterministic logic (no programmable model needed)
- Implement artifact support with `addArtifact` on session and streaming via `agentArtifacts` chunks
- Add custom state read/write (`getCustomState`/`setCustomState`) persisted across invocations
- Add detach support (`detach` flag, `waitUntilCompleted` helper) for background agent execution
- Add abort support to cancel pending agent snapshots
- Extend test spec YAML with 10 new tests (16 total) covering detach, abort, artifacts, and custom state categories
- Update conformance testing docs with custom agent table and test coverage summary
- Update expectChunks to semi-strict type-aware matching logic
- Preserve null values in deepStrip to distinguish null vs absent
- Fix abort expectPreviousStatus to support YAML null (~) correctly
- Add stateContains message subsequence assertions to test specs
- Update conformance testing docs to reflect matching semantics
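The null-versus-absent distinction and the subsequence assertion above can be illustrated with a small sketch. This is not the harness's code: `stripUndefined` and `containsSubsequence` are hypothetical names, and the assumption that stripping targets `undefined`-valued keys is mine, since the commit only says that nulls are preserved.

```typescript
// Hypothetical stand-in for a deepStrip-style helper. Assumption: it drops
// keys whose value is undefined but keeps explicit nulls, so that
// { a: null } and {} remain distinguishable after stripping.
function stripUndefined(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => stripUndefined(item));
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (inner !== undefined) {
        out[key] = stripUndefined(inner);
      }
    }
    return out;
  }
  // Primitives and null pass through unchanged.
  return value;
}

// Hypothetical subsequence check: every expected item must appear in
// `actual` in the same relative order, with other items allowed in between.
function containsSubsequence<T>(
  actual: T[],
  expected: T[],
  matches: (actualItem: T, expectedItem: T) => boolean
): boolean {
  let next = 0;
  for (const item of actual) {
    if (next < expected.length && matches(item, expected[next] as T)) {
      next++;
    }
  }
  return next === expected.length;
}
```

Under these assumptions, a `stateContains` check passes when the expected messages occur in order somewhere in the stored history, and it fails if the same messages appear in a different order.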
Add `errorContains` snapshot assertion for subset-matching on `snapshot.error`. Introduce three new conformance tests: server-managed state ignoring init state, pure detach without payload, and failed snapshot error details. Update test count from 16 to 19 in docs.
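Subset-matching of the kind `errorContains` describes might look like the sketch below. The function name `isSubset` and the error fields used in the usage note are illustrative assumptions, not taken from the spec runner. Treating arrays as exact-length, element-wise matches is also a choice made here, not a documented rule.

```typescript
// Hypothetical subset matcher: every key present in `expected` must exist in
// `actual` with a matching value; extra keys in `actual` are ignored.
function isSubset(expected: unknown, actual: unknown): boolean {
  if (Array.isArray(expected)) {
    // Assumption: arrays must have equal length and match element by element.
    if (!Array.isArray(actual) || actual.length !== expected.length) {
      return false;
    }
    const actualItems: unknown[] = actual;
    return expected.every((item, i) => isSubset(item, actualItems[i]));
  }
  if (expected !== null && typeof expected === 'object') {
    if (actual === null || typeof actual !== 'object' || Array.isArray(actual)) {
      return false;
    }
    const actualRecord = actual as Record<string, unknown>;
    return Object.entries(expected as Record<string, unknown>).every(
      ([key, value]) => key in actualRecord && isSubset(value, actualRecord[key])
    );
  }
  // Primitives and null are compared strictly.
  return expected === actual;
}
```

For example, with a made-up error object `{ status: 'INTERNAL', message: 'boom' }`, an expectation of `{ status: 'INTERNAL' }` would match, while `{ status: 'NOT_FOUND' }` would not.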
Standardize terminology throughout the agent conformance testing documentation and tooling by replacing "invocations" with "steps" to better describe the ordered sequence of operations in test cases.
Code Review
This pull request establishes a comprehensive Agent Conformance Testing framework, introducing a shared YAML specification, detailed documentation, and a reference JavaScript test harness. It includes new Zod schemas for the agent wire protocol and conformance test format to ensure cross-language compatibility. Feedback focuses on improving maintainability by reducing schema duplication across packages and enhancing type safety by replacing 'any' types with more specific schemas or 'unknown' in both the core types and the new translator test application.
I am having trouble creating individual review comments.
js/ai/tests/agents_spec_test.ts (54-117)
These Zod schemas are duplicated from genkit-tools/common/src/types/agents-conformance.ts. The comment on line 51 explains this is because js/ai does not depend on genkit-tools/common. This duplication is a significant maintainability risk, as changes in the canonical schemas can be missed here, leading to inconsistencies and test failures.
Consider refactoring the package dependencies to allow js/ai to import these schemas directly from genkit-tools/common. This would make the test harness more robust and easier to maintain.
genkit-tools/common/src/types/agent.ts (90)
The status property is typed as z.any(), which is very permissive and reduces type safety. If the structure of status is known, it should be defined with a more specific schema. If it's truly unknown, z.unknown() is a safer alternative to z.any() as it forces validation on consumers. Since this property doesn't seem to be used in the current test spec, now would be a good time to improve its type definition for future use.
status: z.unknown().optional(),
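The practical difference shows up on the consumer side: `z.unknown()` infers TypeScript's `unknown`, so the value must be narrowed before use, whereas `z.any()` infers `any` and lets unchecked access compile. A plain TypeScript sketch of that narrowing follows. The `HasStatus` interface and `describeStatus` function are hypothetical, and no Zod import is used so the snippet stays self-contained.

```typescript
// Hypothetical shape mirroring a field declared as z.unknown().optional().
interface HasStatus {
  status?: unknown;
}

// With `unknown`, the compiler rejects e.g. `value.status.toUpperCase()`
// until the value has been narrowed with a runtime check.
function describeStatus(value: HasStatus): string {
  const status = value.status;
  if (status === undefined) {
    return 'absent';
  }
  if (typeof status === 'string') {
    return status;
  }
  return 'unrecognized';
}
```

Had `status` been typed as `any`, the same function could have returned `value.status.toUpperCase()` without a check and failed only at runtime.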
js/testapps/agents/web/src/pages/Translator.tsx (71)
Casting the result of runFlow to any bypasses TypeScript's type safety. This can lead to runtime errors if the API response structure changes. It would be safer to define a type for the expected response and then validate or cast to that specific type. For example, you could use a Zod schema to parse the response.
Add `expectError` assertion to the agent test spec runner, allowing tests to verify that an invocation throws an expected error message. Update the "server-managed agent ignores init state" test to instead verify that sending `state` to a server-managed agent throws a FAILED_PRECONDITION error, rather than silently ignoring it.
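A minimal sketch of how an `expectError`-style assertion could work, assuming it does substring matching on the thrown error's message. The helper names, the signature, and the sample message text are illustrative only; the actual runner may compare errors differently.

```typescript
// Hypothetical matcher: true when the thrown value's message contains the
// expected text. Non-Error throwables are stringified first.
function errorMessageContains(thrown: unknown, expected: string): boolean {
  const message = thrown instanceof Error ? thrown.message : String(thrown);
  return message.includes(expected);
}

// Hypothetical assertion wrapper: runs one step and fails unless it rejects
// with an error whose message contains `expected`.
async function expectError(
  step: () => Promise<unknown>,
  expected: string
): Promise<void> {
  let thrown: unknown;
  let didThrow = false;
  try {
    await step();
  } catch (err) {
    thrown = err;
    didThrow = true;
  }
  if (!didThrow) {
    throw new Error(`expected an error containing "${expected}", but the step succeeded`);
  }
  if (!errorMessageContains(thrown, expected)) {
    throw new Error(`error did not contain "${expected}": ${String(thrown)}`);
  }
}
```

In the updated test, the step that sends `state` to a server-managed agent would be wrapped this way with `FAILED_PRECONDITION` as the expected text. A step that silently succeeded would then fail the test.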
Remove the optional `newSnapshotId` property from `AgentInitSchema` as it is no longer needed in the agent initialization configuration.
Update the "abort completed agent" test to expect that a snapshot in the "done" state remains "done" after an abort request. Terminal states (done, failed, aborted) cannot be overridden — only "pending" can transition to "aborted".
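The transition rule can be written as a one-line function. This is a sketch, assuming the four statuses named in the commit message are the complete set; the type and function names are hypothetical.

```typescript
// Assumption: these four statuses are the full set relevant to abort.
type SnapshotStatus = 'pending' | 'done' | 'failed' | 'aborted';

// Abort only affects a pending snapshot; terminal states are left untouched.
function statusAfterAbort(current: SnapshotStatus): SnapshotStatus {
  return current === 'pending' ? 'aborted' : current;
}
```

So `statusAfterAbort('done')` stays `'done'`, which is the outcome the updated "abort completed agent" test expects.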