test(agents): added agent conformance spec and js harness #5256

Draft
pavelgj wants to merge 40 commits into pj/agents-sample from pj/agents-conformance-tests

pavelgj (Member) commented May 7, 2026

No description provided.

pavelgj added 12 commits May 6, 2026 20:47
Upgrade postcss from 8.4.31 to 8.5.12 across all workspaces and add
new markdown/MDX-related dependencies (remark, rehype, unified, etc.)
to the lockfile.
…lify workspaceAgent

- Rename `simple-agent` to `custom-agent` and update all imports, routes,
  and log messages accordingly
- Refactor `workspace-builder.ts` to use `defineAgent` instead of
  `defineCustomAgent`, leveraging the standard agent API for model calls,
  tool dispatch, streaming, and message management
- Extract `emitArtifact` tool to module scope using `ai.defineTool`
Add new type exports for `agent` and `agents-conformance` modules
to the common types barrel file.
…ance tests

Introduce Phase 2 of agent conformance testing with `defineCustomAgent`
and four new custom agents (blocking, failing, withArtifacts,
withCustomState). Add support for detach/background execution, abort,
artifact streaming/deduplication, and custom state persistence.

- Add `defineCustomAgent` API in session-flow.ts for agents with
  fixed deterministic logic (no programmable model needed)
- Implement artifact support with `addArtifact` on session and
  streaming via `agentArtifacts` chunks
- Add custom state read/write (`getCustomState`/`setCustomState`)
  persisted across invocations
- Add detach support (`detach` flag, `waitUntilCompleted` helper)
  for background agent execution
- Add abort support to cancel pending agent snapshots
- Extend test spec YAML with 10 new tests (16 total) covering
  detach, abort, artifacts, and custom state categories
- Update conformance testing docs with custom agent table and
  test coverage summary
- Update expectChunks to semi-strict type-aware matching logic
- Preserve null values in deepStrip to distinguish null vs absent
- Fix abort expectPreviousStatus to support YAML null (~) correctly
- Add stateContains message subsequence assertions to test specs
- Update conformance testing docs to reflect matching semantics
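
The null-preservation and subsequence-matching bullets above can be sketched roughly as follows. This is a minimal illustration only: `stripUndefined` and `isSubsequence` are hypothetical names, and the harness's actual `deepStrip` and `stateContains` logic may differ.

```typescript
// Hypothetical helpers: names and shapes are assumptions for illustration,
// not the harness's actual code.
type Json =
  | null
  | boolean
  | number
  | string
  | Json[]
  | { [key: string]: Json | undefined };

// Removes keys whose value is undefined but keeps explicit nulls,
// so { a: null } and {} stay distinguishable after stripping.
function stripUndefined(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => stripUndefined(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      if (child !== undefined) out[key] = stripUndefined(child);
    }
    return out;
  }
  return value;
}

// True when every expected item matches some actual item, in order,
// with unrelated actual items allowed in between.
function isSubsequence<T>(
  expected: T[],
  actual: T[],
  matches: (e: T, a: T) => boolean
): boolean {
  let next = 0;
  for (const item of actual) {
    if (next < expected.length && matches(expected[next], item)) next++;
  }
  return next === expected.length;
}
```

Under these assumptions, an expected message list of `["user", "model"]` matches an actual history of `["system", "user", "tool", "model"]`, while the reversed expectation `["model", "user"]` does not match `["user", "model"]`.
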
Add `errorContains` snapshot assertion for subset-matching on
`snapshot.error`. Introduce three new conformance tests: server-managed
state ignoring init state, pure detach without payload, and failed
snapshot error details. Update test count from 16 to 19 in docs.
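
Subset-matching of the kind `errorContains` describes could look like the sketch below. The function name `isSubset`, the exact-length treatment of arrays, and the error field names in the usage note are assumptions, not the harness's confirmed behavior.

```typescript
// Hypothetical sketch of subset matching: every field in `expected` must be
// present in `actual` with a matching value; extra fields in `actual` are ignored.
function isSubset(expected: unknown, actual: unknown): boolean {
  // Primitives (and null) must be strictly equal.
  if (expected === null || typeof expected !== "object") {
    return expected === actual;
  }
  if (actual === null || typeof actual !== "object") {
    return false;
  }
  // Assumption: arrays are compared element by element with equal length.
  if (Array.isArray(expected)) {
    return (
      Array.isArray(actual) &&
      expected.length === actual.length &&
      expected.every((item, i) => isSubset(item, actual[i]))
    );
  }
  if (Array.isArray(actual)) {
    return false;
  }
  const actualRecord = actual as Record<string, unknown>;
  return Object.entries(expected).every(
    ([key, value]) => key in actualRecord && isSubset(value, actualRecord[key])
  );
}
```

With this sketch, an expectation of `{ message: "boom" }` would accept an error of `{ message: "boom", details: { code: 1 } }` but reject one that lacks `message`.
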
Standardize terminology throughout the agent conformance testing
documentation and tooling by replacing "invocations" with "steps"
to better describe the ordered sequence of operations in test cases.
github-actions bot added the config, docs, js, test, and tooling labels May 7, 2026

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request establishes a comprehensive Agent Conformance Testing framework, introducing a shared YAML specification, detailed documentation, and a reference JavaScript test harness. It includes new Zod schemas for the agent wire protocol and conformance test format to ensure cross-language compatibility. Feedback focuses on improving maintainability by reducing schema duplication across packages and enhancing type safety by replacing 'any' types with more specific schemas or 'unknown' in both the core types and the new translator test application.

I am having trouble creating individual review comments. Click here to see my feedback.

js/ai/tests/agents_spec_test.ts (lines 54-117), priority: high

These Zod schemas are duplicated from genkit-tools/common/src/types/agents-conformance.ts. The comment on line 51 explains this is because js/ai does not depend on genkit-tools/common. This duplication is a significant maintainability risk, as changes in the canonical schemas can be missed here, leading to inconsistencies and test failures.

Consider refactoring the package dependencies to allow js/ai to import these schemas directly from genkit-tools/common. This would make the test harness more robust and easier to maintain.

genkit-tools/common/src/types/agent.ts (line 90), priority: medium

The status property is typed as z.any(), which is very permissive and reduces type safety. If the structure of status is known, it should be defined with a more specific schema. If it's truly unknown, z.unknown() is a safer alternative to z.any() as it forces validation on consumers. Since this property doesn't seem to be used in the current test spec, now would be a good time to improve its type definition for future use.

  status: z.unknown().optional(),
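
To illustrate why `unknown` is the safer choice, here is a plain TypeScript sketch with no Zod dependency. A value typed `unknown` cannot be used until the consumer narrows it, whereas `any` would let unchecked property access compile. The `{ state: string }` shape and the function names are assumptions for illustration, not the real `status` type.

```typescript
// Illustrative only: the { state: string } shape is an assumption.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// With `status: unknown`, the compiler rejects `status.state` until the
// value has been narrowed, so the checks below cannot be skipped.
function describeStatus(status: unknown): string {
  if (isRecord(status)) {
    const state = status.state;
    if (typeof state === "string") {
      return state;
    }
  }
  return "unrecognized";
}
```

Here `describeStatus({ state: "done" })` returns `"done"`, and a malformed value such as `42` returns `"unrecognized"` instead of failing at runtime.
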

js/testapps/agents/web/src/pages/Translator.tsx (line 71), priority: medium

Casting the result of runFlow to any bypasses TypeScript's type safety. This can lead to runtime errors if the API response structure changes. It would be safer to define a type for the expected response and then validate or cast to that specific type. For example, you could use a Zod schema to parse the response.

pavelgj added 17 commits May 8, 2026 19:24
Remove the optional `newSnapshotId` property from `AgentInitSchema`
as it is no longer needed in the agent initialization configuration.
Update the "abort completed agent" test to expect that a snapshot
in the "done" state remains "done" after an abort request. Terminal
states (done, failed, aborted) cannot be overridden — only "pending"
can transition to "aborted".
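
The abort rule described in this commit can be sketched as a single transition function. The status names come from the commit message; the type and function names are hypothetical.

```typescript
// Status names are taken from the commit message; `SnapshotStatus` and
// `applyAbort` are hypothetical names used only for this sketch.
type SnapshotStatus = "pending" | "done" | "failed" | "aborted";

// An abort request only affects a pending snapshot. Terminal states
// (done, failed, aborted) are returned unchanged.
function applyAbort(current: SnapshotStatus): SnapshotStatus {
  return current === "pending" ? "aborted" : current;
}
```

Under this rule, `applyAbort("pending")` yields `"aborted"`, while `applyAbort("done")` stays `"done"`, which is what the updated "abort completed agent" test expects.
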