Establish an effective suite of acceptance criteria tests.
These tests are designed to be reviewed manually by a product stakeholder or evaluated by an LLM. They may be sent to Langfuse for analysis.
They focus on the quality and style of answers to key questions. They should be easy for a Joe or a Brandon to audit, to ensure the quality and voice of the AI Assistant throughout development (particularly important after model version updates).
Here are the principles of acceptance criteria tests:
- They are implemented as HTTP requests against the Bun server, but this detail is not surfaced to the people writing or reviewing the tests
- They include live model calls
- They are likely evaluated by an LLM to determine pass/fail status
- Test suites must be richly defined and easily evaluated. Tests might be markdown files, for example, with a question, some data, and a set of natural language assertions
- When designing new AI features, the product owner may specify some hero questions/conversations to drive development. Those questions would make good acceptance criteria tests
- We may want the results to be checked into git so that responses can be compared
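
As one illustration of the "markdown file with a question, some data, and a set of natural language assertions" idea, here is a minimal sketch of a parser for such a file. The file layout (`# Title`, `## Question`, `## Data`, `## Assertions`), the `AcceptanceTest` shape, the `parseAcceptanceTest` name, and the sample content are all assumptions made for illustration, not an agreed format. The sketch stops at parsing: sending the question to the server and asking an LLM to judge each assertion are left out.

```typescript
// Hypothetical shape of one acceptance criteria test (field names are assumptions).
interface AcceptanceTest {
  title: string;
  question: string;
  data: string;
  assertions: string[]; // natural language statements for a human or LLM to judge
}

// Parse a markdown test file laid out as:
//   # Title / ## Question / ## Data / ## Assertions (one "- " bullet per assertion)
function parseAcceptanceTest(markdown: string): AcceptanceTest {
  const sections = new Map<string, string[]>([
    ["question", []],
    ["data", []],
    ["assertions", []],
  ]);
  let title = "";
  let current = "";

  for (const line of markdown.split(/\r?\n/)) {
    const h2 = line.match(/^##\s+(.*)$/);
    const h1 = line.match(/^#\s+(.*)$/);
    if (h2) {
      // A level-2 heading starts a new section; unknown sections are ignored.
      current = (h2[1] ?? "").trim().toLowerCase();
    } else if (h1) {
      title = (h1[1] ?? "").trim();
    } else {
      sections.get(current)?.push(line);
    }
  }

  const text = (name: string) => (sections.get(name) ?? []).join("\n").trim();
  const assertions = (sections.get("assertions") ?? [])
    .map((l) => l.trim())
    .filter((l) => l.startsWith("- "))
    .map((l) => l.slice(2).trim());

  return { title, question: text("question"), data: text("data"), assertions };
}

// Example test file content (entirely made up for illustration).
const example = [
  "# Refund policy tone",
  "## Question",
  "Can I get a refund after 30 days?",
  "## Data",
  "plan: pro",
  "## Assertions",
  "- Answers in a friendly, concise voice",
  "- Does not promise a refund",
].join("\n");

const parsed = parseAcceptanceTest(example);
console.log(parsed);
```

Keeping the assertions as plain sentences means the same file can be read by a product stakeholder and passed unchanged to an LLM evaluator.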