We have now implemented the global chat for one-shot workflows, i.e. scenarios where the user generates a workflow from scratch. To develop the global chat architecture further, we will focus on a set of user scenarios.
This batch of scenarios will target the most essential, top-priority use cases relating to job code edits. In each one the user is currently viewing a step, and the conversation relates primarily to that step only (rather than, e.g., "change the job code across my steps" or "change the code in the final step").
Each scenario will involve one or more tests (unit/service/integration/acceptance):
- Basic first conversation turn relating to job code (no code generation)
- Basic multi-turn conversation relating to job code (no code generation)
- User asks general question about job code (no code generation)
- User asks question about job code that requires reading the logs (no code generation)
- The user asks "what does this step do?" (does it describe the job code, or the whole workflow?)
- User asks for a change to the job code they are viewing in a first conversation turn (adapt job_chat tests for use via global_chat; require knowing about adaptor signatures)
- User asks for a change to the job code they are viewing with a long conversation history relating to job code
- User asks for a change to the job code they are viewing with a long conversation history relating to several different parts of the workflow
Targeting these tests will help us refine the architecture and make sure it handles all contextual information correctly, calls the job code assistant when required, and handles tool use. We will also define the expected behaviour in common ambiguous cases: When should the model ask for more information? When should it not generate job code? How can it decide when to focus on one step and give a brief answer vs examine the entire workflow?
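
As a rough illustration of how the scenarios listed above could be written down as data-driven test cases, here is a minimal sketch. Every name in it (`Scenario`, `Reply`, `check_reply`, the `read_logs` tool label) is hypothetical and chosen for illustration only; none of it reflects the actual global_chat or job_chat interfaces.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scenario:
    """One user scenario (all field names are hypothetical)."""
    name: str
    history: List[str]           # prior user turns, oldest first
    message: str                 # the user turn under test
    expect_code_change: bool     # should the reply propose new job code?
    needs_logs: bool = False     # does answering require reading the logs?


@dataclass
class Reply:
    """Hypothetical, simplified shape of a chat response."""
    text: str
    suggested_code: Optional[str] = None
    tools_used: List[str] = field(default_factory=list)


def check_reply(scenario: Scenario, reply: Reply) -> List[str]:
    """Return a list of expectation failures; an empty list means pass."""
    failures = []
    has_code = reply.suggested_code is not None
    if has_code and not scenario.expect_code_change:
        failures.append("generated job code when none was expected")
    if scenario.expect_code_change and not has_code:
        failures.append("expected a job code change but none was returned")
    # "read_logs" is an assumed tool name, not a real one.
    if scenario.needs_logs and "read_logs" not in reply.tools_used:
        failures.append("expected the logs to be read")
    return failures


# Usage: a first-turn question that should not trigger code generation.
scenario = Scenario(
    name="general question, first turn",
    history=[],
    message="What does this step do?",
    expect_code_change=False,
)
reply = Reply(text="A plain-language description of the step.")
print(check_reply(scenario, reply))
```

Keeping the expectations as plain data would make it straightforward to reuse the same scenario at several test levels, with only the code that produces the `Reply` changing between them.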