Commit 653f0d7

updated docs for dictionary API usage. tweak the usage frequency of the tool that uses the API.
1 parent 8f033aa commit 653f0d7

3 files changed: 31 additions & 12 deletions

crossword_companion/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ iOS, web and macOS.
 The application uses a multi-modal Gemini model (`gemini-2.5-pro`) to analyze an
 image of a crossword puzzle. It then uses a separate model (`gemini-2.5-flash`),
 configured with a detailed system prompt to act as a crossword "expert", to
-solve the puzzle.
+solve the puzzle. Additionally, the app integrates with an external dictionary API (dictionaryapi.dev) to provide word metadata (e.g., part of speech) when requested by the Gemini model during the solving process. This integration allows the Gemini model to verify grammatical constraints, such as part of speech, for potential answers, thereby improving the accuracy and relevance of its solutions.
 
 The app itself drives the solving process. For each clue, it determines the
 required word length and the current known letter pattern from the grid. It then
```
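
The README now says the app gets word metadata from dictionaryapi.dev. The Dart function below is a minimal sketch of what that lookup yields: it collects the distinct parts of speech from a response body. It assumes the public API's usual entry shape, which is a JSON array of entries, each with a `meanings` list whose items carry a `partOfSpeech` string. The function name `partsOfSpeech` is hypothetical and is not taken from the app.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the distinct parts of speech found in a
/// dictionaryapi.dev entries response body.
///
/// Assumed body shape:
///   [{"word": "run", "meanings": [{"partOfSpeech": "verb", ...}, ...]}, ...]
Set<String> partsOfSpeech(String responseBody) {
  final decoded = jsonDecode(responseBody);
  // Error replies (e.g. for an unknown word) are assumed to be a JSON object
  // rather than a list, so they produce an empty set.
  if (decoded is! List) return <String>{};

  final result = <String>{};
  for (final entry in decoded) {
    if (entry is! Map) continue;
    final meanings = entry['meanings'];
    if (meanings is! List) continue;
    for (final meaning in meanings) {
      final pos = meaning is Map ? meaning['partOfSpeech'] : null;
      if (pos is String) result.add(pos);
    }
  }
  return result;
}
```

Suppose a body lists both verb and noun meanings for "run" across several entries. The result is then a set containing `verb` and `noun`, with duplicates collapsed. That compact form is what the model needs to check a clue's grammatical constraint.
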

crossword_companion/lib/services/gemini_service.dart

Lines changed: 12 additions & 3 deletions
```diff
@@ -74,7 +74,7 @@ class GeminiService {
 
   static final _getWordMetadataFunction = FunctionDeclaration(
     'getWordMetadata',
-    'Gets metadata for a word, like its part of speech.',
+    'Gets grammatical metadata for a word, like its part of speech. Best used to verify a candidate answer against a clue that implies a grammatical constraint.',
     parameters: {
       'word': Schema(SchemaType.string, description: 'The word to look up.'),
     },
```
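
The declaration above only describes the tool to the model. The app still has to route each call the model makes to real code. Below is a framework-free sketch of that routing, assuming a name-to-handler table. `ToolCall`, `ToolHandler`, `dispatchToolCall`, and `wordMetadataHandler` are hypothetical names, not `firebase_ai` types. `lookup` stands in for the real dictionary query.

```dart
/// Hypothetical stand-in for a function call issued by the model:
/// the declared function name plus its decoded JSON arguments.
class ToolCall {
  const ToolCall(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

/// A tool implementation: takes the call's arguments and returns a
/// JSON-serializable map to send back to the model.
typedef ToolHandler = Future<Map<String, Object?>> Function(
    Map<String, Object?> args);

/// Routes a call to the handler registered under its name. Unknown names
/// are reported back as an `error` field instead of throwing.
Future<Map<String, Object?>> dispatchToolCall(
  ToolCall call,
  Map<String, ToolHandler> handlers,
) async {
  final handler = handlers[call.name];
  if (handler == null) {
    return {'error': 'Unknown function: ${call.name}'};
  }
  return handler(call.args);
}

/// Handler for a getWordMetadata-style tool with one string parameter,
/// `word`. [lookup] is an injected (hypothetical) dictionary query.
ToolHandler wordMetadataHandler(
  Future<List<String>> Function(String word) lookup,
) {
  return (args) async {
    final word = args['word'];
    if (word is! String || word.trim().isEmpty) {
      return {'error': 'Expected a non-empty string "word" argument.'};
    }
    return {'word': word, 'partsOfSpeech': await lookup(word.trim())};
  };
}
```

This sketch returns an `error` field instead of throwing. That lets the model see why a call failed and keep reasoning, rather than aborting the whole solve for one clue.
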
```diff
@@ -87,7 +87,7 @@ You are an expert crossword puzzle solver.
 **Follow these rules at all times:**
 1. **Prefer Common Words:** Prioritize common English words and proper nouns. Avoid obscure, archaic, or highly technical terms unless the clue strongly implies them.
 2. **Match the Clue:** Ensure your answer strictly matches the clue's tense, plurality (singular vs. plural), and part of speech.
-3. **Verify Grammatically:** If a clue implies a part of speech, use the `getWordMetadata` tool to verify your candidate answer has the correct part of speech.
+3. **Verify Grammatically:** If a clue implies a specific part of speech (e.g., it's a verb, adverb, or plural), it's a good idea to use the `getWordMetadata` tool to verify your candidate answer matches. However, avoid using it for every clue.
 4. **Be Confident:** Provide a confidence score from 0.0 to 1.0 indicating your certainty.
 5. **Trust the Clue Over the Pattern:** The provided letter pattern is only a suggestion based on other potentially incorrect answers. Your primary goal is to find the best word that fits the **clue text**. If you are confident in an answer that contradicts the provided pattern, you should use that answer.
 6. **Format Correctly:** You must return your answer in the specified JSON format.
```
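
Rules 4 and 6 require every reply to carry an answer plus a confidence between 0.0 and 1.0, in JSON. A defensive parser might look like the sketch below, assuming the `{ "answer": "...", "confidence": ... }` shape given in design.md. `ClueAnswer` and `parseClueAnswer` are hypothetical names. Rejecting out-of-range confidences, rather than clamping them, is also an assumption.

```dart
import 'dart:convert';

/// Hypothetical value type for one solved clue.
class ClueAnswer {
  const ClueAnswer(this.answer, this.confidence);
  final String answer;
  final double confidence;
}

/// Parses a reply shaped like {"answer": "...", "confidence": ...} and
/// enforces rules 4 and 6: valid JSON, a non-empty string answer, and a
/// numeric confidence within [0.0, 1.0]. Returns null otherwise.
ClueAnswer? parseClueAnswer(String reply) {
  Object? decoded;
  try {
    decoded = jsonDecode(reply);
  } on FormatException {
    return null;
  }
  if (decoded is! Map) return null;

  final answer = decoded['answer'];
  final confidence = decoded['confidence'];
  if (answer is! String || answer.trim().isEmpty) return null;
  if (confidence is! num) return null;

  final score = confidence.toDouble();
  if (score < 0.0 || score > 1.0) return null; // reject, don't clamp
  return ClueAnswer(answer.trim(), score);
}
```

A `null` result gives the caller one uniform signal for "no usable answer", whether the model broke the format or the rules.
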
````diff
@@ -99,7 +99,16 @@ You are an expert crossword puzzle solver.
 You have a tool to get grammatical information about a word.
 
 **When to use:**
-- Use this tool when a clue implies a part of speech (e.g., "To run," "An object," "Happily") to confirm your answer matches.
+- This tool is most helpful as a verification step after you have a likely answer.
+- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
+- **Good candidates for verification:**
+  - Clues that seem to be verbs (e.g., "To run," "Waving").
+  - Clues that are adverbs (e.g., "Happily," "Quickly").
+  - Clues that specify a plural form.
+- **Try to avoid using the tool for:**
+  - Simple definitions (e.g., "A small dog").
+  - Fill-in-the-blank clues (e.g., "___ and flow").
+  - Proper nouns (e.g., "Capital of France").
 
 **Function signature:**
 ```json
````

crossword_companion/specs/design.md

Lines changed: 18 additions & 8 deletions
```diff
@@ -13,7 +13,7 @@ The application follows a standard Flutter project structure, using the `firebas
 The application uses a single screen with a vertical `Stepper` to guide the user through the workflow.
 
 - **Explicit State Passing:** The `CrosswordScreen` is responsible for building the `Stepper`. It determines which step is active based on the `currentStep` from the `AppStepState`. It then passes an `isActive` boolean (`isActive: appStepState.currentStep == stepIndex`) to each step's content widget.
-- **Mixin-Based Activation Logic:** To adhere to the DRY principle, the common state management logic for each stepper page is encapsulated in a `StepStateMixin`. This mixin provides the `initState` and `didUpdateWidget` lifecycle methods, which automatically call an `onActivated` method when the step becomes active. Each step's `State` class uses this mixin, ensuring activation logic runs reliably without duplicating code.
+- **Mixin-Based Activation Logic:** To adhere to the DRY principle, the common state management logic for each stepper page is encapsulated in a `StepActivationMixin`. This mixin provides the `initState` and `didUpdateWidget` lifecycle methods, which automatically call an `onActivated` method when the step becomes active. Each step's `State` class uses this mixin, ensuring activation logic runs reliably without duplicating code.
 - **Encapsulated Controls:** Each step widget is responsible for rendering its own navigation controls (e.g., "NEXT", "BACK", "SOLVE"). These controls directly call methods on the appropriate state notifiers (e.g., `appStepState.nextStep()`, `puzzleSolverState.solvePuzzle()`) to update the application's state.
 
 ### Stepper Steps:
```
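
The `StepActivationMixin` described above comes down to two checks. It fires `onActivated` if the step starts out active, and fires it again on every inactive-to-active transition. The sketch below models that logic in plain Dart so it runs without Flutter. `LifecycleState`, `StepConfig`, and `CountingStep` are hypothetical stand-ins for Flutter's `State` and the step widgets. Only the lifecycle hooks the mixin relies on are modeled.

```dart
/// Stand-in for a step widget's configuration; in the app this role is
/// played by the step widget that receives `isActive` from CrosswordScreen.
class StepConfig {
  const StepConfig({required this.isActive});
  final bool isActive;
}

/// Minimal stand-in for Flutter's State lifecycle so the sketch runs in
/// plain Dart. Only initState and didUpdateWidget are modeled.
abstract class LifecycleState {
  LifecycleState(this.widget);
  StepConfig widget;

  void initState() {}

  void didUpdateWidget(StepConfig oldWidget) {}

  /// Simulates the framework rebuilding this state with a new configuration.
  void update(StepConfig newWidget) {
    final old = widget;
    widget = newWidget;
    didUpdateWidget(old);
  }
}

/// Activation logic: call onActivated() when the step starts out active,
/// and again on each inactive -> active transition.
mixin StepActivationMixin on LifecycleState {
  void onActivated();

  @override
  void initState() {
    super.initState();
    if (widget.isActive) onActivated();
  }

  @override
  void didUpdateWidget(StepConfig oldWidget) {
    super.didUpdateWidget(oldWidget);
    if (widget.isActive && !oldWidget.isActive) onActivated();
  }
}

/// Example step that just counts how often it was activated.
class CountingStep extends LifecycleState with StepActivationMixin {
  CountingStep(super.widget);
  int activations = 0;

  @override
  void onActivated() => activations++;
}
```

In the Flutter app, the mixin would presumably be declared `on State<T>`, with `T` bound to a widget type that exposes `isActive`. That binding is an assumption, since the design text does not show it.
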
```diff
@@ -51,7 +51,7 @@ This decoupled approach ensures that each part of the state is managed independe
 - **`GeminiService`:** This service handles all communication with the Gemini models. It is configured with the "expert" system prompt for the solver and has methods for:
   - `inferCrosswordData(images)`: Calls `gemini-2.5-pro` to analyze one or more images.
   - `solveClue(clue, length, pattern)`: Calls `gemini-2.5-flash` to get an answer and confidence score for a single clue.
-  - `getWordMetadata(word)`: Calls `gemini-2.5-flash` to get metadata (e.g., part of speech, definition) for a given word.
+  - `getWordMetadata(word)`: This is a function declaration provided to the `gemini-2.5-flash` model. When the model invokes this function, the application calls the `getWordMetadataFromApi(word)` method, which queries a public dictionary API (`dictionaryapi.dev`) to retrieve grammatical information for the given word.
 - **`PuzzleSolver`:** Contains the business logic for the main solving loop, iterating through clues and coordinating with the `GeminiService`, `PuzzleDataState`, and `PuzzleSolverState` to solve the puzzle.
 
 ## 5. Puzzle Solving Logic
```
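
`getWordMetadataFromApi(word)` is described as querying `dictionaryapi.dev`. Below is a minimal sketch of such a query using only `dart:io`. It assumes the public endpoint `https://api.dictionaryapi.dev/api/v2/entries/en/<word>` and that unknown words come back as HTTP 404. `dictionaryEntryUri` and `fetchDictionaryEntry` are hypothetical names, and the app may use a different HTTP client.

```dart
import 'dart:convert';
import 'dart:io';

/// Builds the (assumed) dictionaryapi.dev entries URL for an English word.
Uri dictionaryEntryUri(String word) => Uri.https(
      'api.dictionaryapi.dev',
      '/api/v2/entries/en/${word.trim().toLowerCase()}',
    );

/// Fetches the raw JSON body for [word]. Returns null for a non-200 status
/// (the API is assumed to answer 404 for unknown words) or a network error.
Future<String?> fetchDictionaryEntry(String word, {HttpClient? client}) async {
  final http = client ?? HttpClient();
  try {
    final request = await http.getUrl(dictionaryEntryUri(word));
    final response = await request.close();
    final body = await response.transform(utf8.decoder).join();
    return response.statusCode == HttpStatus.ok ? body : null;
  } on IOException {
    return null;
  } finally {
    // Only close a client this function created itself.
    if (client == null) http.close();
  }
}
```

The raw body would then be reduced to the relevant grammatical fields before being sent back to the model as the function result. Returning `null` for a miss lets the model hear "no data" instead of seeing an exception.
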
```diff
@@ -67,17 +67,27 @@ The puzzle-solving process is managed by an app-driven loop within the `PuzzleSo
    - **Red:** Two conflicting answers for the same cell.
 5. **Looping:** The app loops through all unsolved clues until the puzzle is complete, making multiple passes if necessary to retry clues that were previously answered incorrectly.
 
-### Handling Concurrent Operations
+### Handling Function Calls and Structured Output
 
-To provide a responsive user experience and efficiently manage resources, the application must be able to cancel in-flight requests to the Gemini API. This is critical when the user pauses or restarts the solving process. The application achieves this through stream-based API calls and explicit cancellation.
+To ensure robust interaction with the Gemini model for clue solving, the application uses a sophisticated, multi-step process encapsulated within the `_generateJsonWithFunctionsAndSchema` helper method in `GeminiService`. This process is designed to handle both model-driven function calls and a final, strictly-formatted JSON output.
 
-- **Stream-Based Requests:** Instead of using the single-response `generateContent` method, the `GeminiService` uses `streamGenerateContent` for solving clues. This method returns a `Stream` of response chunks.
+- **Two-Model Approach:** The service uses two configurations of the `gemini-2.5-flash` model:
+  - `_clueSolverModelWithFunctions`: This model is configured with the `getWordMetadata` tool, allowing it to request additional information during its reasoning process.
+  - `_clueSolverModelWithSchema`: This model is configured with a strict JSON output schema, ensuring the final answer is always in the correct format (`{ "answer": "...", "confidence": ... }`).
 
-- **Request Cancellation:** When the service listens to this stream, it receives a `StreamSubscription` object. This object acts as a handle to the active API call and has a `cancel()` method.
+- **Chat-Based Interaction:** The process begins by starting a chat session with `_clueSolverModelWithFunctions`.
+  1. The initial prompt (clue, length, pattern) is sent.
+  2. The app checks the model's response for any `functionCalls`.
+  3. If the model requests a function call (e.g., `getWordMetadata`), the app executes it (by calling the dictionary API) and sends the result back to the model in the same chat session.
+  4. This loop continues until the model responds with its reasoning complete, without requesting further function calls.
 
-- **Encapsulated Logic:** The `GeminiService` encapsulates this entire mechanism. It holds a reference to the current `_clueSolverSubscription` and exposes a public `cancelCurrentSolve()` method. When this method is called, it cancels the active subscription, which immediately terminates the underlying network request.
+- **Forcing JSON Output:** Once the function-calling loop is complete, the app takes the entire chat history and uses it to make a final call to the `_clueSolverModelWithSchema`. This effectively asks the model to summarize its final conclusion from the preceding conversation into the required JSON format.
 
-- **State Integration:** The `PuzzleSolverState` calls `geminiService.cancelCurrentSolve()` from its `pauseSolving` and `restartSolving` methods. This ensures that whenever the user interrupts the solver, the active LLM request is instantly canceled, preventing wasted processing and ensuring no stale data can interfere with the new state. The `solveClue` method in the service is designed to handle this cancellation gracefully, returning a `null` value which the `PuzzleSolver` already handles.
+This robust, app-driven process ensures that the model can access external tools when needed while still providing a predictable, machine-readable output for the application to consume.
```
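
The chat-then-finalize flow described in design.md can be sketched without the Firebase SDK. Every type and parameter here is a hypothetical stand-in: `FunctionRequest`, `ModelTurn`, `ChatMessage`, `send`, `runTool`, `finalizeAsJson`, and `maxRounds`. `send` plays the role of a message on the chat with `_clueSolverModelWithFunctions`. `finalizeAsJson` plays the final call to `_clueSolverModelWithSchema` with the accumulated history. The round cap is an added safeguard that the design text does not mention.

```dart
/// Hypothetical function call requested by the model.
class FunctionRequest {
  const FunctionRequest(this.name, this.args);
  final String name;
  final Map<String, Object?> args;
}

/// Hypothetical view of one model reply: optional text plus any calls.
class ModelTurn {
  const ModelTurn({this.text, this.calls = const []});
  final String? text;
  final List<FunctionRequest> calls;
}

/// Messages the app sends into the chat.
sealed class ChatMessage {}

class UserPrompt extends ChatMessage {
  UserPrompt(this.text);
  final String text;
}

class ToolResults extends ChatMessage {
  ToolResults(this.results);
  final List<Map<String, Object?>> results;
}

/// Runs the function-calling loop, then hands the whole history to a
/// schema-constrained step that restates the conclusion as JSON.
Future<String> solveWithTools({
  required String prompt,
  required Future<ModelTurn> Function(ChatMessage message) send,
  required Future<Map<String, Object?>> Function(FunctionRequest call) runTool,
  required Future<String> Function(List<Object> history) finalizeAsJson,
  int maxRounds = 5, // hypothetical safeguard against endless tool use
}) async {
  final history = <Object>[];
  ChatMessage message = UserPrompt(prompt);

  for (var round = 0; round < maxRounds; round++) {
    history.add(message);
    final turn = await send(message);
    history.add(turn);
    // No function calls means the model has finished reasoning.
    if (turn.calls.isEmpty) break;
    // Execute each requested call and feed all results back in one message.
    message = ToolResults([for (final call in turn.calls) await runTool(call)]);
  }
  return finalizeAsJson(history);
}
```

Splitting the work this way means the tool-enabled model never has to produce strict JSON itself. Only the final, schema-constrained step does.
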
```diff
+
+### Request Cancellation
+
+The `GeminiService` includes a `cancelCurrentSolve()` method, and the `solveClue` method begins with a call to it. However, in the current implementation, the underlying Gemini API calls for clue solving are made via `sendMessage` on a chat, which returns a `Future` and does not support in-flight cancellation. The `cancelCurrentSolve` method is a remnant of a previous, stream-based implementation and currently has no effect. The `PuzzleSolverState` calls this method from its `pauseSolving` and `restartSolving` methods, but it does not interrupt an ongoing `solveClue` operation. A `solveClue` call will always run to completion.
 
 ## 6. Data Models
 
```
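
Because `sendMessage` returns a `Future`, pausing cannot stop an in-flight `solveClue`. The goal may be narrower: keeping a stale answer from being applied after a pause or restart. A common pattern for that is a generation counter checked after the `await`. This is not what the current code does. `SolveGeneration` and `runGuarded` are hypothetical names, and in this sketch the underlying call still runs to completion.

```dart
/// Hypothetical guard that lets pause/restart discard a stale solve result.
/// It does not stop the underlying request; it only marks older tokens stale.
class SolveGeneration {
  int _current = 0;

  /// Starts a new solve and returns its token (older tokens become stale).
  int begin() => ++_current;

  /// Called from pause/restart: every token issued so far becomes stale.
  void invalidate() => _current++;

  /// Whether [token] is still the latest generation.
  bool isCurrent(int token) => token == _current;
}

/// Awaits [work] to completion, but yields null if the generation moved on
/// while it was running.
Future<T?> runGuarded<T>(
  SolveGeneration generation,
  Future<T> Function() work,
) async {
  final token = generation.begin();
  final result = await work();
  return generation.isCurrent(token) ? result : null;
}
```

This avoids stale state but not wasted work. Actually saving the request would need a cancellable transport, such as the stream-based approach the section says was removed.
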