Update docs for dictionary API usage. Tweak the solver prompt so the model calls the dictionary tool less often.

csells · csells · commit 653f0d78e10d · 2025-11-09T19:38:13.000-08:00
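
For orientation, the dictionary lookup documented in this commit can be pictured with the minimal sketch below. It is illustrative only and not taken from the repository: `partsOfSpeechFromResponse` is a hypothetical helper, the sample body is hand-written and abbreviated, and the only assumption about dictionaryapi.dev is that a successful response is a JSON list of entries whose `meanings` each carry a `partOfSpeech` string.

```dart
import 'dart:convert';

/// Hypothetical helper (not from this commit): collects the distinct parts of
/// speech found in a dictionaryapi.dev-style JSON response body.
List<String> partsOfSpeechFromResponse(String body) {
  final decoded = jsonDecode(body);
  // Assumption: a successful lookup is a JSON list of entries. Anything else
  // (for example an error object) is treated as "no metadata available".
  if (decoded is! List) return const [];

  final parts = <String>{}; // A set literal keeps insertion order in Dart.
  for (final entry in decoded) {
    if (entry is! Map) continue;
    final meanings = entry['meanings'];
    if (meanings is! List) continue;
    for (final meaning in meanings) {
      if (meaning is! Map) continue;
      final pos = meaning['partOfSpeech'];
      if (pos is String) parts.add(pos);
    }
  }
  return parts.toList();
}

void main() {
  // Abbreviated, hand-written sample in the assumed response shape.
  const sample = '''
[
  {
    "word": "run",
    "meanings": [
      {"partOfSpeech": "noun", "definitions": [{"definition": "..."}]},
      {"partOfSpeech": "verb", "definitions": [{"definition": "..."}]}
    ]
  }
]
''';
  print(partsOfSpeechFromResponse(sample)); // [noun, verb]
}
```

A result such as `[noun, verb]` is the kind of grammatical metadata the model could compare against a clue like "To run" before committing to a candidate answer.
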
diff --git a/crossword_companion/README.md b/crossword_companion/README.md
@@ -18,7 +18,7 @@ iOS, web and macOS.
 The application uses a multi-modal Gemini model (`gemini-2.5-pro`) to analyze an
 image of a crossword puzzle. It then uses a separate model (`gemini-2.5-flash`),
 configured with a detailed system prompt to act as a crossword "expert", to
-solve the puzzle.
+solve the puzzle. The app also integrates with an external dictionary API (dictionaryapi.dev) that supplies word metadata (e.g., part of speech) when the Gemini model requests it during solving. The model can use this metadata to check a candidate answer against grammatical constraints in the clue, which is intended to improve the accuracy of its solutions.
 
 The app itself drives the solving process. For each clue, it determines the
 required word length and the current known letter pattern from the grid. It then
diff --git a/crossword_companion/lib/services/gemini_service.dart b/crossword_companion/lib/services/gemini_service.dart
@@ -74,7 +74,7 @@ class GeminiService {
 
   static final _getWordMetadataFunction = FunctionDeclaration(
     'getWordMetadata',
-    'Gets metadata for a word, like its part of speech.',
+    'Gets grammatical metadata for a word, like its part of speech. Best used to verify a candidate answer against a clue that implies a grammatical constraint.',
     parameters: {
       'word': Schema(SchemaType.string, description: 'The word to look up.'),
     },
@@ -87,7 +87,7 @@ You are an expert crossword puzzle solver.
 **Follow these rules at all times:**
 1.  **Prefer Common Words:** Prioritize common English words and proper nouns. Avoid obscure, archaic, or highly technical terms unless the clue strongly implies them.
 2.  **Match the Clue:** Ensure your answer strictly matches the clue's tense, plurality (singular vs. plural), and part of speech.
-3.  **Verify Grammatically:** If a clue implies a part of speech, use the `getWordMetadata` tool to verify your candidate answer has the correct part of speech.
+3.  **Verify Grammatically:** If a clue implies a specific grammatical form (e.g., a verb, an adverb, or a plural), consider using the `getWordMetadata` tool to verify that your candidate answer matches. However, avoid using it for every clue.
 4.  **Be Confident:** Provide a confidence score from 0.0 to 1.0 indicating your certainty.
 5.  **Trust the Clue Over the Pattern:** The provided letter pattern is only a suggestion based on other potentially incorrect answers. Your primary goal is to find the best word that fits the **clue text**. If you are confident in an answer that contradicts the provided pattern, you should use that answer.
 6.  **Format Correctly:** You must return your answer in the specified JSON format.
@@ -99,7 +99,16 @@ You are an expert crossword puzzle solver.
 You have a tool to get grammatical information about a word.
 
 **When to use:**
-- Use this tool when a clue implies a part of speech (e.g., "To run," "An object," "Happily") to confirm your answer matches.
+- This tool is most helpful as a verification step after you have a likely answer.
+- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
+- **Good candidates for verification:**
+    - Clues that seem to be verbs (e.g., "To run," "Waving").
+    - Clues that are adverbs (e.g., "Happily," "Quickly").
+    - Clues that specify a plural form.
+- **Try to avoid using the tool for:**
+    - Simple definitions (e.g., "A small dog").
+    - Fill-in-the-blank clues (e.g., "___ and flow").
+    - Proper nouns (e.g., "Capital of France").
 
 **Function signature:**
 ```json
diff --git a/crossword_companion/specs/design.md b/crossword_companion/specs/design.md
@@ -13,7 +13,7 @@ The application follows a standard Flutter project structure, using the `firebas
 The application uses a single screen with a vertical `Stepper` to guide the user through the workflow.
 
 -   **Explicit State Passing:** The `CrosswordScreen` is responsible for building the `Stepper`. It determines which step is active based on the `currentStep` from the `AppStepState`. It then passes an `isActive` boolean (`isActive: appStepState.currentStep == stepIndex`) to each step's content widget.
--   **Mixin-Based Activation Logic:** To adhere to the DRY principle, the common state management logic for each stepper page is encapsulated in a `StepStateMixin`. This mixin provides the `initState` and `didUpdateWidget` lifecycle methods, which automatically call an `onActivated` method when the step becomes active. Each step's `State` class uses this mixin, ensuring activation logic runs reliably without duplicating code.
+-   **Mixin-Based Activation Logic:** To adhere to the DRY principle, the common state management logic for each stepper page is encapsulated in a `StepActivationMixin`. This mixin provides the `initState` and `didUpdateWidget` lifecycle methods, which automatically call an `onActivated` method when the step becomes active. Each step's `State` class uses this mixin, ensuring activation logic runs reliably without duplicating code.
 -   **Encapsulated Controls:** Each step widget is responsible for rendering its own navigation controls (e.g., "NEXT", "BACK", "SOLVE"). These controls directly call methods on the appropriate state notifiers (e.g., `appStepState.nextStep()`, `puzzleSolverState.solvePuzzle()`) to update the application's state.
 
 ### Stepper Steps:
@@ -51,7 +51,7 @@ This decoupled approach ensures that each part of the state is managed independe
 - **`GeminiService`:** This service handles all communication with the Gemini models. It is configured with the "expert" system prompt for the solver and has methods for:
     - `inferCrosswordData(images)`: Calls `gemini-2.5-pro` to analyze one or more images.
     - `solveClue(clue, length, pattern)`: Calls `gemini-2.5-flash` to get an answer and confidence score for a single clue.
-    - `getWordMetadata(word)`: Calls `gemini-2.5-flash` to get metadata (e.g., part of speech, definition) for a given word.
+    - `getWordMetadata(word)`: This is a function declaration provided to the `gemini-2.5-flash` model. When the model invokes this function, the application calls the `getWordMetadataFromApi(word)` method, which queries a public dictionary API (`dictionaryapi.dev`) to retrieve grammatical information for the given word.
 - **`PuzzleSolver`:** Contains the business logic for the main solving loop, iterating through clues and coordinating with the `GeminiService`, `PuzzleDataState`, and `PuzzleSolverState` to solve the puzzle.
 
 ## 5. Puzzle Solving Logic
@@ -67,17 +67,27 @@ The puzzle-solving process is managed by an app-driven loop within the `PuzzleSo
     - **Red:** Two conflicting answers for the same cell.
 5.  **Looping:** The app loops through all unsolved clues until the puzzle is complete, making multiple passes if necessary to retry clues that were previously answered incorrectly.
 
-### Handling Concurrent Operations
+### Handling Function Calls and Structured Output
 
-To provide a responsive user experience and efficiently manage resources, the application must be able to cancel in-flight requests to the Gemini API. This is critical when the user pauses or restarts the solving process. The application achieves this through stream-based API calls and explicit cancellation.
+To solve a clue, the application uses a multi-step process encapsulated in the `_generateJsonWithFunctionsAndSchema` helper method in `GeminiService`. This process handles both model-driven function calls and a final, strictly formatted JSON output.
 
--   **Stream-Based Requests:** Instead of using the single-response `generateContent` method, the `GeminiService` uses `streamGenerateContent` for solving clues. This method returns a `Stream` of response chunks.
+-   **Two-Model Approach:** The service uses two configurations of the `gemini-2.5-flash` model:
+    -   `_clueSolverModelWithFunctions`: This model is configured with the `getWordMetadata` tool, allowing it to request additional information during its reasoning process.
+    -   `_clueSolverModelWithSchema`: This model is configured with a strict JSON output schema, ensuring the final answer is always in the correct format (`{ "answer": "...", "confidence": ... }`).
 
--   **Request Cancellation:** When the service listens to this stream, it receives a `StreamSubscription` object. This object acts as a handle to the active API call and has a `cancel()` method.
+-   **Chat-Based Interaction:** The process begins by starting a chat session with `_clueSolverModelWithFunctions`.
+    1.  The initial prompt (clue, length, pattern) is sent.
+    2.  The app checks the model's response for any `functionCalls`.
+    3.  If the model requests a function call (e.g., `getWordMetadata`), the app executes it (by calling the dictionary API) and sends the result back to the model in the same chat session.
+    4.  This loop continues until the model returns a response that requests no further function calls.
 
--   **Encapsulated Logic:** The `GeminiService` encapsulates this entire mechanism. It holds a reference to the current `_clueSolverSubscription` and exposes a public `cancelCurrentSolve()` method. When this method is called, it cancels the active subscription, which immediately terminates the underlying network request.
+-   **Forcing JSON Output:** Once the function-calling loop is complete, the app takes the entire chat history and uses it to make a final call to the `_clueSolverModelWithSchema`. This effectively asks the model to summarize its final conclusion from the preceding conversation into the required JSON format.
 
--   **State Integration:** The `PuzzleSolverState` calls `geminiService.cancelCurrentSolve()` from its `pauseSolving` and `restartSolving` methods. This ensures that whenever the user interrupts the solver, the active LLM request is instantly canceled, preventing wasted processing and ensuring no stale data can interfere with the new state. The `solveClue` method in the service is designed to handle this cancellation gracefully, returning a `null` value which the `PuzzleSolver` already handles.
+This app-driven process lets the model access external tools when needed while still producing a predictable, machine-readable output for the application to consume.
+
+### Request Cancellation
+
+The `GeminiService` includes a `cancelCurrentSolve()` method, and the `solveClue` method begins with a call to it. However, in the current implementation, the underlying Gemini API calls for clue solving are made via `sendMessage` on a chat, which returns a `Future` and does not support in-flight cancellation. The `cancelCurrentSolve` method is a remnant of a previous, stream-based implementation and currently has no effect. The `PuzzleSolverState` calls this method from its `pauseSolving` and `restartSolving` methods, but it does not interrupt an ongoing `solveClue` operation. A `solveClue` call will always run to completion.
 
 ## 6. Data Models