@QIN2DIM QIN2DIM commented Aug 12, 2025

Related Issues or Context

Related: #1360

This PR introduces a solution to fully resolve token counting issues for prompt_tokens and completion_tokens in multi-modal QA scenarios using the Gemini GenAI SDK.


1. Accurate Token Counting in Multi-Modal QA

  • Problem:
    In the Gemini GenAI SDK, token statistics for prompt_tokens and completion_tokens were previously inaccurate in multi-modal QA scenarios, leading to inconsistencies in total token usage reporting.
  • Fix:
    Implemented a solution that ensures:
    • total_token_count is always consistent.
    • Correct pricing is applied for the following modalities: IMAGE, VIDEO, TEXT, and DOCUMENT.
  • Limitations:
    Due to current constraints in Dify:
    • Different modalities cannot be assigned distinct token prices.
    • The system does not track tiered (step-wise) pricing.
    • As a result, pricing accuracy cannot be guaranteed in the following scenarios:
      • Caching
      • Grounding
      • Audio input
      • LiveAPI
      • Gemini 2.5 Pro with ultra-long context.
```
tool_use_prompts  = 0  # not reported separately yet
completion_tokens = thoughts_token_count + candidates_token_count
prompt_tokens     = prompt_tokens_standard
total_tokens      = prompt_tokens + completion_tokens + tool_use_prompts
```
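The accounting above can be sketched as a small function. This is a minimal illustration, not the PR's actual implementation: the key names mirror the fields exposed by the SDK's `usage_metadata`, but a plain dict stands in for the real response object so the sketch runs without the SDK installed.

```python
def calculate_usage(usage_metadata: dict) -> dict:
    """Derive prompt/completion/total tokens from usage metadata.

    Missing or None fields fall back to 0, matching the hedged formula
    above (tool_use_prompts is not reported separately yet, so it is 0).
    """
    prompt_tokens = usage_metadata.get("prompt_token_count") or 0
    thoughts = usage_metadata.get("thoughts_token_count") or 0
    candidates = usage_metadata.get("candidates_token_count") or 0
    tool_use_prompts = 0  # placeholder: always zero in this accounting

    completion_tokens = thoughts + candidates
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens + tool_use_prompts,
    }


usage = calculate_usage(
    {"prompt_token_count": 258, "thoughts_token_count": 120, "candidates_token_count": 42}
)
print(usage)  # {'prompt_tokens': 258, 'completion_tokens': 162, 'total_tokens': 420}
```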

This PR contains Changes to Non-Plugin

  • Documentation
  • Other

This PR contains Changes to Non-LLM Models Plugin

  • I have Run Comprehensive Tests Relevant to My Changes

This PR contains Changes to LLM Models Plugin

  • My Changes Affect Message Flow Handling (System Messages and User→Assistant Turn-Taking)
  • My Changes Affect Tool Interaction Flow (Multi-Round Usage and Output Handling, for both Agent App and Agent Node)
  • My Changes Affect Multimodal Input Handling (Images, PDFs, Audio, Video, etc.)
  • My Changes Affect Multimodal Output Generation (Images, Audio, Video, etc.)
  • My Changes Affect Structured Output Format (JSON, XML, etc.)
  • My Changes Affect Token Consumption Metrics
  • My Changes Affect Other LLM Functionalities (Reasoning Process, Grounding, Prompt Caching, etc.)
  • Other Changes (Add New Models, Fix Model Parameters etc.)

Version Control (Any Changes to the Plugin Will Require Bumping the Version)

  • I have Bumped Up the Version in Manifest.yaml (Top-Level Version Field, Not in Meta Section)

Dify Plugin SDK Version

  • I have Ensured dify_plugin>=0.3.0,<0.5.0 is in requirements.txt (SDK docs)

Environment Verification (If Any Code Changes)

Local Deployment Environment

  • Dify Version is: , I have Tested My Changes on Local Deployment Dify with a Clean Environment That Matches the Production Configuration.

SaaS Environment

  • I have Tested My Changes on cloud.dify.ai with a Clean Environment That Matches the Production Configuration

- Improve token counting accuracy by properly handling multimodal input types in usage metadata
- Add detailed token tracking for thoughts and candidates in both streaming and non-streaming responses
- Update google-genai dependency to version 1.29.0 for improved compatibility and features
- Bump version to 0.4.1 to reflect new functionality and improvements
- Add comprehensive comments and usage examples for better code maintainability

Extracted the token calculation logic from multiple locations in the `GoogleLargeLanguageModel` class into a static method `_calculate_tokens_from_usage_metadata`.

This improves code maintainability and reduces duplication across different parts of the codebase that handle token counting.

The new method:
- Handles all token types including text, image, video, audio, and document modalities
- Correctly calculates prompt and completion tokens based on usage metadata
- Includes proper fallback to manual token counting when metadata is unavailable

The change also removes redundant comments and improves code readability while preserving all existing functionality.
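As a rough illustration of the extracted helper, the sketch below shows the shape such a static method might take. The class is a simplified stand-in and the attribute names are assumptions modeled on the SDK's usage metadata; the real implementation falls back to manual token counting, which is stubbed out here.

```python
from types import SimpleNamespace


class GoogleLargeLanguageModel:  # simplified stand-in for the real class
    @staticmethod
    def _calculate_tokens_from_usage_metadata(usage_metadata) -> tuple[int, int]:
        """Return (prompt_tokens, completion_tokens) from usage metadata.

        Attribute names are assumed; getattr fallbacks keep the sketch
        robust when a field is missing or None.
        """
        if usage_metadata is None:
            # Fallback path: the real code counts tokens manually here
            # (e.g. via a tokenizer); this sketch just returns zeros.
            return 0, 0
        prompt_tokens = getattr(usage_metadata, "prompt_token_count", None) or 0
        thoughts = getattr(usage_metadata, "thoughts_token_count", None) or 0
        candidates = getattr(usage_metadata, "candidates_token_count", None) or 0
        return prompt_tokens, thoughts + candidates


meta = SimpleNamespace(
    prompt_token_count=100, thoughts_token_count=30, candidates_token_count=20
)
print(GoogleLargeLanguageModel._calculate_tokens_from_usage_metadata(meta))  # (100, 50)
```

Centralizing the arithmetic in one static method means streaming and non-streaming response handlers share identical token logic, which is the deduplication this commit describes.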

Updated the google-genai dependency from version 1.27.0 to 1.29.0 to include the latest bug fixes and improvements. This ensures compatibility with the latest features and security patches from the Google Generative AI library.
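For reference, the resulting pins would look roughly like this illustrative `requirements.txt` fragment (the `dify_plugin` range comes from the SDK checklist above; the exact file contents are not shown in this PR):

```text
# requirements.txt (illustrative)
dify_plugin>=0.3.0,<0.5.0
google-genai==1.29.0
```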
@crazywoola crazywoola merged commit ae5fd2d into langgenius:main Aug 12, 2025
1 check passed
Frederick2313072 pushed a commit that referenced this pull request Aug 28, 2025
* feat(gemini): enhance token usage tracking and update dependencies
* refactor(llm): extract token calculation logic into helper method
* Update llm.py
* fix(gemini): update google-genai to version 1.29.0