@QIN2DIM QIN2DIM commented Aug 12, 2025

Related Issues or Context

Related: #1360

This PR introduces a solution to fully resolve token counting issues for prompt_tokens and completion_tokens in multi-modal QA scenarios using the Gemini GenAI SDK.


1. Accurate Token Counting in Multi-Modal QA

  • Problem:
    In the Gemini GenAI SDK, token statistics for prompt_tokens and completion_tokens were previously inaccurate in multi-modal QA scenarios, leading to inconsistencies in total token usage reporting.
  • Fix:
    Implemented a solution that ensures:
    • total_token_count is always consistent.
    • Correct pricing is applied for the following modalities: IMAGE, VIDEO, TEXT, and DOCUMENT.
  • Limitations:
    Due to current constraints in Dify:
    • Different modalities cannot be assigned distinct token prices.
    • The system does not track tiered (step-wise) pricing.
    • As a result, pricing accuracy cannot be guaranteed in the following scenarios:
      • Caching
      • Grounding
      • Audio input
      • LiveAPI
      • Gemini 2.5 Pro with ultra-long context.
```
tool_use_prompts  = 0  # not reported separately yet
completion_tokens = thoughts_token_count + candidates_token_count
prompt_tokens     = prompt_tokens_standard
total_tokens      = prompt_tokens + completion_tokens + tool_use_prompts
```
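The accounting above can be sketched as a small function. This is a minimal illustration, not the PR's actual implementation: the key names mirror the fields exposed by the SDK's `usage_metadata`, but a plain dict stands in for the real response object so the sketch runs without the SDK installed.

```python
def calculate_usage(usage_metadata: dict) -> dict:
    """Derive prompt/completion/total tokens from usage metadata.

    Missing or None fields fall back to 0, matching the hedged formula
    above (tool_use_prompts is not reported separately yet, so it is 0).
    """
    prompt_tokens = usage_metadata.get("prompt_token_count") or 0
    thoughts = usage_metadata.get("thoughts_token_count") or 0
    candidates = usage_metadata.get("candidates_token_count") or 0
    tool_use_prompts = 0  # placeholder: always zero in this accounting

    completion_tokens = thoughts + candidates
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens + tool_use_prompts,
    }


usage = calculate_usage(
    {"prompt_token_count": 258, "thoughts_token_count": 120, "candidates_token_count": 42}
)
print(usage)  # {'prompt_tokens': 258, 'completion_tokens': 162, 'total_tokens': 420}
```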

This PR contains Changes to Non-Plugin

  • Documentation
  • Other

This PR contains Changes to Non-LLM Models Plugin

  • I have Run Comprehensive Tests Relevant to My Changes

This PR contains Changes to LLM Models Plugin

  • My Changes Affect Message Flow Handling (System Messages and User→Assistant Turn-Taking)
  • My Changes Affect Tool Interaction Flow (Multi-Round Usage and Output Handling, for both Agent App and Agent Node)
  • My Changes Affect Multimodal Input Handling (Images, PDFs, Audio, Video, etc.)
  • My Changes Affect Multimodal Output Generation (Images, Audio, Video, etc.)
  • My Changes Affect Structured Output Format (JSON, XML, etc.)
  • My Changes Affect Token Consumption Metrics
  • My Changes Affect Other LLM Functionalities (Reasoning Process, Grounding, Prompt Caching, etc.)
  • Other Changes (Add New Models, Fix Model Parameters etc.)

Version Control (Any Changes to the Plugin Will Require Bumping the Version)

  • I have Bumped Up the Version in Manifest.yaml (Top-Level Version Field, Not in Meta Section)

Dify Plugin SDK Version

  • I have Ensured dify_plugin>=0.3.0,<0.5.0 is in requirements.txt (SDK docs)

Environment Verification (If Any Code Changes)

Local Deployment Environment

  • Dify Version is: , I have Tested My Changes on Local Deployment Dify with a Clean Environment That Matches the Production Configuration.

SaaS Environment

  • I have Tested My Changes on cloud.dify.ai with a Clean Environment That Matches the Production Configuration

- Improve token counting accuracy by properly handling multimodal input types in usage metadata
- Add detailed token tracking for thoughts and candidates in both streaming and non-streaming responses
- Update google-genai dependency to version 1.29.0 for improved compatibility and features
- Bump version to 0.4.1 to reflect new functionality and improvements
- Add comprehensive comments and usage examples for better code maintainability

Extracted the token calculation logic from multiple locations in the `GoogleLargeLanguageModel` class into a static method `_calculate_tokens_from_usage_metadata`.

This improves code maintainability and reduces duplication across different parts of the codebase that handle token counting.

The new method:
- Handles all token types including text, image, video, audio, and document modalities
- Correctly calculates prompt and completion tokens based on usage metadata
- Includes proper fallback to manual token counting when metadata is unavailable

The change also removes redundant comments and improves code readability while preserving all existing functionality.
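As a rough illustration of the extracted helper, the sketch below shows the shape such a static method might take. The class is a simplified stand-in and the attribute names are assumptions modeled on the SDK's usage metadata; the real implementation falls back to manual token counting, which is stubbed out here.

```python
from types import SimpleNamespace


class GoogleLargeLanguageModel:  # simplified stand-in for the real class
    @staticmethod
    def _calculate_tokens_from_usage_metadata(usage_metadata) -> tuple[int, int]:
        """Return (prompt_tokens, completion_tokens) from usage metadata.

        Attribute names are assumed; getattr fallbacks keep the sketch
        robust when a field is missing or None.
        """
        if usage_metadata is None:
            # Fallback path: the real code counts tokens manually here
            # (e.g. via a tokenizer); this sketch just returns zeros.
            return 0, 0
        prompt_tokens = getattr(usage_metadata, "prompt_token_count", None) or 0
        thoughts = getattr(usage_metadata, "thoughts_token_count", None) or 0
        candidates = getattr(usage_metadata, "candidates_token_count", None) or 0
        return prompt_tokens, thoughts + candidates


meta = SimpleNamespace(
    prompt_token_count=100, thoughts_token_count=30, candidates_token_count=20
)
print(GoogleLargeLanguageModel._calculate_tokens_from_usage_metadata(meta))  # (100, 50)
```

Centralizing the arithmetic in one static method means streaming and non-streaming response handlers share identical token logic, which is the deduplication this commit describes.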

Updated the google-genai dependency from version 1.27.0 to 1.29.0 to include the latest bug fixes and improvements. This ensures compatibility with the latest features and security patches from the Google Generative AI library.
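For reference, the resulting pins would look roughly like this illustrative `requirements.txt` fragment (the `dify_plugin` range comes from the SDK checklist above; the exact file contents are not shown in this PR):

```text
# requirements.txt (illustrative)
dify_plugin>=0.3.0,<0.5.0
google-genai==1.29.0
```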
@crazywoola crazywoola merged commit ae5fd2d into langgenius:main Aug 12, 2025
1 check passed
Frederick2313072 pushed a commit that referenced this pull request Aug 28, 2025
* feat(gemini): enhance token usage tracking and update dependencies
* refactor(llm): extract token calculation logic into helper method
* Update llm.py
* fix(gemini): update google-genai to version 1.29.0