
Conversation

@chaosyaan
Contributor

Related Issues or Context

  • Incorrect Token Counting for Gemini

The code accumulates completion_tokens for every chunk when processing Gemini's candidates_token_count:

if chunk.usage_metadata:
    completion_tokens += (
        chunk.usage_metadata.candidates_token_count or 0
    )

However, Gemini's candidates_token_count is a running total that already increases with each chunk. Accumulating it again double-counts tokens, causing the reported completion token count to explode.
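A minimal sketch of the corrected logic. Since Gemini reports candidates_token_count as a running total, the counter should be overwritten with the latest value rather than incremented; the dataclasses below are stand-ins for the SDK's streaming chunk objects, not the real types.

```python
from dataclasses import dataclass
from typing import Optional

# Stand-ins for the SDK's streaming chunk objects (hypothetical names;
# the real types come from the Gemini SDK).
@dataclass
class UsageMetadata:
    candidates_token_count: Optional[int]

@dataclass
class Chunk:
    usage_metadata: Optional[UsageMetadata]

def count_completion_tokens(chunks) -> int:
    """Gemini reports a cumulative candidates_token_count per chunk,
    so keep the latest value instead of summing across chunks."""
    completion_tokens = 0
    for chunk in chunks:
        if chunk.usage_metadata:
            completion_tokens = (
                chunk.usage_metadata.candidates_token_count or completion_tokens
            )
    return completion_tokens

# Cumulative totals 10 -> 25 -> 40 across three chunks:
stream = [
    Chunk(UsageMetadata(10)),
    Chunk(UsageMetadata(25)),
    Chunk(UsageMetadata(40)),
]
print(count_completion_tokens(stream))  # 40, not the buggy sum of 75
```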

This PR contains Changes to Non-Plugin

  • Documentation
  • Other

This PR contains Changes to Non-LLM Models Plugin

  • I have Run Comprehensive Tests Relevant to My Changes

This PR contains Changes to LLM Models Plugin

  • My Changes Affect Message Flow Handling (System Messages and User→Assistant Turn-Taking)
  • My Changes Affect Tool Interaction Flow (Multi-Round Usage and Output Handling, for both Agent App and Agent Node)
  • My Changes Affect Multimodal Input Handling (Images, PDFs, Audio, Video, etc.)
  • My Changes Affect Multimodal Output Generation (Images, Audio, Video, etc.)
  • My Changes Affect Structured Output Format (JSON, XML, etc.)
  • My Changes Affect Token Consumption Metrics
  • My Changes Affect Other LLM Functionalities (Reasoning Process, Grounding, Prompt Caching, etc.)
  • Other Changes (Add New Models, Fix Model Parameters etc.)

Version Control (Any Changes to the Plugin Will Require Bumping the Version)

  • I have Bumped Up the Version in Manifest.yaml (Top-Level Version Field, Not in Meta Section)

Dify Plugin SDK Version

  • I have Ensured dify_plugin>=0.3.0,<0.5.0 is in requirements.txt (SDK docs)

Environment Verification (If Any Code Changes)

Local Deployment Environment

  • Dify Version is: , I have Tested My Changes on Local Deployment Dify with a Clean Environment That Matches the Production Configuration.

SaaS Environment

  • I have Tested My Changes on cloud.dify.ai with a Clean Environment That Matches the Production Configuration

@fdb02983rhy
Contributor

fdb02983rhy commented Jul 27, 2025

Please fill in the template and bump the plugin version.
It would also help to provide evidence, like the test example below, showing that the token count in Dify matches the one in GCP.

@chaosyaan
Contributor Author

chaosyaan commented Jul 28, 2025

gemini plugin version: 0.2.9
prompt: "Output arbitrary questions, but make sure the result is 1000 tokens."

  • official plugin, gemini-2.5-flash and 0 thinking budget: [screenshot]
  • modified local plugin, gemini-2.5-flash and 128 thinking budget: [screenshot]

@fdb02983rhy
Contributor

Could you compare them with the results on your GCP console? The LLM itself has no reliable way to count words or tokens.

https://console.cloud.google.com/apis/api/generativelanguage.googleapis.com


@chaosyaan
Contributor Author

I can't find any token data on that page. It may be on the billing page, but I can't access that with my personal account, and it isn't convenient to check with the company account.

I can show the results of direct API calls using the same prompt; the token count is a reasonable ~1000. The problematic version appears to be summing all of the candidates_token_count values.
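To make the failure mode concrete, here is a toy illustration (the numbers are invented, not taken from the screenshots): when each chunk carries a cumulative total, summing the per-chunk values inflates the count, while taking the last value gives the true total.

```python
# Hypothetical running totals emitted across a 4-chunk stream whose
# true completion length is 1000 tokens.
running_totals = [250, 500, 750, 1000]

buggy_count = sum(running_totals)   # the accumulation bug: 2500
fixed_count = running_totals[-1]    # latest cumulative value: 1000

print(buggy_count, fixed_count)  # 2500 1000
```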

@QIN2DIM mentioned this pull request Aug 12, 2025
