@aidansu
Contributor

@aidansu aidansu commented Nov 5, 2025

What problem does this PR solve?

  • Added TCADP Parser configuration fields to PDF, PPT, and spreadsheet parsing forms
  • Implemented support for setting table result type (Markdown/HTML) and Markdown image response type (URL/Text)
  • Updated TCADP Parser to handle return format settings from configuration or parameters
  • Enhanced frontend to dynamically show TCADP options based on selected parsing method
  • Modified backend to pass format parameters when calling TCADP API
  • Optimized form default value logic for TCADP configuration items
  • Updated multilingual resource files for new configuration options
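The "configuration or parameters" fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function name `resolve_format_settings`, the config keys, the option labels (`markdown`/`html`, `url`/`text`), and the defaults are all assumptions made for the example, and the values the TCADP API actually expects may differ.

```python
from typing import Optional

# Hypothetical option labels; the values the real TCADP API expects
# are not stated in this PR description.
TABLE_RESULT_TYPES = ("markdown", "html")
MARKDOWN_IMAGE_RESPONSE_TYPES = ("url", "text")


def resolve_format_settings(
    config: dict,
    table_result_type: Optional[str] = None,
    markdown_image_response_type: Optional[str] = None,
) -> dict:
    """Pick each setting from the explicit parameter, else the config, else a default."""

    def pick(name: str, explicit: Optional[str], allowed: tuple) -> str:
        # An explicit argument wins; otherwise fall back to the config entry,
        # and finally to the first allowed value as an assumed default.
        value = explicit if explicit is not None else config.get(name, allowed[0])
        value = str(value).lower()
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
        return value

    return {
        "table_result_type": pick(
            "table_result_type", table_result_type, TABLE_RESULT_TYPES
        ),
        "markdown_image_response_type": pick(
            "markdown_image_response_type",
            markdown_image_response_type,
            MARKDOWN_IMAGE_RESPONSE_TYPES,
        ),
    }
```

With this shape, a form that leaves both fields unset gets the assumed defaults, while a per-call argument overrides whatever the stored parser configuration says.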

Type of change

  • New Feature (non-breaking change which adds functionality)

- Remove custom signature implementation and adopt Tencent Cloud's official SDK
- Update configuration files to support new SDK parameters
- Upgrade dependencies to the latest stable versions
- Optimize streaming response handling mechanism
- Unify environment variable reading logic
- Enhance control over table and image response types
…add_adp_parser

# Conflicts:
#	rag/app/naive.py
#	rag/flow/parser/parser.py
#	web/src/components/layout-recognize-form-field.tsx
- Add spreadsheet parsing field component in data flow and agent forms
- Update spreadsheet parsing constant configurations to support both DeepDOC and TCADP parsing methods
- Implement TCADP parsing logic for spreadsheet files in rag/app/naive.py
- Extend rag/flow/parser/parser.py to support both TCADP and DeepDOC spreadsheet parsing methods
- Add handling of TCADP parsing results for HTML, JSON, and Markdown output formats
- Update frontend utility functions to pass spreadsheet parsing method configurations
- Add tcadp_parser method for PPT files
- Support both PPT and PPTX file formats
- Add PPT form field component
- Add new output format options: markdown, text, and html
…mance/perf_tcadp_parser

# Conflicts:
#	rag/app/naive.py
#	rag/flow/parser/parser.py
#	uv.lock
#	web/src/components/layout-recognize-form-field.tsx
#	web/src/pages/data-flow/constant.tsx
#	web/src/pages/data-flow/form/parser-form/index.tsx
#	web/src/pages/data-flow/form/parser-form/pdf-form-fields.tsx
#	web/src/pages/data-flow/utils.ts
… for the TCADP Parser

- Added TCADP Parser-related configuration fields to the PDF, PPT, and spreadsheet parsing forms
- Added support for setting table result type (Markdown/HTML) and Markdown image response type (URL/Text)
- Updated the TCADP Parser to support obtaining return format settings from configuration or parameters
- Updated frontend logic to dynamically display TCADP configuration options based on the selected parsing method
- Modified backend logic to pass the corresponding format configuration parameters when calling the TCADP API
- Optimized the form default value setting logic to ensure TCADP configuration items have appropriate initial values
- Updated multilingual resource files to support the UI display of the new configuration items
…nce/perf_tcadp_parser

# Conflicts:
#	rag/app/naive.py
@dosubot dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) and 💞 feature (Feature request, pull request that fulfills a new feature.) labels Nov 5, 2025
@yingfeng yingfeng added the ci (Continuous Integration) label Nov 5, 2025
@yingfeng
Member

yingfeng commented Nov 6, 2025

Thanks, please fix the CI first.

@aidansu
Contributor Author

aidansu commented Nov 6, 2025

@yingfeng CI issues fixed, checks are passing now. Please review again. Thanks!

@KevinHuSh KevinHuSh merged commit 420c971 into infiniflow:main Nov 20, 2025
1 check passed
yngvarhuang pushed a commit to yngvarhuang/ragflow that referenced this pull request Nov 20, 2025
…1120

* main: (53 commits)
  Use array syntax for commands in docker-compose-base.yml (infiniflow#11391)
  Feature (canvas): Add mind tagging support (infiniflow#11359)
  locale en add russian language option (infiniflow#11392)
  Locale: update russian language (infiniflow#11393)
  Feat: Add TCADP parser for PPTX and spreadsheet document types. (infiniflow#11041)
  fix(llm): handle None response in total_token_count_from_response (infiniflow#10941)
  feat: add OceanBase doc engine (infiniflow#11228)
  fix cohere rerank  base_url default (infiniflow#11353)
  Feat: Fixed an issue where modifying fields in the agent operator caused the loss of structured data. infiniflow#10427 (infiniflow#11388)
  Docs: minor (infiniflow#11385)
  Doc: Optimize read me (infiniflow#11386)
  Fix some multilingual issues (infiniflow#11382)
  Feat: If a query variable in a data manipulation operator is deleted, a warning message should be displayed to the user.  infiniflow#10427 infiniflow#11255 (infiniflow#11384)
  Fix: refine error msg. (infiniflow#11380)
  Doc: Added v0.22.1 release notes (infiniflow#11383)
  Feat: The key for the begin operator can only contain alphanumeric characters and underscores.  infiniflow#10427 (infiniflow#11377)
  Fix: circle imports issue. (infiniflow#11374)
  Feat: Structured data will still be stored in outputs for compatibility with older versions. infiniflow#10427 (infiniflow#11368)
  Add release notes (infiniflow#11372)
  Update README for supporting Gemini 3 Pro (infiniflow#11369)
  ...

# Conflicts:
#	pyproject.toml
#	web/src/locales/ru.ts