fix: suppress pdfminer warnings to prevent upload halting #463
- Added warning suppression for pdfminer warnings during Docling PDF processing
- Suppresses 'Cannot set gray non-stroke color' warnings that cause uploads to halt
- Temporarily sets pdfminer logger to ERROR level during document processing
- Fixes issue where files ~34MB would fail due to pdfminer warning spam

Resolves issue where PDF uploads would halt with repeated pdfminer warnings
@codeBunny2022 is attempting to deploy a commit to the Rohan Verma's projects Team on Vercel. A member of the Team first needs to authorize it.
Walkthrough
The changes add a suppression block around Docling PDF processing to silence pdfminer warnings and temporarily set its logger to ERROR, and introduce build-system and setuptools configuration in pyproject.toml for packaging metadata.
Sequence Diagram
sequenceDiagram
participant Caller
participant FileProcessor
participant Warnings
participant PdfminerLogger
participant DoclingService
Caller->>FileProcessor: process_document(...)
rect rgb(245, 250, 255)
Note over FileProcessor,DoclingService: Suppress pdfminer warnings & raise logger level
FileProcessor->>Warnings: filterwarnings(ignore, category=UserWarning, message=pdfminer...)
FileProcessor->>PdfminerLogger: setLevel(ERROR) (save old level)
FileProcessor->>DoclingService: process_document(...)
DoclingService-->>FileProcessor: result / error
FileProcessor->>PdfminerLogger: restore(saved level)
end
FileProcessor-->>Caller: return result
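The sequence above can be sketched as a small context manager. This is a minimal illustration, not the code merged in the PR: the helper name `quiet_pdfminer`, the logger name `"pdfminer"`, and the message pattern are assumptions chosen to match the walkthrough.

```python
import logging
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_pdfminer(logger_name: str = "pdfminer"):
    """Hypothetical helper: silence pdfminer noise for the duration of a block."""
    pdfminer_logger = logging.getLogger(logger_name)
    saved_level = pdfminer_logger.level  # remember the level so it can be restored

    # catch_warnings() restores the warnings filter list when the block exits.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            category=UserWarning,
            # Assumed pattern; `message` is a regex matched at the start of the text.
            message=".*Cannot set gray non-stroke color.*",
        )
        pdfminer_logger.setLevel(logging.ERROR)
        try:
            yield
        finally:
            # Restore the saved level even if document processing raises.
            pdfminer_logger.setLevel(saved_level)
```

A caller would wrap the Docling call in `with quiet_pdfminer(): ...`. The `try`/`finally` matters because the diagram shows the logger being restored on both the result and the error path. Note that `warnings.catch_warnings()` is documented as not thread-safe, so this sketch assumes processing is not running concurrently in several threads.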
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Review by RecurseML
🔍 Review performed on 57fd82f..094bdfa
✨ No bugs found, your code is sparkling clean
✅ Files analyzed, no issues (1)
• surfsense_backend/pyproject.toml
MODSetter
left a comment
Remove pyproject.toml changes and everything else looks good to me 👍
@codeBunny2022 This is causing docker build to fail. Please revert these changes.
Err Log
10.68 Downloading av (38.7MiB)
12.60 × Failed to build `surf-new-backend @ file:///app`
12.60 ├─▶ The build backend returned an error
12.60 ╰─▶ Call to `setuptools.build_meta.build_editable` failed (exit status: 1)
12.60
12.60 [stdout]
12.60 running egg_info
12.60 creating surf_new_backend.egg-info
12.60 writing surf_new_backend.egg-info/PKG-INFO
12.60 writing dependency_links to
12.60 surf_new_backend.egg-info/dependency_links.txt
12.60 writing requirements to surf_new_backend.egg-info/requires.txt
12.60 writing top-level names to surf_new_backend.egg-info/top_level.txt
12.60 writing manifest file 'surf_new_backend.egg-info/SOURCES.txt'
12.60
12.60 [stderr]
12.60 /tmp/.tmp7SLZ0s/builds-v0/.tmprY3Eyn/lib/python3.12/site-packages/setuptools/config/expand.py:128:
12.60 SetuptoolsWarning: File '/app/README.md' cannot be found
12.60 for path in _filter_existing_files(_filepaths)
12.60 error: package directory 'app' does not exist
12.60
12.60 hint: This usually indicates a problem with the package or the build
12.60 environment.
------
failed to solve: process "/bin/sh -c pip install --no-cache-dir uv && uv pip install --system --no-cache-dir -e ." did not complete successfully: exit code: 1
I have reverted the changes in pyproject.toml, so it should be good to go now.
@codeBunny2022 Thanks for your work 👍
Problem
PDF uploads were halting when processing files, especially around 34MB in size, due to repeated pdfminer warnings such as 'Cannot set gray non-stroke color'.
These warnings would spam the logs and cause the upload process to hang or fail, requiring users to delete and reupload files.
Solution
Changes
- surfsense_backend/app/tasks/document_processors/file_processors.py: wrap Docling processing with warning suppression
- pyproject.toml: setuptools configuration for proper package discovery
Testing
Fixes issue where PDF files around 34MB would fail to upload due to pdfminer warning spam.
High-level PR Summary
This PR fixes an issue where PDF uploads were failing or hanging when processing files around 34MB in size due to excessive warning messages from the pdfminer library. The solution wraps the Docling PDF processing with warning suppression logic that temporarily filters out specific harmless warnings ('Cannot set gray non-stroke color' and 'invalid float value') and raises the pdfminer logger level to ERROR during processing, then restores the original logging level afterwards. The PR also includes an unrelated fix to the pyproject.toml setuptools configuration for proper package discovery.
⏱️ Estimated Review Time: 5-15 minutes
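For comparison, a narrower alternative to raising the whole logger to ERROR is a `logging.Filter` that drops only the two messages named above. This is not what the PR does; it is a sketch under assumptions: the class name is hypothetical, and the filter is meant to be attached to a handler, because a filter added to a parent logger is not applied to records created by its child loggers.

```python
import logging

# Message fragments taken from the PR summary; matching by substring is an assumption.
NOISY_FRAGMENTS = (
    "Cannot set gray non-stroke color",
    "invalid float value",
)


class DropNoisyPdfminerRecords(logging.Filter):
    """Hypothetical filter: discard only the known-harmless pdfminer messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        # Returning False drops the record; everything else passes through.
        return not any(fragment in message for fragment in NOISY_FRAGMENTS)


def install_on_handler(handler: logging.Handler) -> logging.Handler:
    """Attach the filter to a handler so records from any logger that reaches it are screened."""
    handler.addFilter(DropNoisyPdfminerRecords())
    return handler
```

The trade-off is that other pdfminer warnings stay visible, at the cost of the filter running for every record the handler sees.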
💡 Review Order Suggestion
1. surfsense_backend/app/tasks/document_processors/file_processors.py
2. surfsense_backend/pyproject.toml
Summary by CodeRabbit
Improvements
Chores