🌐 Try it online: https://boleto-extractor-xrva.onrender.com/
A Python tool that extracts boleto numbers from Brazilian boleto PDF files. It reads 44-digit barcodes from PDFs and converts them to the standard 47-digit "linha digitável" format used for payments, so the output is always a 47-digit number ready for payment processing.
- Barcode-Focused Extraction: Primarily reads 44-digit barcodes from PDFs
- Automatic Conversion: Converts 44-digit barcodes to 47-digit "linha digitável" using official algorithms
- Multiple Extraction Methods: Barcode scanning, text extraction, and raw PDF content analysis
- Barcode Detection: Reads Interleaved 2 of 5 (I25) and other common barcode formats
- PDF Processing: Handles both text-based and image-based PDFs
- Encrypted PDF Support: Attempts to decrypt password-protected PDFs
- Brazilian Format Output: Option to display boleto numbers in proper Brazilian "linha digitável" format
- Clipboard Copy: Option to copy boleto numbers directly to clipboard for easy pasting
- Clean Output: No numbering or extra formatting in results
- Python 3.7 or higher
- macOS, Linux, or Windows
- Install from GitHub:
  pip install git+https://github.com/pedrinho/boleto_extractor.git
- Install zbar (for barcode reading):
  - macOS: brew install zbar
  - Ubuntu/Debian: sudo apt-get install libzbar0
  - Windows: Download from zbar releases
- Clipboard functionality: The clipboard copy feature requires pyperclip, which is automatically installed with the package.
- Clone or download this repository:
  git clone https://github.com/pedrinho/boleto_extractor.git
  cd boleto_extractor
- Create a virtual environment (recommended):
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the package:
  pip install -e .
- Install zbar (for barcode reading):
  - macOS: brew install zbar
  - Ubuntu/Debian: sudo apt-get install libzbar0
  - Windows: Download from zbar releases
- Clipboard functionality: The clipboard copy feature requires pyperclip, which is automatically installed with the package.
from boleto_extractor import BoletoExtractor
# Create extractor and extract boleto numbers
extractor = BoletoExtractor()
numbers = extractor.extract_boleto_numbers("boleto.pdf")
# Extract from encrypted PDF
numbers = extractor.extract_boleto_numbers("encrypted.pdf", password="mypassword")
# Convert 44-digit barcode to 47-digit linha digitável
barcode = "19797116900000386000000004572849356277103564"
linha = extractor.barcode_to_linha_digitavel(barcode)
# Format output with spaces
formatted = extractor.format_boleto_number("19790000050457284935662771035649711690000038600")
# Result: "19790.00005 04572.84935 66277.10356 4 9711690000038600"
boleto-extractor path/to/boleto.pdf
Found 1 boleto number(s):
--------------------------------------------------
19790000050457284935662771035649711690000038600
--------------------------------------------------
With Brazilian format (--format):
Found 1 boleto number(s):
--------------------------------------------------
19790.00005 04572.84935 66277.10356 4 9711690000038600
--------------------------------------------------
With clipboard copy (--clipboard):
Found 1 boleto number(s):
--------------------------------------------------
19790000050457284935662771035649711690000038600
--------------------------------------------------
✓ Copied to clipboard: 19790000050457284935662771035649711690000038600
With Brazilian format and clipboard copy (--format --clipboard):
Found 1 boleto number(s):
--------------------------------------------------
19790.00005 04572.84935 66277.10356 4 9711690000038600
--------------------------------------------------
✓ Copied to clipboard: 19790.00005 04572.84935 66277.10356 4 9711690000038600
The tool focuses on extracting 44-digit barcodes and converting them to 47-digit "linha digitável":
44-digit barcode:
- Format: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX
- Example: 19797116900000386000000004572849356277103564
- Source: What handheld barcode scanners read from the printed barcode
47-digit "linha digitável":
- Format: XXXXX.XXXXX XXXXX.XXXXXX XXXXX.XXXXXX X XXXXXXXXXXXXXX
- Example: 19790000050457284935662771035649711690000038600
- Usage: Standard format for payments and manual entry
The tool automatically converts 44-digit barcodes to 47-digit "linha digitável" using the official Modulo 10 algorithm for check digit calculation.
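The conversion can be sketched in a few lines of Python. This is a minimal illustration, not the package's own implementation: the function names `mod10_check_digit` and `barcode_to_linha` are hypothetical, and the field layout assumed here is the common bank-boleto layout (bank code and currency, general check digit, due-date factor and amount, then a 25-digit free field).

```python
def mod10_check_digit(field: str) -> int:
    """Modulo 10 check digit: weights 2, 1, 2, 1... starting from the rightmost digit."""
    total = 0
    for i, ch in enumerate(reversed(field)):
        product = int(ch) * (2 if i % 2 == 0 else 1)
        # Two-digit products contribute the sum of their digits.
        total += product // 10 + product % 10
    return (10 - total % 10) % 10


def barcode_to_linha(barcode: str) -> str:
    """Rearrange a 44-digit barcode into a 47-digit linha digitável (assumed layout)."""
    if len(barcode) != 44 or not (barcode.isascii() and barcode.isdigit()):
        raise ValueError("expected exactly 44 digits")

    bank_and_currency = barcode[0:4]   # bank code (3) + currency code (1)
    general_check = barcode[4]         # general check digit, copied as-is
    factor_and_amount = barcode[5:19]  # due-date factor (4) + amount (10)
    free_field = barcode[19:44]        # 25 digits defined by the issuing bank

    field1 = bank_and_currency + free_field[0:5]
    field2 = free_field[5:15]
    field3 = free_field[15:25]

    # Each of the first three fields gets its own Modulo 10 check digit.
    return (
        field1 + str(mod10_check_digit(field1))
        + field2 + str(mod10_check_digit(field2))
        + field3 + str(mod10_check_digit(field3))
        + general_check
        + factor_and_amount
    )
```

Applied to the example barcode in this section, the sketch produces the 47-digit example shown alongside it. Only the three field check digits are computed; the general check digit is carried over from the barcode unchanged.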
The tool recognizes boleto numbers starting with common Brazilian bank codes:
- 001 (Banco do Brasil)
- 033 (Santander)
- 104 (Caixa Econômica Federal)
- 237 (Bradesco)
- 341 (Itaú)
- 356 (Real)
- 389 (Banco Mercantil)
- 422 (Safra)
- 633 (Banco Rendimento)
- 745 (Citibank)
- 756 (Sicoob)
- 197 (Stone Pagamentos)
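A prefix check against this list might look like the sketch below. The names `KNOWN_BANK_CODES` and `has_known_bank_code` are hypothetical, and the package's real validation may apply further rules; this only shows the idea of matching the first three digits.

```python
# Bank codes taken from the list above (illustrative constant, not the package's own).
KNOWN_BANK_CODES = {
    "001", "033", "104", "237", "341", "356",
    "389", "422", "633", "745", "756", "197",
}


def has_known_bank_code(number: str) -> bool:
    """Return True if a 44- or 47-digit number starts with a listed bank code."""
    if not (number.isascii() and number.isdigit()):
        return False
    if len(number) not in (44, 47):
        return False
    return number[:3] in KNOWN_BANK_CODES
```

Both the barcode and the linha digitável begin with the bank code, so the same check works on either form.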
The tool uses a barcode-focused approach to extract boleto numbers:
- Barcode Scanning: Converts PDF pages to images and scans for barcodes using pyzbar
- Text Extraction: As backup, extracts text from PDF and looks for 44-digit barcode patterns
- Raw Content Analysis: For encrypted PDFs, analyzes raw PDF content for barcode patterns
- Validation: Validates found numbers against known Brazilian bank codes
- Conversion: Converts 44-digit barcodes to 47-digit "linha digitável" using Modulo 10 algorithm
- Output: Returns clean 47-digit numbers ready for payment processing
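The text-extraction fallback and the validation step can be illustrated with a short, dependency-free sketch. It assumes the PDF text has already been extracted into a string; the name `find_barcode_candidates` and the exact pattern are assumptions, and the package itself may tolerate spaces or other separators that this sketch does not.

```python
import re

# A run of exactly 44 digits, not embedded in a longer run of digits.
BARCODE_PATTERN = re.compile(r"(?<!\d)\d{44}(?!\d)")


def find_barcode_candidates(text, bank_codes):
    """Return unique 44-digit runs in `text` whose first three digits are a known bank code."""
    candidates = []
    for match in BARCODE_PATTERN.findall(text):
        if match[:3] in bank_codes and match not in candidates:
            candidates.append(match)
    return candidates


# Usage with made-up page text:
page_text = "Pagador: Fulano\n19797116900000386000000004572849356277103564\nValor: 386,00"
print(find_barcode_candidates(page_text, {"197", "341"}))
```

The lookarounds keep the pattern from matching a 44-digit slice of a longer digit run, which would otherwise produce false candidates.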
Possible causes:
- PDF is encrypted/password protected
- PDF contains only images with low-quality barcodes
- Barcode is not in a recognized format
- PDF quality is too low for barcode scanning
Solutions:
- Try installing PyMuPDF for better image processing:
pip install PyMuPDF
- Check if the PDF is password protected and provide the password
- Ensure the PDF has clear, high-resolution barcode images
- Try the --verbose flag for more detailed error information
Possible causes:
- PDF is corrupted
- PDF is encrypted with a password
- PDF format is not supported
Solutions:
- Try opening the PDF in a PDF reader to verify it's not corrupted
- If password protected, provide the correct password
- Convert the PDF to a different format if possible
Possible causes:
- zbar not installed
- Barcode image quality is too low
- Barcode format not supported
Solutions:
- Install zbar library (see installation instructions)
- Ensure the PDF has clear, high-resolution barcode images
- Try the --verbose flag for more detailed error information
- Large PDFs may take longer to process
- Use the --verbose flag to see progress and identify bottlenecks
- Consider splitting large PDFs into smaller files if possible
Basic usage:
boleto-extractor boleto.pdf
With Brazilian format:
boleto-extractor boleto.pdf --format
Copy to clipboard:
boleto-extractor boleto.pdf --clipboard
With password for encrypted PDFs:
boleto-extractor encrypted.pdf --password mypassword
With verbose output:
boleto-extractor boleto.pdf --verbose
Combined options:
boleto-extractor encrypted.pdf --password mypassword --verbose --format --clipboard
- PyPDF2: PDF text extraction
- pdfplumber: Advanced PDF text and image extraction
- PyMuPDF: PDF to image conversion (optional but recommended)
- opencv-python: Image processing
- pyzbar: Barcode reading
- Pillow: Image handling
- numpy: Numerical operations
- pyperclip: Clipboard functionality (for copy to clipboard feature)
Feel free to submit issues, feature requests, or pull requests to improve this tool.
This project is open source and available under the MIT License.
This tool is designed for educational and legitimate business purposes. Always ensure you have the right to process the PDF files you're using with this tool. The authors are not responsible for any misuse of this software.