Brazilian Boleto Number Extractor

🌐 Try it online: https://boleto-extractor-xrva.onrender.com/

A Python tool that extracts boleto numbers from Brazilian boleto PDF files. It reads the 44-digit barcode from the PDF and converts it to the standard 47-digit "linha digitável" used for payments, so the output is always a 47-digit number ready for payment processing.

Features

Barcode-Focused Extraction: Primarily reads 44-digit barcodes from PDFs
Automatic Conversion: Converts 44-digit barcodes to 47-digit "linha digitável" using official algorithms
Multiple Extraction Methods: Barcode scanning, text extraction, and raw PDF content analysis
Barcode Detection: Reads Interleaved 2 of 5 (I25) and other common barcode formats
PDF Processing: Handles both text-based and image-based PDFs
Encrypted PDF Support: Attempts to decrypt password-protected PDFs
Brazilian Format Output: Option to display boleto numbers in proper Brazilian "linha digitável" format
Clipboard Copy: Option to copy boleto numbers directly to clipboard for easy pasting
Clean Output: No numbering or extra formatting in results

Installation

Prerequisites

Python 3.7 or higher
macOS, Linux, or Windows

Setup

Option 1: Install from GitHub (Recommended)

Install from GitHub:

pip install git+https://github.com/pedrinho/boleto_extractor.git

Install zbar (for barcode reading):
- macOS: brew install zbar
- Ubuntu/Debian: sudo apt-get install libzbar0
- Windows: Download from zbar releases
Clipboard functionality: The clipboard copy feature requires pyperclip, which is installed automatically with the package.

Option 2: Install from Source

Clone or download this repository:

git clone https://github.com/pedrinho/boleto_extractor.git
cd boleto_extractor

Create a virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package:
```
pip install -e .
```
Install zbar (for barcode reading):
- macOS: brew install zbar
- Ubuntu/Debian: sudo apt-get install libzbar0
- Windows: Download from zbar releases
Clipboard functionality: The clipboard copy feature requires pyperclip, which is installed automatically with the package.

Usage

As a Library

from boleto_extractor import BoletoExtractor

# Create extractor and extract boleto numbers
extractor = BoletoExtractor()
numbers = extractor.extract_boleto_numbers("boleto.pdf")

# Extract from encrypted PDF
numbers = extractor.extract_boleto_numbers("encrypted.pdf", password="mypassword")

# Convert 44-digit barcode to 47-digit linha digitável
barcode = "19797116900000386000000004572849356277103564"
linha = extractor.barcode_to_linha_digitavel(barcode)

# Format output with spaces
formatted = extractor.format_boleto_number("19790000050457284935662771035649711690000038600")
# Result: "19790.00005 04572.849356 62771.035649 7 11690000038600"

Command Line Interface

boleto-extractor path/to/boleto.pdf

Output Examples

Standard Output

Found 1 boleto number(s):
--------------------------------------------------
19790000050457284935662771035649711690000038600
--------------------------------------------------

Formatted Output

Found 1 boleto number(s):
--------------------------------------------------
19790.00005 04572.849356 62771.035649 7 11690000038600
--------------------------------------------------

Clipboard Output

Found 1 boleto number(s):
--------------------------------------------------
19790000050457284935662771035649711690000038600
--------------------------------------------------
✓ Copied to clipboard: 19790000050457284935662771035649711690000038600

Formatted Output with Clipboard

Found 1 boleto number(s):
--------------------------------------------------
19790.00005 04572.849356 62771.035649 7 11690000038600
--------------------------------------------------
✓ Copied to clipboard: 19790.00005 04572.849356 62771.035649 7 11690000038600

Supported Boleto Formats

The tool focuses on extracting 44-digit barcodes and converting them to 47-digit "linha digitável":

44-Digit Barcode (Input)

Format: XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX
Example: 19797116900000386000000004572849356277103564
Source: The value a handheld barcode scanner reads from the printed barcode
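
As a rough illustration of what the 44 digits contain, the sketch below splits a barcode into its fields and checks the general check digit in position 5. It assumes the common bank boleto layout (bank code, currency code, check digit, due-date factor, amount in centavos, 25-digit free field) and a Modulo 11 check digit with weights 2 to 9. That layout is an assumption brought in for the example, not something this README specifies. `BarcodeFields`, `parse_barcode`, `mod11_check_digit`, and `has_valid_check_digit` are hypothetical names and are not part of the boleto_extractor API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class BarcodeFields:
    bank_code: str      # positions 1-3
    currency_code: str  # position 4
    check_digit: int    # position 5
    due_factor: str     # positions 6-9, kept as raw digits
    amount: Decimal     # positions 10-19, stored in centavos
    free_field: str     # positions 20-44, bank-specific content


def parse_barcode(barcode: str) -> BarcodeFields:
    """Split a 44-digit barcode into fields (assumed bank boleto layout)."""
    if len(barcode) != 44 or not (barcode.isascii() and barcode.isdigit()):
        raise ValueError("expected exactly 44 digits")
    return BarcodeFields(
        bank_code=barcode[0:3],
        currency_code=barcode[3],
        check_digit=int(barcode[4]),
        due_factor=barcode[5:9],
        amount=Decimal(int(barcode[9:19])) / 100,
        free_field=barcode[19:44],
    )


def mod11_check_digit(digits: str) -> int:
    """Modulo 11 check digit with weights 2..9 applied right to left."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        total += int(char) * (2 + index % 8)
    result = 11 - (total % 11)
    # Results of 10 or 11 are mapped to 1 in the assumed layout.
    return 1 if result in (10, 11) else result


def has_valid_check_digit(barcode: str) -> bool:
    """Compare position 5 with the digit computed from the other 43 digits."""
    return mod11_check_digit(barcode[:4] + barcode[5:]) == int(barcode[4])


fields = parse_barcode("19797116900000386000000004572849356277103564")
print(fields.bank_code, fields.amount, has_valid_check_digit(
    "19797116900000386000000004572849356277103564"))
```

The due-date factor is left as raw digits on purpose, since turning it into a calendar date depends on a base date this README does not state.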

47-Digit Linha Digitável (Output)

Format: XXXXX.XXXXX XXXXX.XXXXXX XXXXX.XXXXXX X XXXXXXXXXXXXXX
Example: 19790000050457284935662771035649711690000038600
Usage: Standard format for payments and manual entry
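
The grouping pattern above can be applied with plain string slicing. The following is a minimal sketch that assumes the input is already a valid 47-digit string. `format_linha_digitavel` is a hypothetical helper written for this example; the package's own `format_boleto_number` may be implemented differently.

```python
def format_linha_digitavel(digits: str) -> str:
    """Group 47 digits as XXXXX.XXXXX XXXXX.XXXXXX XXXXX.XXXXXX X XXXXXXXXXXXXXX."""
    if len(digits) != 47 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 47 digits")
    return (
        f"{digits[0:5]}.{digits[5:10]} "     # field 1: 10 digits
        f"{digits[10:15]}.{digits[15:21]} "  # field 2: 11 digits
        f"{digits[21:26]}.{digits[26:32]} "  # field 3: 11 digits
        f"{digits[32]} "                     # field 4: general check digit
        f"{digits[33:47]}"                   # field 5: due factor + amount
    )


print(format_linha_digitavel("19790000050457284935662771035649711690000038600"))
```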

Conversion Process

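The tool automatically converts 44-digit barcodes to the 47-digit "linha digitável". The conversion rearranges the barcode digits into five fields and appends a Modulo 10 check digit to each of the first three fields, which accounts for the three extra digits.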
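
To make the conversion concrete, here is a minimal sketch of the rearrangement and the Modulo 10 check digits, assuming the usual bank boleto field order (bank and currency code plus the first five free-field digits, then two 10-digit slices of the free field, then the general check digit, then due factor and amount). `mod10_check_digit` and `barcode_to_linha` are hypothetical names used only for this example. The package exposes `barcode_to_linha_digitavel`, whose internals may differ.

```python
def mod10_check_digit(digits: str) -> int:
    """Modulo 10 check digit: weights 2,1,2,1... applied right to left."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        product = int(char) * (2 if index % 2 == 0 else 1)
        # A two-digit product contributes the sum of its digits.
        total += product // 10 + product % 10
    return (10 - total % 10) % 10


def barcode_to_linha(barcode: str) -> str:
    """Convert a 44-digit barcode to a 47-digit linha digitável (sketch)."""
    if len(barcode) != 44 or not (barcode.isascii() and barcode.isdigit()):
        raise ValueError("expected exactly 44 digits")
    free_field = barcode[19:44]
    field1 = barcode[0:4] + free_field[0:5]  # bank + currency + free[0:5]
    field2 = free_field[5:15]
    field3 = free_field[15:25]
    field4 = barcode[4]                      # general check digit, copied as-is
    field5 = barcode[5:19]                   # due factor + amount
    return (
        field1 + str(mod10_check_digit(field1))
        + field2 + str(mod10_check_digit(field2))
        + field3 + str(mod10_check_digit(field3))
        + field4
        + field5
    )


print(barcode_to_linha("19797116900000386000000004572849356277103564"))
```

Run on the barcode from the library example, this yields the same 47-digit number shown in the output examples, which is a quick way to sanity-check the field order.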

Supported Bank Codes

The tool recognizes boleto numbers starting with common Brazilian bank codes:

001 (Banco do Brasil)
033 (Santander)
104 (Caixa Econômica Federal)
237 (Bradesco)
341 (Itaú)
356 (Real)
389 (Banco Mercantil)
422 (Safra)
633 (Banco Rendimento)
745 (Citibank)
756 (Sicoob)
197 (Stone Pagamentos)
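
A prefix check against this list could look like the sketch below. It only illustrates the idea of filtering by the first three digits. `KNOWN_BANK_CODES` and `starts_with_known_bank` are hypothetical names, and the package's actual validation may apply further rules.

```python
# Bank codes copied from the list above.
KNOWN_BANK_CODES = frozenset({
    "001", "033", "104", "237", "341", "356",
    "389", "422", "633", "745", "756", "197",
})


def starts_with_known_bank(number: str) -> bool:
    """Return True when the first three digits are a listed bank code."""
    return number[:3] in KNOWN_BANK_CODES


print(starts_with_known_bank("19797116900000386000000004572849356277103564"))
```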

How It Works

The tool uses a barcode-focused approach to extract boleto numbers:

Barcode Scanning: Converts PDF pages to images and scans for barcodes using pyzbar
Text Extraction: As backup, extracts text from PDF and looks for 44-digit barcode patterns
Raw Content Analysis: For encrypted PDFs, analyzes raw PDF content for barcode patterns
Validation: Validates found numbers against known Brazilian bank codes
Conversion: Converts 44-digit barcodes to 47-digit "linha digitável" using Modulo 10 algorithm
Output: Returns clean 47-digit numbers ready for payment processing
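
The text-extraction fallback in the steps above amounts to searching extracted text for 44-digit runs. The sketch below shows one way to do that with a regular expression, assuming the PDF text has already been extracted into a string. `find_barcode_candidates` is a hypothetical helper, not the package's function, and the real tool may also handle digits split by spaces or line breaks.

```python
import re
from typing import List

# Exactly 44 consecutive digits, not embedded in a longer digit run.
BARCODE_PATTERN = re.compile(r"(?<!\d)\d{44}(?!\d)")


def find_barcode_candidates(text: str) -> List[str]:
    """Return unique 44-digit runs in order of first appearance."""
    matches = BARCODE_PATTERN.findall(text)
    return list(dict.fromkeys(matches))


sample = (
    "Pagador: Fulano de Tal\n"
    "19797116900000386000000004572849356277103564\n"
    "Valor: 386,00\n"
)
print(find_barcode_candidates(sample))
```

The lookbehind and lookahead keep the pattern from matching a 44-digit slice of a longer number, which would otherwise produce false candidates.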

Troubleshooting

Common Issues

"No boleto numbers found"

Possible causes:

PDF is encrypted/password protected
PDF contains only images with low-quality barcodes
Barcode is not in a recognized format
PDF quality is too low for barcode scanning

Solutions:

Try installing PyMuPDF for better image processing:
```
pip install PyMuPDF
```
Check if the PDF is password protected and provide the password
Ensure the PDF has clear, high-resolution barcode images
Try the --verbose flag for more detailed error information

"Error extracting text from PDF"

Possible causes:

PDF is corrupted
PDF is encrypted with a password
PDF format is not supported

Solutions:

Try opening the PDF in a PDF reader to verify it's not corrupted
If password protected, provide the correct password
Convert the PDF to a different format if possible

"Error scanning barcodes"

Possible causes:

zbar not installed
Barcode image quality is too low
Barcode format not supported

Solutions:

Install zbar library (see installation instructions)
Ensure the PDF has clear, high-resolution barcode images
Try the --verbose flag for more detailed error information
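
Before digging further into barcode scanning errors, it can help to confirm that the barcode stack is visible to Python at all. The following is a small diagnostic sketch using only the standard library. `barcode_stack_status` is a hypothetical helper, and a missing `zbar_library` value is not conclusive on every platform, because some pyzbar distributions ship the library alongside the package.

```python
import ctypes.util
import importlib.util


def barcode_stack_status() -> dict:
    """Report whether pyzbar can be imported and where zbar is found."""
    return {
        # True when a pyzbar package is installed in this environment.
        "pyzbar_importable": importlib.util.find_spec("pyzbar") is not None,
        # Library name or path if the system loader can locate zbar, else None.
        "zbar_library": ctypes.util.find_library("zbar"),
    }


print(barcode_stack_status())
```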

Performance Tips

Processing time grows with the number of pages, since each page is converted to an image before it is scanned for barcodes
Use --verbose flag to see progress and identify bottlenecks
Consider splitting large PDFs into smaller files if possible

Usage Examples

Basic usage:

boleto-extractor boleto.pdf

With Brazilian format:

boleto-extractor boleto.pdf --format

Copy to clipboard:

boleto-extractor boleto.pdf --clipboard

With password for encrypted PDFs:

boleto-extractor encrypted.pdf --password mypassword

With verbose output:

boleto-extractor boleto.pdf --verbose

Combined options:

boleto-extractor encrypted.pdf --password mypassword --verbose --format --clipboard

Dependencies

PyPDF2: PDF text extraction
pdfplumber: Advanced PDF text and image extraction
PyMuPDF: PDF to image conversion (optional but recommended)
opencv-python: Image processing
pyzbar: Barcode reading
Pillow: Image handling
numpy: Numerical operations
pyperclip: Clipboard functionality (for copy to clipboard feature)

Contributing

Feel free to submit issues, feature requests, or pull requests to improve this tool.

License

This project is open source and available under the MIT License.

Disclaimer

This tool is designed for educational and legitimate business purposes. Always ensure you have the right to process the PDF files you're using with this tool. The authors are not responsible for any misuse of this software.
