A Python script that processes data from a PDF file, converts it to CSV format, parses the data, and populates specific sections of a provided financial audit Excel spreadsheet (see "audit.xlsx" file) based on a given command.
- PDF to CSV Conversion: Extracts data from the PDF and saves it as a CSV file.
- Data Parsing: Analyzes and structures the extracted data.
- Excel Population: Fills specific sections of an input Excel spreadsheet based on parsed data and provided commands.
- Python 3.7+
- Required Python libraries:
  - pandas
  - openpyxl
Install the dependencies:
pip install pandas openpyxl
Run the script with the following arguments:
python script.py <command> <data.pdf> <spreadsheet.xlsx>
- <command>: Specifies the operation to perform: fillExecs or fillTransactions
- <data.pdf>: Path to the input PDF file containing the data
- <spreadsheet.xlsx>: Path to the Excel spreadsheet to be populated (format must be the same as "audit.xlsx")
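The three positional arguments above could be validated with the standard library's argparse module. This is a minimal sketch, not the script's actual code: only the two command names come from the usage text, while the function name `parse_args` and the attribute names `pdf_path` and `xlsx_path` are assumptions made for illustration.

```python
import argparse

# The two command names are taken from the usage description;
# every other name in this sketch is hypothetical.
COMMANDS = ("fillExecs", "fillTransactions")


def parse_args(argv=None):
    """Parse <command> <data.pdf> <spreadsheet.xlsx> from an argument list."""
    parser = argparse.ArgumentParser(
        prog="script.py",
        description="Populate an audit spreadsheet from a PDF.",
    )
    # choices= rejects any command other than the two documented ones.
    parser.add_argument("command", choices=COMMANDS,
                        help="operation to perform")
    parser.add_argument("pdf_path", metavar="data.pdf",
                        help="input PDF file containing the data")
    parser.add_argument("xlsx_path", metavar="spreadsheet.xlsx",
                        help="Excel spreadsheet to populate")
    return parser.parse_args(argv)


# Example with an explicit argument list instead of sys.argv.
args = parse_args(["fillTransactions", "data.pdf", "audit.xlsx"])
print(args.command, args.pdf_path, args.xlsx_path)
```

Restricting the command with `choices` means a misspelled command fails with a usage message before any file is touched.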
The script will:
- Convert the PDF to CSV
- Parse the CSV data
- Update the specified sections of the Excel spreadsheet
The output Excel file will be saved with updated content in the same directory.
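The parse-and-dispatch part of the steps above might look roughly like the sketch below. It uses only the standard library and stops short of the PDF extraction and the spreadsheet write. The CSV column names (`name`, `title`, `date`, `description`, `amount`) and all function names are assumptions, since the layout of the PDF data and of "audit.xlsx" is not described here.

```python
import csv
import io


def parse_rows(csv_text):
    """Parse CSV text into dicts, trimming whitespace and dropping empty rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        # Skip surplus fields (key None) and treat missing values as "".
        row = {key.strip(): (value or "").strip()
               for key, value in raw.items() if key}
        if any(row.values()):
            rows.append(row)
    return rows


def fill_execs(rows):
    # Hypothetical columns: one spreadsheet row per executive.
    return [[row["name"], row["title"]] for row in rows]


def fill_transactions(rows):
    # Hypothetical columns; amounts such as "1,250.00" become floats.
    return [
        [row["date"], row["description"],
         float(row["amount"].replace(",", ""))]
        for row in rows
    ]


# Map each documented command to the function that prepares its section.
HANDLERS = {"fillExecs": fill_execs, "fillTransactions": fill_transactions}


def run(command, csv_text):
    """Return the rows of values that the given command would write."""
    handler = HANDLERS.get(command)
    if handler is None:
        raise ValueError(f"unknown command: {command}")
    return handler(parse_rows(csv_text))


sample = 'date,description,amount\n2024-01-05,Office supplies,"1,250.00"\n,,\n'
print(run("fillTransactions", sample))
```

Keeping each command's logic in its own function and selecting it through a dictionary makes it straightforward to add another command later without touching the parsing code.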
python script.py fillTransactions data.pdf audit.xlsx
This command converts data.pdf into a CSV, processes the data, and updates the audit.xlsx file based on the fillTransactions command.