An ETL for the Mozilla Organization Firefox repositories
This repository contains a Python-based ETL (Extract, Transform, Load) script designed to process data from Mozilla Organization Firefox repositories on GitHub. The application runs in a Docker container for easy deployment and isolation.
- Containerized: Runs in a Docker container based on the official python:3.11-slim image
- Secure: Runs as a non-root user (app) inside the container
- Structured: Follows ETL patterns with separate extract, transform, and load phases
- Logging: Comprehensive logging for monitoring and debugging
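The README does not describe how logging is configured. As a minimal sketch, assuming log records are written to stdout so they show up in the terminal during `docker run`, a setup could look like the following. The logger name `github_etl` and the format string are hypothetical choices, not taken from main.py.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes timestamped records to stdout."""
    logger = logging.getLogger("github_etl")  # hypothetical logger name
    logger.setLevel(level)
    # Attach a handler only once, so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("Starting extract phase")
```

StreamHandler flushes after every record, so output is not held back by stdout buffering inside the container.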
To build the Docker image:

    docker build -t github-etl .

To run the ETL in a container:

    docker run --rm github-etl

Project files:
- main.py: The main ETL script containing the business logic
- requirements.txt: Python dependencies
- Dockerfile: Container configuration
- Base Image: python:3.11-slim (official slim Python 3.11 image)
- User: app (uid: 1000, gid: 1000)
- Working Directory: /app
- Ownership: All files in /app are owned by the app user
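To confirm the container layout described above, a small diagnostic can walk a directory and report entries not owned by the expected uid. This is a hypothetical helper, not part of the repository; the name `find_foreign_owned` is an assumption, while uid 1000 and /app come from the configuration listed above. It relies on `os.getuid`, which exists only on POSIX systems such as the Linux container.

```python
import os


def find_foreign_owned(root: str, expected_uid: int) -> list[str]:
    """Return paths under root (including root) not owned by expected_uid."""
    mismatches = []
    if os.lstat(root).st_uid != expected_uid:
        mismatches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat checks a symlink itself rather than the file it points to.
            if os.lstat(path).st_uid != expected_uid:
                mismatches.append(path)
    return sorted(mismatches)


if __name__ == "__main__":
    # Inside the container the process is expected to run as app (uid 1000).
    print(f"uid={os.getuid()} cwd={os.getcwd()}")
    if os.getuid() == 0:
        print("warning: running as root")
    if os.path.isdir("/app"):
        print("foreign-owned entries:", find_foreign_owned("/app", 1000))
```

Run inside the container, an empty list would indicate that everything under /app belongs to the app user.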
- Extract: Retrieves data from GitHub repositories
- Transform: Processes and structures the data
- Load: Stores the processed data in the target destination
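The three phases can be sketched as separate functions chained by a small runner. This is an illustrative skeleton, not the contents of main.py: the injected `fetch` callable stands in for the real GitHub calls, the selected fields (`name`, `stargazers_count`, `language`, which are fields of GitHub's REST repository objects) are assumptions, and a JSON Lines file is a hypothetical stand-in for the actual target destination.

```python
import json
import logging
import os
import tempfile
from collections.abc import Callable, Iterable

logger = logging.getLogger(__name__)

# Hypothetical type: a callable returning raw repository records.
Fetcher = Callable[[], Iterable[dict]]


def extract(fetch: Fetcher) -> list[dict]:
    """Extract: collect raw repository records from the fetcher."""
    records = list(fetch())
    logger.info("Extracted %d records", len(records))
    return records


def transform(records: list[dict]) -> list[dict]:
    """Transform: keep selected fields and sort rows by repository name."""
    rows = [
        {
            "name": r["name"],
            "stars": r.get("stargazers_count", 0),
            "language": r.get("language"),
        }
        for r in records
    ]
    rows.sort(key=lambda row: row["name"])
    logger.info("Transformed %d rows", len(rows))
    return rows


def load(rows: list[dict], path: str) -> None:
    """Load: write one JSON object per line (stand-in destination)."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    logger.info("Loaded %d rows into %s", len(rows), path)


def run(fetch: Fetcher, path: str) -> int:
    """Run extract, transform, and load in order; return an exit code."""
    try:
        load(transform(extract(fetch)), path)
    except Exception:
        logger.exception("ETL run failed")
        return 1
    return 0


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def demo_fetch() -> list[dict]:
        # Stand-in for a real GitHub API request.
        return [{"name": "example-repo", "stargazers_count": 3, "language": "Python"}]

    out_path = os.path.join(tempfile.mkdtemp(), "repos.jsonl")
    print("exit status:", run(demo_fetch, out_path))
```

Injecting the fetcher keeps the transform and load phases testable without network access, and returning an exit code lets a failed run surface as a non-zero container status.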
You can also run the script directly with Python, after installing its dependencies:

    pip install -r requirements.txt
    python3 main.py

To add new Python packages, list them in requirements.txt and rebuild the Docker image.
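When running outside Docker, it can help to check which entries in requirements.txt are not yet installed. The sketch below is a hypothetical helper, not part of the repository; it uses `importlib.metadata.version` and a deliberately rough name parser that drops version specifiers, extras, and environment markers, so unusual requirement formats may not be handled.

```python
import os
import re
from importlib.metadata import PackageNotFoundError, version


def missing_requirements(lines: list[str]) -> list[str]:
    """Return project names from requirements lines that are not installed."""
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt" or "-e .".
        if not line or line.startswith("-"):
            continue
        # Keep only the project name before any specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if os.path.exists("requirements.txt"):
        with open("requirements.txt", encoding="utf-8") as fh:
            missing = missing_requirements(fh.read().splitlines())
        print(missing or "all requirements installed")
```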
This project is licensed under the Mozilla Public License Version 2.0. See the LICENSE file for details.