A Python CLI for pulling 1-minute intraday OHLCV (Open, High, Low, Close, Volume) data from the EODHD API for US-listed stocks, ETFs, and ADRs.
- Three operating modes: full historical backfill, incremental daily updates, and on-demand single-symbol pulls
- Resumable runs: checkpoint-based progress tracking survives interruptions
- Multi-threaded fetching: configurable concurrency (1-50 workers)
- Rate limit management: local counting with periodic server verification, adaptive throttling near quota limits
- Corporate action adjustments: split and dividend adjustment factors computed and versioned with epoch tracking
- Point-in-time eligibility filtering: filters symbols by historical market cap and average daily volume
- Parquet storage: atomic writes with embedded metadata, deduplication on append
- Python 3.11
- Conda (recommended) or pip
- An EODHD API key
```bash
git clone <repo-url>
cd eod_data_pull
conda env create -f environment.yml
conda activate eodhd_data
```

Or install with pip:

```bash
pip install pandas requests pyarrow pyyaml click
```

Create a file called `.api_key` in the project root containing your EODHD API token:

```bash
echo "your_api_key_here" > .api_key
```

Alternatively, set the key directly in your config file.

Copy the example config and customize it:

```bash
cp config.example.yaml config.yaml
```

Optionally, install the pre-commit hooks:

```bash
pip install pre-commit
pre-commit install
```

The pipeline is configured via a YAML file. See `config.example.yaml` for the full reference:
```yaml
api:
  key: ".api_key"              # API key or path to file containing it
  max_workers: 20              # Thread pool concurrency (1-50)
storage:
  data_dir: "./data"
  cache_dir: "./cache"
history:
  start_date: "2020-01-01"
filters:
  min_market_cap: 500000000    # $500M minimum
  min_daily_volume: 100000     # 100K shares/day minimum
  security_types:
    - "Common Stock"
    - "ETF"
    - "ADR"
logging:
  level: "INFO"
  log_dir: "./logs"
  json_format: true
mode: "backfill"               # backfill | daily | symbol
symbols: []                    # Required when mode is "symbol"
dry_run: false
```

| Mode | Description |
|---|---|
| `backfill` | Full historical pull from `start_date` to yesterday. Includes delisted symbols. Evaluates point-in-time eligibility (market cap + volume). |
| `daily` | Incremental update from the last stored date to yesterday. Active symbols only. |
| `symbol` | Pull specific symbols listed in the `symbols` config field. |
The `api.key` field accepts:
- A literal API key string
- An absolute file path containing the key
- A relative file path (resolved relative to the config file's directory)
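A minimal sketch of that resolution order (the helper name `resolve_api_key` is illustrative; the real logic in `config.py` may differ):

```python
from pathlib import Path


def resolve_api_key(value: str, config_dir: Path) -> str:
    """Resolve the api.key config field: a literal key, an absolute
    file path, or a path relative to the config file's directory."""
    candidate = Path(value)
    if not candidate.is_absolute():
        # Relative paths are resolved against the config file's directory.
        candidate = config_dir / value
    if candidate.is_file():
        return candidate.read_text().strip()
    # Not a readable file: treat the value as the literal key.
    return value
```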
```bash
python -m src --config config.yaml
```

Set `dry_run: true` in your config for a dry run; in backfill mode this logs estimated API call counts.

Quick test with 5 default symbols, 30 days of history, and 2 workers:

```bash
python -m src --config config.yaml --test
```

With custom symbols:

```bash
python -m src --config config.yaml --test --test-symbols NVDA --test-symbols AMD
```

```text
Usage: python -m src [OPTIONS]

Options:
  --config PATH        Path to YAML config file (required)
  --test               Test mode: 5 symbols, 30 days, 2 workers
  --test-symbols TEXT  Symbols for test mode (repeatable)
  --help               Show this message and exit
```
```text
data/
├── intraday/                    # 1-minute OHLCV, one Parquet file per CUSIP
│   └── {CUSIP}.parquet
├── adjustments/
│   ├── splits/{CUSIP}.parquet
│   └── dividends/{CUSIP}.parquet
└── metadata/
    ├── adjustment_epochs.parquet
    └── cusip_mapping.parquet

cache/
├── progress_{run_id}.json       # Resumable run progress
├── universe_{date}.parquet      # Daily cached symbol universe
└── eligibility/{CUSIP}.json     # Cached eligibility ranges
```
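The progress file is what makes runs resumable. A simplified sketch of checkpointing with an atomic JSON write (the `completed_symbols` field is an illustrative schema, not necessarily what the pipeline actually stores):

```python
import json
import os
import tempfile


def save_progress(path: str, completed: list[str]) -> None:
    """Atomically persist the list of completed symbols (illustrative schema)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"completed_symbols": completed}, f)
    # Atomic replace: an interrupted run never leaves a half-written checkpoint.
    os.replace(tmp, path)


def load_progress(path: str) -> list[str]:
    """Return previously completed symbols, or an empty list on first run."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f).get("completed_symbols", [])
```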
| Column | Type | Description |
|---|---|---|
| `timestamp` | `timestamp[ns, UTC]` | Bar timestamp |
| `open_raw` | `float64` | Unadjusted open price |
| `high_raw` | `float64` | Unadjusted high price |
| `low_raw` | `float64` | Unadjusted low price |
| `close_raw` | `float64` | Unadjusted close price |
| `volume_raw` | `int64` | Unadjusted volume |
| `adj_factor_price` | `float64` | Cumulative price adjustment factor |
| `adj_factor_volume` | `float64` | Cumulative volume adjustment factor |
| `adjustment_epoch` | `int32` | Version of adjustment factors |
To get adjusted prices: `adjusted_close = close_raw * adj_factor_price`
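For example, with pandas (column names taken from the schema above; the sample values assume a hypothetical 2-for-1 split between the two bars):

```python
import pandas as pd

bars = pd.DataFrame(
    {
        "close_raw": [100.0, 50.0],      # raw close around a 2-for-1 split
        "volume_raw": [1_000, 2_000],
        "adj_factor_price": [0.5, 1.0],  # pre-split bars: price halved...
        "adj_factor_volume": [2.0, 1.0], # ...and volume doubled
    }
)

# Apply the cumulative factors to get a continuous adjusted series.
bars["adjusted_close"] = bars["close_raw"] * bars["adj_factor_price"]
bars["adjusted_volume"] = bars["volume_raw"] * bars["adj_factor_volume"]
```

After adjustment, both bars show a close of 50.0 and a volume of 2,000, so the series is continuous across the split.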
Install test dependencies (included in `environment.yml`):

```bash
pip install pytest pytest-cov responses freezegun
```

Run the full test suite:

```bash
pytest
```

Run with coverage:

```bash
pytest --cov=src --cov-report=term-missing
```

Run specific test categories:

```bash
pytest tests/unit/          # Unit tests only
pytest tests/integration/   # Integration tests only
pytest -m thread_safety     # Thread safety tests
```

```text
src/
├── cli.py                # Click CLI entry point
├── config.py             # YAML config loading and validation
├── logging_config.py     # Structured JSON logging
├── api/
│   ├── client.py         # HTTP client with retry/backoff
│   ├── endpoints.py      # EODHD API endpoint wrappers
│   └── rate_limiter.py   # Thread-safe quota tracking
├── data/
│   ├── schemas.py        # PyArrow Parquet schemas
│   ├── parquet_store.py  # Atomic read/write/append
│   └── adjustments.py    # Split/dividend factor computation
├── checkpoint/
│   └── progress.py       # Resumable run tracking
└── pipeline/
    ├── chunker.py        # Date range chunking (120-day windows)
    ├── common.py         # Shared fetch/store helpers
    ├── backfill.py       # Full historical backfill
    ├── daily.py          # Incremental daily updates
    ├── symbol.py         # Single-symbol pulls
    ├── universe.py       # Symbol universe construction
    └── eligibility.py    # Market cap/volume filtering
```
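The 120-day windowing noted for `chunker.py` presumably splits a long history into API-sized requests; a rough sketch of the idea (function name and return shape are illustrative):

```python
from datetime import date, timedelta


def chunk_date_range(start: date, end: date, window_days: int = 120):
    """Split [start, end] into consecutive, non-overlapping windows
    of at most window_days calendar days each."""
    chunks = []
    cur = start
    while cur <= end:
        chunk_end = min(cur + timedelta(days=window_days - 1), end)
        chunks.append((cur, chunk_end))
        cur = chunk_end + timedelta(days=1)
    return chunks
```

A year-long backfill would then become four sequential requests, each covering at most 120 days.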