HammySession/eodhd_data_pull

EODHD 1-Minute Intraday OHLCV Data Pipeline

A Python CLI for pulling 1-minute intraday OHLCV (Open, High, Low, Close, Volume) data from the EODHD API for US-listed stocks, ETFs, and ADRs.

Features

  • Three operating modes: full historical backfill, incremental daily updates, and on-demand single-symbol pulls
  • Resumable runs: checkpoint-based progress tracking survives interruptions
  • Multi-threaded fetching: configurable concurrency (1-50 workers)
  • Rate limit management: local counting with periodic server verification, adaptive throttling near quota limits
  • Corporate action adjustments: split and dividend adjustment factors computed and versioned with epoch tracking
  • Point-in-time eligibility filtering: filters symbols by historical market cap and average daily volume
  • Parquet storage: atomic writes with embedded metadata, deduplication on append
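The local counting part of rate limit management can be illustrated with a small thread-safe counter. This is only a sketch, not the repository's rate_limiter.py: the class name QuotaCounter, the daily_limit and throttle_ratio parameters, and the 90% throttling threshold are assumptions made for illustration, and the periodic server check is reduced to a sync() call that takes a number supplied by the caller.

```python
import threading


class QuotaCounter:
    """Hypothetical thread-safe counter for API calls against a fixed quota."""

    def __init__(self, daily_limit: int, throttle_ratio: float = 0.9) -> None:
        self.daily_limit = daily_limit
        self.throttle_ratio = throttle_ratio  # assumed threshold for slowing down
        self._used = 0
        self._lock = threading.Lock()

    def try_acquire(self, cost: int = 1) -> bool:
        """Reserve `cost` calls; return False if that would exceed the quota."""
        with self._lock:
            if self._used + cost > self.daily_limit:
                return False
            self._used += cost
            return True

    def sync(self, server_used: int) -> None:
        """Overwrite the local count with a usage figure reported by the server."""
        with self._lock:
            self._used = server_used

    @property
    def used(self) -> int:
        with self._lock:
            return self._used

    @property
    def near_limit(self) -> bool:
        """True once usage reaches throttle_ratio of the quota."""
        with self._lock:
            return self._used >= self.throttle_ratio * self.daily_limit
```

Each worker thread would call try_acquire() before a request and back off when near_limit is true. Holding one lock around every read and write keeps the count consistent regardless of the worker count.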

Requirements

Installation

1. Clone the repository

git clone <repo-url>
cd eodhd_data_pull

2. Create the conda environment

conda env create -f environment.yml
conda activate eodhd_data

Or install with pip:

pip install pandas requests pyarrow pyyaml click

3. Set up your API key

Create a file called .api_key in the project root containing your EODHD API token:

echo "your_api_key_here" > .api_key

Alternatively, set the key directly in your config file.

4. Create your config file

Copy the example config and customize it:

cp config.example.yaml config.yaml

5. (Optional) Set up pre-commit hooks

pip install pre-commit
pre-commit install

Configuration

The pipeline is configured via a YAML file. See config.example.yaml for the full reference:

api:
  key: ".api_key"                       # API key or path to file containing it
  max_workers: 20                       # Thread pool concurrency (1-50)

storage:
  data_dir: "./data"
  cache_dir: "./cache"

history:
  start_date: "2020-01-01"

filters:
  min_market_cap: 500000000             # $500M minimum
  min_daily_volume: 100000              # 100K shares/day minimum
  security_types:
    - "Common Stock"
    - "ETF"
    - "ADR"

logging:
  level: "INFO"
  log_dir: "./logs"
  json_format: true

mode: "backfill"                        # backfill | daily | symbol
symbols: []                             # Required when mode is "symbol"
dry_run: false
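
The constraints mentioned in the comments above (a worker count from 1 to 50, three allowed modes, symbols required in symbol mode) could be checked after loading the file roughly as follows. This is a sketch that assumes the YAML has already been parsed into a dict; validate_config is a hypothetical helper and does not reflect the actual src/config.py.

```python
VALID_MODES = {"backfill", "daily", "symbol"}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    # api.max_workers is documented as accepting values from 1 to 50.
    workers = cfg.get("api", {}).get("max_workers")
    if not isinstance(workers, int) or not 1 <= workers <= 50:
        errors.append("api.max_workers must be an integer from 1 to 50")

    # mode must be one of the three documented values.
    mode = cfg.get("mode")
    if mode not in VALID_MODES:
        errors.append(f"mode must be one of {sorted(VALID_MODES)}")

    # symbols is only required when mode is "symbol".
    if mode == "symbol" and not cfg.get("symbols"):
        errors.append('symbols must be non-empty when mode is "symbol"')

    return errors
```

Collecting all problems in a list, rather than raising on the first one, lets a user fix a config file in a single pass.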

Modes

Mode      Description
backfill  Full historical pull from start_date to yesterday. Includes delisted symbols. Evaluates point-in-time eligibility (market cap + volume).
daily     Incremental update from last stored date to yesterday. Active symbols only.
symbol    Pull specific symbols listed in the symbols config field.
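
As a rough illustration of how the date window differs between the backfill and daily modes described above, the sketch below derives an inclusive (start, end) pair for a run. The function name date_window and the choice to resume on the day after the last stored date are assumptions; the description only says the daily range runs from the last stored date to yesterday.

```python
from datetime import date, timedelta
from typing import Optional


def date_window(mode: str, today: date, start_date: date,
                last_stored: Optional[date] = None) -> Optional[tuple[date, date]]:
    """Return the inclusive (start, end) range to fetch, or None if up to date."""
    end = today - timedelta(days=1)  # both modes stop at yesterday
    if mode == "backfill":
        start = start_date
    elif mode == "daily":
        # Assumption: resume the day after the last stored date and
        # fall back to start_date when nothing has been stored yet.
        start = last_stored + timedelta(days=1) if last_stored else start_date
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return (start, end) if start <= end else None
```

Returning None when the start falls after yesterday gives the caller a simple way to skip symbols that are already current.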

API key resolution

The api.key field accepts:

  • A literal API key string
  • An absolute file path containing the key
  • A relative file path (resolved relative to the config file's directory)
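
The three accepted forms can be resolved with a few lines of pathlib. This is a minimal sketch under stated assumptions, not the project's actual loader: resolve_api_key is a hypothetical name, and treating any value that does not point to an existing file as a literal key is a guess at the fallback order.

```python
from pathlib import Path


def resolve_api_key(value: str, config_path: str) -> str:
    """Resolve api.key: a file path (absolute or relative) or a literal key."""
    candidate = Path(value).expanduser()
    if not candidate.is_absolute():
        # Relative paths are resolved against the config file's directory.
        candidate = Path(config_path).resolve().parent / candidate
    if candidate.is_file():
        # Strip the trailing newline that `echo ... > .api_key` leaves behind.
        return candidate.read_text(encoding="utf-8").strip()
    # Assumption: anything that is not an existing file is a literal key.
    return value.strip()
```

For example, resolve_api_key(".api_key", "config.yaml") reads the key file that sits next to config.yaml, regardless of the current working directory.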

Usage

Run with a config file

python -m src --config config.yaml

Dry run (estimate API calls without fetching)

Set dry_run: true in your config to estimate API calls without fetching data. In backfill mode, the run logs the estimated API call counts.

Test mode

Quick test with 5 default symbols, 30 days of history, and 2 workers:

python -m src --config config.yaml --test

With custom symbols:

python -m src --config config.yaml --test --test-symbols NVDA --test-symbols AMD

CLI options

Usage: python -m src [OPTIONS]

Options:
  --config PATH        Path to YAML config file (required)
  --test               Test mode: 5 symbols, 30 days, 2 workers
  --test-symbols TEXT  Symbols for test mode (repeatable)
  --help               Show this message and exit

Data Output

Directory structure

data/
├── intraday/                   # 1-minute OHLCV, one Parquet file per CUSIP
│   └── {CUSIP}.parquet
├── adjustments/
│   ├── splits/{CUSIP}.parquet
│   └── dividends/{CUSIP}.parquet
└── metadata/
    ├── adjustment_epochs.parquet
    └── cusip_mapping.parquet

cache/
├── progress_{run_id}.json      # Resumable run progress
├── universe_{date}.parquet     # Daily cached symbol universe
└── eligibility/{CUSIP}.json    # Cached eligibility ranges
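
To show how a progress_{run_id}.json checkpoint can survive interruptions, here is a minimal sketch that writes the file atomically (temp file in the same directory, then rename). The JSON layout with a "completed" list and the save_progress and load_progress helpers are assumptions for illustration; the real checkpoint format is not documented here.

```python
import json
import os
import tempfile
from pathlib import Path


def save_progress(cache_dir: str, run_id: str, completed: set[str]) -> Path:
    """Atomically write the set of completed symbols for a run."""
    target = Path(cache_dir) / f"progress_{run_id}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the
    # target, so a crash never leaves a half-written checkpoint behind.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"run_id": run_id, "completed": sorted(completed)}, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target


def load_progress(cache_dir: str, run_id: str) -> set[str]:
    """Return completed symbols, or an empty set if no checkpoint exists."""
    target = Path(cache_dir) / f"progress_{run_id}.json"
    if not target.exists():
        return set()
    return set(json.loads(target.read_text(encoding="utf-8"))["completed"])
```

On restart, a run would call load_progress() and skip anything already in the returned set. The same temp-file-then-rename pattern applies to atomic Parquet writes.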

Intraday schema

Column             Type                Description
timestamp          timestamp[ns, UTC]  Bar timestamp
open_raw           float64             Unadjusted open price
high_raw           float64             Unadjusted high price
low_raw            float64             Unadjusted low price
close_raw          float64             Unadjusted close price
volume_raw         int64               Unadjusted volume
adj_factor_price   float64             Cumulative price adjustment factor
adj_factor_volume  float64             Cumulative volume adjustment factor
adjustment_epoch   int32               Version of adjustment factors

To get adjusted prices: adjusted_close = close_raw * adj_factor_price
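
The same multiplication can be applied to a whole frame with pandas. The sketch below is illustrative only: apply_adjustments and the *_adj column names are hypothetical, and extending the documented close formula to open, high, low, and volume (using adj_factor_volume) is an assumption.

```python
import pandas as pd


def apply_adjustments(bars: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `bars` with adjusted price and volume columns added."""
    out = bars.copy()
    for col in ("open", "high", "low", "close"):
        # Documented formula for close; assumed to hold for the other prices.
        out[f"{col}_adj"] = out[f"{col}_raw"] * out["adj_factor_price"]
    # Assumption: volume is scaled by its own cumulative factor.
    out["volume_adj"] = out["volume_raw"] * out["adj_factor_volume"]
    return out
```

A frame loaded with pd.read_parquet() from data/intraday/ could be passed in directly. The raw columns are left untouched, so the adjustment can be recomputed when the factors change.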

Testing

Install test dependencies (included in environment.yml):

pip install pytest pytest-cov responses freezegun

Run the full test suite:

pytest

Run with coverage:

pytest --cov=src --cov-report=term-missing

Run specific test categories:

pytest tests/unit/               # Unit tests only
pytest tests/integration/        # Integration tests only
pytest -m thread_safety          # Thread safety tests

Project Structure

src/
├── cli.py              # Click CLI entry point
├── config.py           # YAML config loading and validation
├── logging_config.py   # Structured JSON logging
├── api/
│   ├── client.py       # HTTP client with retry/backoff
│   ├── endpoints.py    # EODHD API endpoint wrappers
│   └── rate_limiter.py # Thread-safe quota tracking
├── data/
│   ├── schemas.py      # PyArrow Parquet schemas
│   ├── parquet_store.py# Atomic read/write/append
│   └── adjustments.py  # Split/dividend factor computation
├── checkpoint/
│   └── progress.py     # Resumable run tracking
└── pipeline/
    ├── chunker.py      # Date range chunking (120-day windows)
    ├── common.py       # Shared fetch/store helpers
    ├── backfill.py     # Full historical backfill
    ├── daily.py        # Incremental daily updates
    ├── symbol.py       # Single-symbol pulls
    ├── universe.py     # Symbol universe construction
    └── eligibility.py  # Market cap/volume filtering
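
The date range chunking noted for chunker.py can be sketched as a small generator. This is not the repository's code: chunk_date_range is a hypothetical name, and treating the 120-day window as an inclusive count of calendar days is an assumption.

```python
from datetime import date, timedelta
from typing import Iterator


def chunk_date_range(start: date, end: date,
                     max_days: int = 120) -> Iterator[tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) windows of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    cursor = start
    while cursor <= end:
        # The last window is clipped to `end`, so it may be shorter.
        chunk_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)
```

Each yielded pair would map to one intraday request, so a backfill over several years becomes a sequence of non-overlapping, contiguous windows.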
