EODHD 1-Minute Intraday OHLCV Data Pipeline

A Python CLI for pulling 1-minute intraday OHLCV (Open, High, Low, Close, Volume) data from the EODHD API for US-listed stocks, ETFs, and ADRs.

Features

Three operating modes: full historical backfill, incremental daily updates, and on-demand single-symbol pulls
Resumable runs: checkpoint-based progress tracking survives interruptions
Multi-threaded fetching: configurable concurrency (1-50 workers)
Rate limit management: local counting with periodic server verification, adaptive throttling near quota limits
Corporate action adjustments: split and dividend adjustment factors computed and versioned with epoch tracking
Point-in-time eligibility filtering: filters symbols by historical market cap and average daily volume
Parquet storage: atomic writes with embedded metadata, deduplication on append
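
As an illustration of the rate limit bullet above, here is a minimal sketch of local, thread-safe call counting with a throttle that ramps up near the quota. The class name QuotaTracker, the 90% threshold, and the one-second ceiling are hypothetical choices for this example and are not taken from src/api/rate_limiter.py.

```python
import threading


class QuotaTracker:
    """Hypothetical thread-safe counter for API calls against a quota."""

    def __init__(self, daily_limit: int, throttle_ratio: float = 0.9) -> None:
        self._lock = threading.Lock()
        self._limit = daily_limit
        self._used = 0
        self._throttle_ratio = throttle_ratio  # assumed threshold, not from the project

    def record_call(self, cost: int = 1) -> None:
        """Count one API call locally."""
        with self._lock:
            self._used += cost

    def sync_with_server(self, server_used: int) -> None:
        """Periodic verification: keep whichever usage count is higher."""
        with self._lock:
            self._used = max(self._used, server_used)

    def suggested_delay(self) -> float:
        """Seconds to wait before the next call; grows as the quota fills up."""
        with self._lock:
            ratio = self._used / self._limit
        if ratio >= 1.0:
            raise RuntimeError("quota exhausted")
        if ratio < self._throttle_ratio:
            return 0.0
        # Linear ramp from 0 s at the threshold to 1 s at full usage (assumed values).
        return (ratio - self._throttle_ratio) / (1.0 - self._throttle_ratio)
```

A worker would call record_call() after each request and sleep for suggested_delay() seconds before the next one.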

Requirements

Python 3.11
Conda (recommended) or pip
An EODHD API key

Installation

1. Clone the repository

git clone <repo-url>
cd eod_data_pull

2. Create the conda environment

conda env create -f environment.yml
conda activate eodhd_data

Or install with pip:

pip install pandas requests pyarrow pyyaml click

3. Set up your API key

Create a file called .api_key in the project root containing your EODHD API token:

echo "your_api_key_here" > .api_key

Alternatively, set the key directly in your config file.

4. Create your config file

Copy the example config and customize it:

cp config.example.yaml config.yaml

5. (Optional) Set up pre-commit hooks

pip install pre-commit
pre-commit install

Configuration

The pipeline is configured via a YAML file. See config.example.yaml for the full reference:

api:
  key: ".api_key"                       # API key or path to file containing it
  max_workers: 20                       # Thread pool concurrency (1-50)

storage:
  data_dir: "./data"
  cache_dir: "./cache"

history:
  start_date: "2020-01-01"

filters:
  min_market_cap: 500000000             # $500M minimum
  min_daily_volume: 100000              # 100K shares/day minimum
  security_types:
    - "Common Stock"
    - "ETF"
    - "ADR"

logging:
  level: "INFO"
  log_dir: "./logs"
  json_format: true

mode: "backfill"                        # backfill | daily | symbol
symbols: []                             # Required when mode is "symbol"
dry_run: false
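
The constraints noted in the comments above (1-50 workers, three valid modes, symbols required in symbol mode) could be checked along these lines. This is a sketch operating on the already-parsed config dictionary; the function name validate_config is hypothetical and the project's own validation in src/config.py may differ.

```python
VALID_MODES = {"backfill", "daily", "symbol"}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems = []

    # api.max_workers must fall inside the documented 1-50 range.
    workers = cfg.get("api", {}).get("max_workers")
    if not isinstance(workers, int) or not 1 <= workers <= 50:
        problems.append("api.max_workers must be an integer from 1 to 50")

    # mode must be one of the three documented values.
    mode = cfg.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}")

    # symbols is only required when mode is "symbol".
    if mode == "symbol" and not cfg.get("symbols"):
        problems.append('symbols must be non-empty when mode is "symbol"')

    return problems
```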

Modes

| Mode | Description |
| --- | --- |
| `backfill` | Full historical pull from `start_date` to yesterday. Includes delisted symbols. Evaluates point-in-time eligibility (market cap + volume). |
| `daily` | Incremental update from last stored date to yesterday. Active symbols only. |
| `symbol` | Pull specific symbols listed in the `symbols` config field. |
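
The date ranges described for backfill and daily modes can be sketched as follows. The function name date_range_for_mode is hypothetical, and treating the last stored date as the inclusive start of a daily run is an assumption. The date range used by symbol mode is not described here, so the sketch leaves it out.

```python
from datetime import date, timedelta


def date_range_for_mode(mode, today, start_date, last_stored=None):
    """Return (first_day, last_day) for a run; both modes end at yesterday."""
    yesterday = today - timedelta(days=1)
    if mode == "backfill":
        # Full history: configured start_date through yesterday.
        return start_date, yesterday
    if mode == "daily":
        # Incremental: resume from the last date already on disk (assumed inclusive).
        if last_stored is None:
            raise ValueError("daily mode needs a previously stored date")
        return last_stored, yesterday
    raise ValueError(f"no date range rule sketched for mode {mode!r}")
```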

API key resolution

The api.key field accepts:

A literal API key string
An absolute file path containing the key
A relative file path (resolved relative to the config file's directory)
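
A minimal sketch of these three resolution rules is shown below. The function name resolve_api_key is hypothetical, and treating any value that does not name an existing file as a literal key is an assumption about how the two cases are told apart.

```python
from pathlib import Path


def resolve_api_key(value: str, config_path: Path) -> str:
    """Resolve api.key as a file path first, falling back to a literal key."""
    candidate = Path(value)
    if not candidate.is_absolute():
        # Relative paths are resolved against the config file's directory.
        candidate = config_path.parent / candidate
    if candidate.is_file():
        # Strip the trailing newline that `echo ... > .api_key` leaves behind.
        return candidate.read_text(encoding="utf-8").strip()
    # Assumption: anything that is not an existing file is the key itself.
    return value
```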

Usage

Run with a config file

python -m src --config config.yaml

Dry run (estimate API calls without fetching)

Set dry_run: true in your config to estimate API calls without fetching data. Backfill mode logs the estimated API call counts.

Test mode

Quick test with 5 default symbols, 30 days of history, and 2 workers:

python -m src --config config.yaml --test

With custom symbols:

python -m src --config config.yaml --test --test-symbols NVDA --test-symbols AMD

CLI options

Usage: python -m src [OPTIONS]

Options:
  --config PATH        Path to YAML config file (required)
  --test               Test mode: 5 symbols, 30 days, 2 workers
  --test-symbols TEXT  Symbols for test mode (repeatable)
  --help               Show this message and exit
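
To show how a repeatable option such as --test-symbols collects its values, the sketch below mirrors the options above using the standard library's argparse. The project itself uses Click (see src/cli.py under Project Structure), so this is a swapped-in stand-in for illustration, not the project's entry point.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an argparse parser that mirrors the documented options."""
    parser = argparse.ArgumentParser(prog="python -m src")
    parser.add_argument("--config", required=True, help="Path to YAML config file")
    parser.add_argument("--test", action="store_true", help="Test mode")
    # action="append" lets the flag be given several times, one symbol each.
    parser.add_argument("--test-symbols", action="append", default=[],
                        help="Symbols for test mode (repeatable)")
    return parser


args = build_parser().parse_args(
    ["--config", "config.yaml", "--test", "--test-symbols", "NVDA", "--test-symbols", "AMD"]
)
print(args.config, args.test, args.test_symbols)
```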

Data Output

Directory structure

data/
├── intraday/                   # 1-minute OHLCV, one Parquet file per CUSIP
│   └── {CUSIP}.parquet
├── adjustments/
│   ├── splits/{CUSIP}.parquet
│   └── dividends/{CUSIP}.parquet
└── metadata/
    ├── adjustment_epochs.parquet
    └── cusip_mapping.parquet

cache/
├── progress_{run_id}.json      # Resumable run progress
├── universe_{date}.parquet     # Daily cached symbol universe
└── eligibility/{CUSIP}.json    # Cached eligibility ranges
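
As a sketch of how a progress_{run_id}.json checkpoint could survive interruptions, the example below writes the file through a temporary file and an atomic rename. The JSON layout (a "completed" list of symbols) and the function names are hypothetical, and the use of an atomic rename for the progress file is an assumption. Only the file naming comes from the layout above.

```python
import json
import os
import tempfile
from pathlib import Path


def save_progress(cache_dir: Path, run_id: str, completed: set[str]) -> Path:
    """Write progress to a temp file, then rename it over the target."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"progress_{run_id}.json"
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"run_id": run_id, "completed": sorted(completed)}, fh)
    # os.replace is atomic when source and target share a filesystem,
    # so a crash never leaves a half-written checkpoint behind.
    os.replace(tmp_name, target)
    return target


def load_progress(cache_dir: Path, run_id: str) -> set[str]:
    """Return the symbols already completed, or an empty set for a new run."""
    target = cache_dir / f"progress_{run_id}.json"
    if not target.exists():
        return set()
    return set(json.loads(target.read_text(encoding="utf-8"))["completed"])
```

On restart, a run would call load_progress() and skip every symbol in the returned set.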

Intraday schema

| Column | Type | Description |
| --- | --- | --- |
| `timestamp` | timestamp[ns, UTC] | Bar timestamp |
| `open_raw` | float64 | Unadjusted open price |
| `high_raw` | float64 | Unadjusted high price |
| `low_raw` | float64 | Unadjusted low price |
| `close_raw` | float64 | Unadjusted close price |
| `volume_raw` | int64 | Unadjusted volume |
| `adj_factor_price` | float64 | Cumulative price adjustment factor |
| `adj_factor_volume` | float64 | Cumulative volume adjustment factor |
| `adjustment_epoch` | int32 | Version of adjustment factors |

To get adjusted prices: adjusted_close = close_raw * adj_factor_price
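
The same formula can be applied with pandas as sketched below. The two bars are made-up values for illustration (a price factor of 0.1 before a hypothetical 10-for-1 split, 1.0 after), and applying adj_factor_price to open, high, and low as well as close is an assumption based on the column's description. In practice the frame would come from reading a file under data/intraday/ rather than being built by hand.

```python
import pandas as pd

# Made-up bars: one before and one after a hypothetical 10-for-1 split.
bars = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2024-06-07 13:30", "2024-06-10 13:30"], utc=True),
        "open_raw": [1200.0, 120.0],
        "high_raw": [1210.0, 122.0],
        "low_raw": [1195.0, 119.5],
        "close_raw": [1208.0, 121.0],
        "adj_factor_price": [0.1, 1.0],
    }
)

# Multiply each raw price column by the cumulative price factor.
for col in ("open", "high", "low", "close"):
    bars[f"{col}_adj"] = bars[f"{col}_raw"] * bars["adj_factor_price"]

print(bars[["timestamp", "close_raw", "close_adj"]])
```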

Testing

Install test dependencies (included in environment.yml):

pip install pytest pytest-cov responses freezegun

Run the full test suite:

pytest

Run with coverage:

pytest --cov=src --cov-report=term-missing

Run specific test categories:

pytest tests/unit/               # Unit tests only
pytest tests/integration/        # Integration tests only
pytest -m thread_safety          # Thread safety tests

Project Structure

src/
├── cli.py              # Click CLI entry point
├── config.py           # YAML config loading and validation
├── logging_config.py   # Structured JSON logging
├── api/
│   ├── client.py       # HTTP client with retry/backoff
│   ├── endpoints.py    # EODHD API endpoint wrappers
│   └── rate_limiter.py # Thread-safe quota tracking
├── data/
│   ├── schemas.py      # PyArrow Parquet schemas
│   ├── parquet_store.py # Atomic read/write/append
│   └── adjustments.py  # Split/dividend factor computation
├── checkpoint/
│   └── progress.py     # Resumable run tracking
└── pipeline/
    ├── chunker.py      # Date range chunking (120-day windows)
    ├── common.py       # Shared fetch/store helpers
    ├── backfill.py     # Full historical backfill
    ├── daily.py        # Incremental daily updates
    ├── symbol.py       # Single-symbol pulls
    ├── universe.py     # Symbol universe construction
    └── eligibility.py  # Market cap/volume filtering
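
The 120-day windowing mentioned for chunker.py can be sketched as below. The function name chunk_date_range is hypothetical, and the use of inclusive calendar-day windows is an assumption. The actual module may split ranges differently.

```python
from datetime import date, timedelta
from typing import Iterator


def chunk_date_range(
    start: date, end: date, window_days: int = 120
) -> Iterator[tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs that cover start..end."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    cursor = start
    while cursor <= end:
        # Each window spans at most window_days calendar days, clipped at end.
        chunk_end = min(cursor + timedelta(days=window_days - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)


for first, last in chunk_date_range(date(2024, 1, 1), date(2024, 12, 31)):
    print(first, last)
```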
