79 changes: 79 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,79 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

PyGlass is a high-performance C++ library with Python bindings for approximate similarity search using graph-based algorithms (HNSW, NSG, SSG, RNNDESCENT, IVF). The core is implemented in C++ with SIMD optimizations, and Python bindings are provided via pybind11.

## Essential Commands

### Build and Installation
```bash
# Development installation (editable)
pip install -v -e "python"

# Using uv (recommended for development)
uv venv
source .venv/bin/activate
uv pip install -v -e "python"

# Generate C++ compile_commands.json
bear -- uv pip install -v -e "python"
```

### Testing
```bash
pytest "python"
```

### Code Quality
```bash
# Python formatting
black .

# Python linting
ruff check .

# C++ formatting (handles paths with spaces and long file lists)
find glass -name "*.hpp" -exec clang-format -i {} +
```

### Benchmarking
```bash
# Run with default configuration
python3 examples/main.py

# Run with custom configuration
python3 examples/main.py path/to/config.yaml
```

## Architecture

### Core Components

1. **Graph Algorithms** (`glass/hnsw/`, `glass/nsg/`): HNSW and NSG graph implementations for similarity search
2. **Quantization** (`glass/quant/`): Multiple quantization methods (FP16, SQ8, PQ, etc.) for memory efficiency
3. **SIMD Optimizations** (`glass/simd/`): Platform-specific optimizations for AVX2, AVX512, and ARM NEON
4. **Search Layer** (`glass/searcher/`): Unified search interface over different index types
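
To make the quantization component concrete, here is a minimal sketch of 8-bit scalar quantization in plain Python. It is a generic illustration of the technique, not PyGlass's implementation: the function names `sq8_encode` and `sq8_decode` and the fixed `[lo, hi]` range are assumptions made for this example.

```python
from typing import List


def sq8_encode(vec: List[float], lo: float, hi: float) -> List[int]:
    """Map each value to an integer code in 0..255 (hypothetical helper)."""
    span = hi - lo
    if span <= 0:
        # Degenerate range: every value maps to code 0.
        return [0 for _ in vec]
    codes = []
    for x in vec:
        clamped = min(max(x, lo), hi)  # clip values outside [lo, hi]
        codes.append(round((clamped - lo) * 255.0 / span))
    return codes


def sq8_decode(codes: List[int], lo: float, hi: float) -> List[float]:
    """Approximate the original values from their 8-bit codes."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


original = [0.0, 0.1, 0.73, 1.0]
codes = sq8_encode(original, lo=0.0, hi=1.0)
restored = sq8_decode(codes, lo=0.0, hi=1.0)
```

Each value is stored in one byte instead of four, and the reconstruction error per component is at most half a quantization step, which is the memory/accuracy trade-off the quantization layer exploits.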

### Python Integration

- **Bindings**: `python/bindings.cc` uses pybind11 to expose C++ classes
- **Builder Pattern**: Python API uses builder pattern for index construction
- **Dataset Utils**: `python/ann_dataset/` handles standard ANN benchmark datasets

### Key Design Patterns

1. **Template-Heavy C++**: Extensive use of C++ templates for performance and flexibility
2. **Memory Management**: Custom memory allocators and storage classes for efficient data handling
3. **Parallel Processing**: OpenMP for multi-threaded operations
4. **Builder Pattern**: Index construction uses fluent builder API
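
The fluent builder idea can be sketched as follows. The class and method names (`IndexSpec`, `IndexSpecBuilder`, `with_index_type`, `with_index_args`, `with_build_quant`) are hypothetical and are not the PyGlass API; the sketch only shows the shape of the pattern, using the `R`, `L`, and `build_quant` values that appear in `examples/config.yaml`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class IndexSpec:
    """Immutable description of an index to build (hypothetical)."""
    index_type: str
    index_args: Dict[str, int] = field(default_factory=dict)
    build_quant: str = "FP32"


class IndexSpecBuilder:
    """Hypothetical fluent builder: each setter returns self for chaining."""

    def __init__(self) -> None:
        self._index_type = "HNSW"
        self._index_args: Dict[str, int] = {}
        self._build_quant = "FP32"

    def with_index_type(self, name: str) -> "IndexSpecBuilder":
        self._index_type = name
        return self

    def with_index_args(self, **kwargs: int) -> "IndexSpecBuilder":
        self._index_args.update(kwargs)
        return self

    def with_build_quant(self, quant: str) -> "IndexSpecBuilder":
        self._build_quant = quant
        return self

    def build(self) -> IndexSpec:
        # Copy the args so later builder calls cannot mutate the spec.
        return IndexSpec(self._index_type, dict(self._index_args), self._build_quant)


spec = (
    IndexSpecBuilder()
    .with_index_type("HNSW")
    .with_index_args(R=48, L=200)
    .with_build_quant("SQ8U")
    .build()
)
```

Separating the mutable builder from the frozen result keeps partially configured state out of the object that is eventually handed to the build step.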

## Development Notes

- C++20 standard required
- No external C++ dependencies (self-contained)
- Performance is critical: always benchmark changes
- Use `examples/config.yaml` to configure benchmark parameters
- If dataset downloads fail because of network issues, try `export HF_ENDPOINT=https://hf-mirror.com` before running
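
The mirror setting can also be applied from Python instead of the shell. The sketch below assumes the dataset code downloads through a Hugging Face client that reads `HF_ENDPOINT` from the environment; `use_hf_mirror` is a hypothetical helper, not part of this repository.

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT unless the user already chose one; return the value in effect."""
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


# Call this before importing any module that performs the download,
# since such clients commonly read the variable at import time.
active_endpoint = use_hf_mirror()
```

Using `setdefault` leaves an endpoint exported in the shell untouched, so the helper only takes effect when nothing was configured.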
29 changes: 14 additions & 15 deletions examples/config.yaml
@@ -6,24 +6,23 @@ index_params:
index_args:
R: 48
L: 200
build_quant: SQ8U

rebuild: false
build_quant: SQ8U

search_quants:
- SQ4U
refine_quant: FP16
- search_quant: SQ4U
refine_quant: FP16

topks:
- 10
- 100

efs:
- 20
- 30
- 40
- 50
- 60
- 70
- 80
- 90
- 100
runs: 5
batch:
- true
- 200
- 300

runs: 3

concurrency:
min: 1