Conversation

Copilot AI commented Nov 1, 2025

The embeddings implementation was hardcoded to OpenAI's endpoint and model. This adds configuration support for any OpenAI-compatible provider (Ollama, HuggingFace, Voyage AI, etc.) with production-ready config file support, smart authentication detection, and full backward compatibility.

Changes

Core Implementation

  • openai_provider.rs: Added a from_config() method that reads Settings configuration, plus from_env() for backward compatibility
  • config.rs: Added EmbeddingsOpts struct with api_base and model fields integrated into Settings
  • service_container.rs: Updated to pass configuration from Settings to embedder
  • Smart Authentication: Auto-detects auth headers based on URL (Voyage AI uses X-Api-Key, localhost requires no auth, others use Authorization: Bearer)
  • Non-Compatible Providers: Added comprehensive documentation comments explaining which providers require a different architecture (gRPC, OAuth, AWS SigV4, streaming-only APIs)
  • Maintains existing behavior when only OPENAI_API_KEY is set

Configuration (Production-Ready)

Config File Support (Primary method for production):

# ~/.config/bkmr/config.toml
[embeddings_opts]
api_base = "https://api.openai.com/v1"
model = "text-embedding-3-small"

Environment Variables (Override config file):

  • OPENAI_API_BASE - API endpoint (default: https://api.openai.com/v1)
  • OPENAI_API_URL - Legacy alias for backward compatibility
  • OPENAI_MODEL - Model name (default: text-embedding-3-small)

Configuration Priority:

  1. Environment variables (highest - for testing/overrides)
  2. Configuration file (production settings)
  3. Defaults (fallback)
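
A minimal sketch of this resolution order in Rust, assuming the EmbeddingsOpts field names from this PR (the exact Settings wiring in bkmr may differ):

#[derive(Debug, Clone, Default)]
pub struct EmbeddingsOpts {
    pub api_base: Option<String>,
    pub model: Option<String>,
}

fn resolve_api_base(opts: &EmbeddingsOpts) -> String {
    std::env::var("OPENAI_API_BASE")                  // 1. env var (highest)
        .or_else(|_| std::env::var("OPENAI_API_URL")) // legacy alias
        .ok()
        .or_else(|| opts.api_base.clone())            // 2. config file
        .unwrap_or_else(|| "https://api.openai.com/v1".to_string()) // 3. default
}

fn resolve_model(opts: &EmbeddingsOpts) -> String {
    std::env::var("OPENAI_MODEL")
        .ok()
        .or_else(|| opts.model.clone())
        .unwrap_or_else(|| "text-embedding-3-small".to_string())
}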

Authentication Detection

  • Voyage AI (api.voyageai.com): Automatically uses X-Api-Key header
  • Localhost (127.0.0.1/localhost): No authentication required
  • All others: Uses Authorization: Bearer header
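
A sketch of the detection rules above as a single function (it mirrors the from_env() logic shown later in this thread):

// Returns (auth header name, value prefix); None means no auth header is sent.
fn detect_auth(base_url: &str) -> (Option<&'static str>, &'static str) {
    if base_url.contains("voyageai.com") {
        (Some("X-Api-Key"), "")
    } else if base_url.contains("localhost") || base_url.contains("127.0.0.1") {
        (None, "") // local Ollama: no authentication
    } else {
        (Some("Authorization"), "Bearer ")
    }
}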

Testing

  • embeddings_tests.rs: 15 comprehensive tests covering default, custom, and partial configurations
  • Added tests for Voyage AI configuration and localhost no-auth scenarios
  • Mutex-based isolation for environment variable tests
  • All existing tests updated for new variable names and Settings struct
  • Total: 18 embeddings-related tests passing
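
The Mutex-based isolation mentioned in the list above could look like the following sketch (names are illustrative, not the actual test file): tests that mutate process-wide environment variables serialize on a shared lock so they cannot interleave.

use std::sync::Mutex;

static ENV_LOCK: Mutex<()> = Mutex::new(());

#[test]
fn default_api_base_when_env_unset() {
    let _guard = ENV_LOCK.lock().unwrap(); // held for the whole test
    std::env::remove_var("OPENAI_API_BASE");
    // ... build the embedder and assert the OpenAI default is used
}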

Documentation

  • default_config.toml: Added complete [embeddings_opts] section with examples for all providers
  • semantic-search.md: Updated to show config file as primary method with env vars as overrides
  • embeddings-providers.md: Comprehensive guide showing both config file and environment variable usage for all providers

Usage

Configuration File (Recommended for Production)

Add to ~/.config/bkmr/config.toml:

[embeddings_opts]
api_base = "http://localhost:11434/v1"  # Ollama
model = "nomic-embed-text"

Generate default config:

bkmr --generate-config > ~/.config/bkmr/config.toml

Environment Variables (Testing/Overrides)

# OpenAI (unchanged behavior)
export OPENAI_API_KEY="sk-your-key"

# Ollama (local, private - no auth needed)
export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_MODEL="nomic-embed-text"

# HuggingFace
export OPENAI_API_BASE="https://api-inference.huggingface.co/v1"
export OPENAI_MODEL="sentence-transformers/all-MiniLM-L6-v2"
export OPENAI_API_KEY="hf_token"

# Voyage AI (uses X-Api-Key automatically)
export OPENAI_API_BASE="https://api.voyageai.com/v1"
export OPENAI_MODEL="voyage-2"
export OPENAI_API_KEY="pa-your-key"

All existing workflows continue unchanged. The OPENAI_* prefix is kept for backward compatibility even though multiple providers are now supported.

Original prompt

On vendor-agnostic-embeddings.md, # Feature: Vendor Agnostic Embeddings

The bkmr bookmark manager currently has a hardcoded OpenAI embeddings implementation. This enhancement abstracts the embeddings layer to support any OpenAI-compatible embeddings provider (Ollama, HuggingFace, Voyage AI, OpenAI-compatible endpoints) while maintaining the existing OPENAI_* environment variable naming for backwards compatibility. The goal is flexibility across vendors without implementing a complex abstraction layer for non-compatible providers.

Requirements

Please review this PRD document at https://github.com/danielbodnar/bkmr/blob/1cf090d4431815a45534f1678f8e08c2431465a3/.github/copilor/prds/vendor-agnostic-embeddings.md


@danielbodnar
Owner

@copilot

Vendor-Agnostic OpenAI-Compatible Embeddings System for BKMR

Version: 1.0
Date: 2025-10-31

Context

The bkmr bookmark manager currently has a hardcoded OpenAI embeddings implementation. This enhancement abstracts the embeddings layer to support any OpenAI-compatible embeddings provider (Ollama, HuggingFace, Voyage AI, OpenAI-compatible endpoints) while maintaining the existing OPENAI_* environment variable naming for backwards compatibility. The goal is flexibility across vendors without implementing a complex abstraction layer for non-compatible providers.

Requirements

Must Have

  • Abstract embeddings implementation to be vendor-agnostic for OpenAI-compatible APIs
  • Search crates.io for existing multi-vendor embeddings library
  • If no suitable library exists, implement HTTP REST pattern with providers.rs for common endpoints
  • Maintain existing OPENAI_* environment variable naming (backwards compatibility)
  • Support URL and model configuration (OPENAI_API_KEY, OPENAI_API_BASE, OPENAI_MODEL)
  • Support common providers: Ollama, HuggingFace, Voyage AI, OpenAI, and other OpenAI-compatible services
  • Add documentation comments (/// or //!) indicating where non-compatible providers would require different architecture
  • Zero breaking changes for existing users

Out of Scope

  • Non-OpenAI-compatible embeddings providers (requires different API patterns)
  • OAuth authentication flows
  • AWS SigV4 authentication
  • gRPC-based APIs
  • Streaming embeddings
  • Batch optimization beyond OpenAI API

Configuration & Decisions

Environment Variables:

  • OPENAI_API_KEY - Required
  • OPENAI_API_BASE - Optional, defaults to https://api.openai.com/v1
  • OPENAI_MODEL - Optional, defaults to text-embedding-3-small

Auth Header Detection:
Auto-detect based on URL: Voyage AI uses X-Api-Key, localhost/Ollama has no auth, everything else uses Authorization: Bearer.

Validation: Lazy (on first embedding request). OPENAI_API_BASE must be valid URL format if provided.

Error Handling: Clear messages pointing to env vars, no API key exposure in logs, provider endpoint shown in network errors.
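
A sketch of the URL check, assuming the url crate; per the lazy-validation decision above it would run on the first embedding request rather than at startup:

fn validate_base_url(base_url: &str) -> Result<(), String> {
    url::Url::parse(base_url)
        .map(|_| ())
        .map_err(|e| format!(
            "OPENAI_API_BASE is not a valid URL\nProvided: {}\nHint: use a format like https://api.openai.com/v1 ({})",
            base_url, e
        ))
}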

Provider Support Matrix

OpenAI-Compatible Cloud Providers:

| Provider | Base URL | Default Model | Dimensions | Auth Header | Auth Prefix | Context | Cost |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI | https://api.openai.com/v1 | text-embedding-3-small | 1536 | Authorization | Bearer | 8K | $0.02/1M |
| Voyage AI | https://api.voyageai.com/v1 | voyage-3-large | 1024 | X-Api-Key | (none) | 32K | $0.06/1M |
| Ollama | http://localhost:11434/v1 | nomic-embed-text | 768 | (none) | (none) | 8K | Free |
| HuggingFace | Model-specific URL | Model-specific | Varies | Authorization | Bearer | Varies | Free tier |
| Cohere | https://api.cohere.com/v1 | embed-v4.0 | 1024 | Authorization | Bearer | 512 | $0.12/1M |
| Jina AI | https://api.jina.ai/v1 | jina-embeddings-v3 | 1024 | Authorization | Bearer | 8K | Free tier |
| Mistral AI | https://api.mistral.ai/v1 | mistral-embed | Varies | Authorization | Bearer | Varies | Competitive |
| Google Gemini | https://generativelanguage.googleapis.com/v1beta/openai | text-embedding-004 | 768 | Authorization | Bearer | 2K-8K | Free tier |

Note: Only providers that use OpenAI-compatible /v1/embeddings endpoint with standard request/response format are in scope. Providers like AWS Bedrock, Google Vertex AI, and Snowflake Cortex use different APIs and are out of scope.

Offline Embedding Engines (Rust)

For users requiring complete privacy and no API costs, several high-quality Rust libraries support local embedding generation:

Production-Ready Libraries:

  • embed_anything - Minimalist, high-performance pipeline supporting ONNX, Candle, dense/sparse embeddings. Multi-modal (text, images, audio, PDFs). GPU acceleration via CUDA. Python bindings available. Actively maintained by StarlightSearch.
  • candle_embed - Simple, CUDA/CPU powered embeddings using HuggingFace's Candle framework. Supports any HF model, configurable pooling/normalization. Minimal dependencies, MIT licensed.
  • candle (HuggingFace) - Full ML framework with BERT, T5, JinaBERT embedding models. WASM support for browser deployment. Optimized CPU (MKL/Accelerate) and CUDA backends. Production-ready, serverless-focused.

Key Features:

  • No PyTorch Dependency: Low memory footprint, easy cloud deployment
  • True Multithreading: Rust's concurrency for parallel processing
  • GPU Acceleration: CUDA support out-of-the-box via Candle
  • Model Flexibility: Any HuggingFace model, ONNX runtime support
  • Quantization: Reduced precision (int8, binary) for storage savings

Popular Models for Local Deployment:

  • sentence-transformers/all-MiniLM-L6-v2 (384-dim, lightweight)
  • BAAI/bge-large-en-v1.5 (1024-dim, high quality)
  • nomic-ai/nomic-embed-text-v1 (768-dim, long context)
  • intfloat/e5-large-v2 (1024-dim, Microsoft research)
  • jinaai/jina-embeddings-v2-base-en (768-dim, optimized)

Implementation Note: While offline models solve API cost/privacy concerns, they require a different architecture than OpenAI-compatible APIs. Recommend implementing API-based providers first, then considering offline models as a separate feature if needed.

Backwards Compatibility: Existing users with only OPENAI_API_KEY set continue working unchanged. No migration needed.

Out of Scope: Non-OpenAI-compatible providers (Vertex AI, Bedrock, Anthropic), OAuth/SigV4 auth, gRPC, streaming, provider-specific optimizations.

Architecture

Library Selection Criteria

Search crates.io for libraries matching these criteria:

Must Have:

  • OpenAI-compatible endpoint support
  • Configurable base URLs
  • Active maintenance (commit in last 6 months)
  • Reasonable dependency footprint (<20 direct dependencies)

Nice to Have:

  • Multiple provider support built-in
  • Good error handling
  • Async/await support
  • Production usage examples

Evaluation Process:

  1. Search terms: openai, embeddings, llm client, inference
  2. Evaluate top 3-5 candidates
  3. Check: stars, recent commits, open issues, API design
  4. Decision: Use library if it saves >200 lines of code and meets criteria

Implementation Path A: Using Existing Library

// Wrap external library in our trait
use external_lib::EmbeddingsClient;

pub struct VendorAgnosticEmbedding {
    client: EmbeddingsClient,
}

impl Embedder for VendorAgnosticEmbedding {
    async fn embed(&self, text: &str) -> Result<Vec<f32>> {
        // NOTE: This assumes OpenAI-compatible API format.
        // Non-compatible providers (Vertex, Bedrock) would need
        // different client initialization and potentially different
        // trait methods to accommodate their request/response formats.
        self.client.embed(text).await.map_err(Into::into)
    }
}

Implementation Path B: Custom HTTP Client

File Structure:

infrastructure/embeddings/
├── mod.rs
├── providers.rs              # NEW: Provider constants and metadata
├── openai_compatible.rs      # NEW: Generic client (renamed from openai_provider.rs)
├── config.rs                 # NEW: Configuration loading
└── error.rs                  # Existing error types

providers.rs:

//! OpenAI-compatible embeddings provider configurations
//!
//! This module defines common provider endpoints that follow the OpenAI
//! embeddings API format. All providers here use:
//! - POST /embeddings endpoint
//! - Request: { "model": "...", "input": ["..."] }
//! - Response: { "data": [{ "embedding": [...] }] }
//!
//! # Limitations
//! Non-compatible providers (Google Vertex AI, AWS Bedrock, Anthropic)
//! cannot be added here as they use different API formats, authentication
//! mechanisms, and request/response structures. Those would require:
//! - Separate client implementations
//! - Different trait methods or trait redesign
//! - Provider-specific authentication (OAuth, SigV4, etc.)

pub struct ProviderConfig {
    pub base_url: &'static str,
    pub default_model: &'static str,
    pub auth_header: &'static str,
    pub auth_prefix: &'static str,
}

pub const OPENAI: ProviderConfig = ProviderConfig {
    base_url: "https://api.openai.com/v1",
    default_model: "text-embedding-3-small",
    auth_header: "Authorization",
    auth_prefix: "Bearer ",
};

pub const VOYAGE: ProviderConfig = ProviderConfig {
    base_url: "https://api.voyageai.com/v1",
    default_model: "voyage-3-large",
    auth_header: "X-Api-Key",
    auth_prefix: "",
};

pub const OLLAMA: ProviderConfig = ProviderConfig {
    base_url: "http://localhost:11434/v1",
    default_model: "nomic-embed-text",
    auth_header: "",
    auth_prefix: "",
};

// Note: HuggingFace Inference API has model-specific URLs
// Users should set OPENAI_API_BASE to full model URL
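
A possible companion helper (not in the PRD, shown only as a sketch) for mapping a configured base URL onto these constants:

pub fn detect_provider(base_url: &str) -> Option<&'static ProviderConfig> {
    if base_url.contains("api.openai.com") {
        Some(&OPENAI)
    } else if base_url.contains("voyageai.com") {
        Some(&VOYAGE)
    } else if base_url.contains("localhost") || base_url.contains("127.0.0.1") {
        Some(&OLLAMA)
    } else {
        None // unknown endpoint: caller falls back to Bearer auth
    }
}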

openai_compatible.rs:

//! Generic OpenAI-compatible embeddings client
//!
//! This implementation works with any API that follows OpenAI's
//! embeddings endpoint specification. It supports:
//! - Configurable base URLs
//! - Configurable authentication headers
//! - Custom model names
//!
//! # Compatibility Assumptions
//! This assumes all providers:
//! 1. Use JSON POST to /embeddings
//! 2. Accept { "model": "...", "input": [...] }
//! 3. Return { "data": [{ "embedding": [...] }] }
//! 4. Use simple API key authentication in headers
//!
//! # Future Considerations
//! Non-compatible providers would need:
//! - Different request/response serialization
//! - Different authentication (OAuth flows, SigV4 signing)
//! - Different error handling for provider-specific formats
//! - Potentially different HTTP libraries or middleware

use reqwest::Client;
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct EmbeddingRequest {
    model: String,
    input: Vec<String>,
}

#[derive(Deserialize)]
struct EmbeddingResponse {
    data: Vec<EmbeddingData>,
}

#[derive(Deserialize)]
struct EmbeddingData {
    embedding: Vec<f32>,
}

pub struct OpenAiCompatibleClient {
    client: Client,
    base_url: String,
    model: String,
    api_key: String,
    auth_header: String,
    auth_prefix: String,
}

impl OpenAiCompatibleClient {
    pub fn from_env() -> Result<Self> {
        let api_key = std::env::var("OPENAI_API_KEY")
            .map_err(|_| Error::ConfigError("OPENAI_API_KEY not set".to_string()))?;
        
        let base_url = std::env::var("OPENAI_API_BASE")
            .unwrap_or_else(|_| "https://api.openai.com/v1".to_string());
        
        let model = std::env::var("OPENAI_MODEL")
            .unwrap_or_else(|_| "text-embedding-3-small".to_string());
        
        // Detect provider from base_url and set auth headers accordingly
        let (auth_header, auth_prefix) = if base_url.contains("voyageai.com") {
            ("X-Api-Key".to_string(), "".to_string())
        } else if base_url.contains("localhost") || base_url.contains("127.0.0.1") {
            ("".to_string(), "".to_string())  // No auth for local Ollama
        } else {
            ("Authorization".to_string(), "Bearer ".to_string())
        };
        
        Ok(Self {
            client: Client::new(),
            base_url,
            model,
            api_key,
            auth_header,
            auth_prefix,
        })
    }
    
    pub async fn embed(&self, text: &str) -> Result<Vec<f32>> {
        let url = format!("{}/embeddings", self.base_url);
        
        let request = EmbeddingRequest {
            model: self.model.clone(),
            input: vec![text.to_string()],
        };
        
        let mut req = self.client.post(&url).json(&request);
        
        // Add authentication header if configured
        if !self.auth_header.is_empty() {
            let auth_value = format!("{}{}", self.auth_prefix, self.api_key);
            req = req.header(&self.auth_header, auth_value);
        }
        
        // NOTE: Error handling here assumes OpenAI-style error responses.
        // Different providers may return errors in different formats.
        let response = req.send().await?;
        
        if !response.status().is_success() {
            let status = response.status();
            let body = response.text().await?;
            return Err(Error::ApiError(format!(
                "Provider returned {}: {}",
                status, body
            )));
        }
        
        let embedding_response: EmbeddingResponse = response.json().await?;
        
        embedding_response
            .data
            .into_iter()
            .next()
            .map(|data| data.embedding)
            .ok_or_else(|| Error::ApiError("No embedding in response".to_string()))
    }
}
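
A hedged usage sketch for the client above, assuming a tokio runtime and that the OPENAI_* variables are already exported:

// Assumes the crate's Error type implements std::error::Error.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = OpenAiCompatibleClient::from_env()?;
    let embedding = client.embed("rust async patterns").await?;
    println!("got a {}-dimensional embedding", embedding.len());
    Ok(())
}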

Embeddings Ecosystem Research

Available Rust Crates

Multi-Provider Libraries:

  • ai (crates.io/crates/ai) - Actively maintained, supports OpenAI-compatible providers including Ollama, Gemini, Azure OpenAI. Has Client::from_url() for custom endpoints and EmbeddingsRequestBuilder. Best candidate for library approach.
  • openai_dive (v1.2) - Async OpenAI library with support for OpenAI-compatible APIs via custom base URLs. Good documentation, active maintenance.
  • openagent - Comprehensive OpenAI client with embeddings support, but focused on OpenAI specifically.
  • async-openai - Fully async, low-level and high-level APIs, good for OpenAI but less multi-provider focused.

OpenAI-Only Libraries:

  • openai - Basic unofficial Rust library
  • openai_api_rust - Alternative community-maintained
  • openai-rust - Matches official docs closely

Specialized:

  • embed_anything - High-performance with ONNX/local models, not API-focused
  • ai-gateway - Proxy/gateway rather than client library

Recommendation: Evaluate ai crate first - it explicitly supports configurable base URLs and multiple OpenAI-compatible providers. If it doesn't fit, build custom implementation.

Cloud Embedding Providers & Models

OpenAI

  • API: https://api.openai.com/v1/embeddings
  • Models: text-embedding-3-small (1536-dim, $0.02/1M tokens), text-embedding-3-large (3072-dim, $0.13/1M tokens), text-embedding-ada-002 (1536-dim, legacy)
  • Auth: Authorization: Bearer {key}
  • Context: 8K tokens

Voyage AI

  • API: https://api.voyageai.com/v1/embeddings
  • Models:
    • voyage-3-large: 1024-dim default (supports 256, 512, 2048), 32K context, $0.06/1M tokens
    • voyage-3.5: 1024-dim default (2048, 512, 256), 32K context, best quality
    • voyage-3.5-lite: 512-dim default, 32K context, most cost-effective
    • voyage-3: 1024-dim, 32K context
    • voyage-code-3: 1536-dim, optimized for code
    • voyage-law-2, voyage-finance-2: 1024-dim, domain-specific
  • Auth: X-Api-Key: {key} (non-standard header)
  • Features: Matryoshka embeddings, quantization (int8, binary)

Ollama (Local)

  • API: http://localhost:11434/v1/embeddings (OpenAI-compatible endpoint)
  • Models:
    • nomic-embed-text: 768-dim, 8192 context, high-performance, surpasses OpenAI ada-002
    • mxbai-embed-large: 1024-dim, 512 context, state-of-the-art
    • all-minilm: 384-dim, 256 context, lightweight and fast
    • snowflake-arctic-embed: 1024-dim, multilingual
  • Auth: None (local server)
  • Features: Completely free, offline, privacy-focused

Cohere

  • API: https://api.cohere.com/v1/embeddings
  • Models: embed-v4.0 (multimodal), embed-multilingual-v3.0 (100+ languages, 1024-dim), embed-english-v3.0, light versions (384-dim)
  • Auth: Authorization: Bearer {key}
  • Context: 512 tokens
  • Cost: $0.12/1M tokens

Jina AI

  • API: https://api.jina.ai/v1/embeddings
  • Models: jina-embeddings-v3 (1024-dim, 89+ languages), jina-embeddings-v4 (2048-dim multimodal), task-specific adapters
  • Auth: Authorization: Bearer {key}
  • Context: 8192 tokens
  • Cost: 10M free tokens, then paid
  • Features: Open source models available

Mistral AI

  • API: https://api.mistral.ai/v1/embeddings
  • Models: mistral-embed (general purpose), specialized models
  • Auth: Authorization: Bearer {key}
  • Features: European-developed, efficient processing, competitive pricing

Google Gemini

  • API: https://generativelanguage.googleapis.com/v1beta/openai/embeddings
  • Models: gemini-embedding-001 (3072-dim adjustable to 768), text-embedding-004, text-multilingual-embedding-002
  • Auth: Authorization: Bearer {key} (requires http1_title_case_headers for compatibility)
  • Context: 2048-8192 tokens
  • Cost: Free tier with generous limits

HuggingFace Inference API

  • API: https://api-inference.huggingface.co/pipeline/feature-extraction or model-specific URLs
  • Models: Thousands available (sentence-transformers, e5-large, etc.)
  • Auth: Authorization: Bearer {key}
  • Note: Varies by model, some use different endpoints

Offline Embedding Engines (Rust-Native)

Production-Grade Frameworks:

embed_anything (StarlightSearch)

  • Architecture: Rust-native with Python bindings, supports Candle and ONNX backends
  • Features: Dense, sparse, late-interaction, ColBERT embeddings. Multi-modal (text, images, audio, PDFs). Memory-efficient streaming to vector databases.
  • Performance: No PyTorch dependency, true multithreading, CUDA acceleration, minimal memory footprint
  • Models: Any HuggingFace model via Candle, ONNX models (ModernBERT, ColPali, Jina)
  • Use Case: Production embedding pipeline with document ingestion

candle_embed

  • Architecture: Simple wrapper around HuggingFace Candle framework
  • Features: CUDA/CPU support, configurable pooling/normalization, preset popular models
  • Models: Any HuggingFace model, includes presets for UAE, GIST, sentence-transformers
  • Use Case: Lightweight embedding generation with minimal setup

HuggingFace candle

  • Architecture: Full ML framework, PyTorch-like API in Rust
  • Features: BERT, T5, JinaBERT models built-in. WASM support for browser. Optimized CPU (MKL/Accelerate) and CUDA backends.
  • Models: Pre-implemented: BERT, T5, Jina, DINOv2 (vision)
  • Use Case: Serverless inference, lightweight deployments, edge computing

Popular Open-Source Models:

  • sentence-transformers/all-MiniLM-L6-v2 - 384-dim, 22M params, fast
  • BAAI/bge-large-en-v1.5 - 1024-dim, SOTA open-source
  • BAAI/bge-m3 - 1024-dim, multilingual with Chinese excellence
  • nomic-ai/nomic-embed-text-v1 - 768-dim, 137M params, 8K context
  • intfloat/e5-large-v2 - 1024-dim, Microsoft research quality
  • intfloat/e5-mistral-7b-instruct - 4096-dim, large enterprise model
  • jinaai/jina-embeddings-v2-base-en - 768-dim, 137M params
  • jinaai/jina-colbert-v2 - Late-interaction model for reranking

Quantized Models (ONNX):

  • ModernBERT variants with Q4F16 quantization
  • Reduced storage, faster inference, minimal quality loss

Implementation Decision Matrix

Use ai crate if:

  • ✅ Supports configurable base URLs (confirmed: Client::from_url())
  • ✅ Handles authentication headers properly
  • ✅ Active maintenance (check recent commits)
  • ✅ Reasonable dependencies (<30 direct deps)
  • ✅ Saves >200 lines of code vs custom implementation

Build custom if:

  • ❌ Library doesn't support auth header customization (Voyage needs X-Api-Key)
  • ❌ No way to configure per-provider settings
  • ❌ Adds unnecessary dependencies or complexity
  • ❌ Unclear maintenance status

Add offline support (separate phase) if:

  • ✅ Users need complete privacy (no cloud APIs)
  • ✅ Cost sensitivity for high-volume usage
  • ✅ Air-gapped environments required
  • ✅ GPU resources available for local inference

Implementation Approach

Step 0: Run Test Suite

A comprehensive test suite is provided in embeddings_tests.rs. Run with:

cargo test --test embeddings_tests

Initially most tests will fail (types don't exist). Implement code to make tests pass. See TESTS_README.md for details.

Step 1: Evaluate ai Crate

Test ai crate with all target providers. If it works well, use it. If limitations found, proceed to custom implementation.

Step 2: Custom Implementation (if needed)

Create modular architecture: providers.rs (constants), config.rs (env loading), openai_compatible.rs (generic client). Provider-specific auth header detection. Comprehensive testing with mocks.

Step 3: Offline Models (future phase)

If demand exists, add candle_embed or embed_anything integration as alternative to API-based providers. Requires separate trait implementation due to different architecture.

Quality Criteria

Functional Requirements

  • ✅ Works with OpenAI official API
  • ✅ Works with Ollama (OpenAI-compatible mode)
  • ✅ Works with Voyage AI
  • ✅ Works with HuggingFace Inference API
  • ✅ Supports custom OpenAI-compatible endpoints
  • ✅ Backwards compatible with existing OPENAI_API_KEY configs
  • ✅ Clear error messages for configuration issues
  • ✅ No API key exposure in logs or error messages

Code Quality

Rust Comment Conventions:

  • Documentation comments (/// or //!) for public APIs explaining limitations
  • Inline comments (// NOTE:) for specific implementation constraints
  • Avoid made-up prefixes like "FUTURE:" - not idiomatic Rust
  • Avoid todo!() macro for documentation - it's for unimplemented code paths

Code Standards:

  • Documentation comments (///, //!) explain architectural limitations
  • No unnecessary abstraction (YAGNI for non-compatible providers)
  • Follows Rust idioms and existing project patterns
  • Comprehensive error handling with descriptive messages
  • Test coverage >80% for new code
  • No clippy warnings
  • Formatted with rustfmt

Configuration

  • Environment variables validated on first use
  • Clear error messages for missing/invalid config
  • Sensible defaults (OpenAI endpoint, reasonable model)
  • Configuration errors caught early with helpful messages

Documentation

  • Each provider has working example
  • Environment variable reference table
  • Common troubleshooting section
  • Explicit note about OpenAI-compatible requirement
  • Documentation comments (///, //!) explain architectural constraints

Examples

Example 1: OpenAI (Default, Backwards Compatible)

export OPENAI_API_KEY="sk-proj-..."
# Uses defaults: https://api.openai.com/v1 and text-embedding-3-small
bkmr search "rust async patterns"

# Output:
# Using OpenAI embeddings (text-embedding-3-small)
# Found 5 bookmarks matching your query...

Example 2: Voyage AI

export OPENAI_API_KEY="pa-..."
export OPENAI_API_BASE="https://api.voyageai.com/v1"
export OPENAI_MODEL="voyage-3-large"
bkmr search "rust async patterns"

# Output:
# Using Voyage AI embeddings (voyage-3-large)
# Found 5 bookmarks matching your query...

Example 3: Ollama (Local)

# Start Ollama server first: ollama serve
# Pull embedding model: ollama pull nomic-embed-text

export OPENAI_API_BASE="http://localhost:11434/v1"
export OPENAI_MODEL="nomic-embed-text"
# No API key needed for local Ollama
bkmr search "rust async patterns"

# Output:
# Using Ollama embeddings (nomic-embed-text)
# Found 5 bookmarks matching your query...

Example 4: HuggingFace Inference API

export OPENAI_API_KEY="hf_..."
export OPENAI_API_BASE="https://api-inference.huggingface.co/models/sentence-transformers/all-MiniLM-L6-v2"
export OPENAI_MODEL="sentence-transformers/all-MiniLM-L6-v2"
bkmr search "rust async patterns"

# Output:
# Using HuggingFace embeddings (sentence-transformers/all-MiniLM-L6-v2)
# Found 5 bookmarks matching your query...

Example 5: Custom OpenAI-Compatible Endpoint

export OPENAI_API_KEY="custom-key-123"
export OPENAI_API_BASE="https://embeddings.mycompany.internal/v1"
export OPENAI_MODEL="company-embeddings-v2"
bkmr search "rust async patterns"

# Output:
# Using custom embeddings (company-embeddings-v2)
# Found 5 bookmarks matching your query...

Example 6: Error Handling

# Missing API key
$ export OPENAI_API_BASE="https://api.voyageai.com/v1"
$ bkmr search "test"
Error: Configuration error: OPENAI_API_KEY environment variable not set
Hint: Set your API key with: export OPENAI_API_KEY="your-key-here"

# Invalid URL format
$ export OPENAI_API_KEY="sk-..."
$ export OPENAI_API_BASE="not-a-url"
$ bkmr search "test"
Error: Configuration error: OPENAI_API_BASE is not a valid URL
Provided: not-a-url
Hint: Use format like: https://api.openai.com/v1

# Authentication failure
$ export OPENAI_API_KEY="invalid-key"
$ bkmr search "test"
Error: Authentication failed (401 Unauthorized)
Hint: Verify your OPENAI_API_KEY is correct for the provider at:
      https://api.openai.com/v1

# Model not available
$ export OPENAI_MODEL="nonexistent-model"
$ bkmr search "test"
Error: Provider returned 404: Model 'nonexistent-model' not found
Hint: Check provider documentation for available models

Common Pitfalls

Implementation Pitfalls

Don't:

  • Create complex abstractions for non-compatible providers yet (YAGNI)
  • Hardcode provider URLs in multiple places - use providers.rs constants
  • Assume all providers use "Authorization: Bearer" - Voyage uses "X-Api-Key"
  • Silently fail on configuration errors - validate early with clear messages
  • Log or expose API keys in error messages
  • Make HTTP requests at startup for validation

Do:

  • Add doc comments (///, //!) explaining architectural limitations
  • Keep the abstraction simple and focused on OpenAI-compatible APIs
  • Use provider detection for auth header selection
  • Fail fast with descriptive errors on first embedding request
  • Mask API keys in any user-visible output
  • Support both HTTP and HTTPS for local development

Configuration Pitfalls

Don't:

  • Break existing users who only have OPENAI_API_KEY set
  • Require users to set all three variables - use sensible defaults
  • Validate API key format (provider-specific, often opaque strings)
  • Make assumptions about model names across providers

Do:

  • Validate OPENAI_API_BASE is a valid URL format if provided
  • Provide clear error messages when provider endpoints return unexpected responses
  • Use environment variables consistently with OpenAI convention
  • Document provider-specific quirks (like Voyage's X-Api-Key header)

Testing Pitfalls

Don't:

  • Require actual API keys for CI/CD
  • Hardcode test data that assumes only OpenAI response format
  • Skip error path testing
  • Test only happy path scenarios

Do:

  • Mock multiple provider response formats
  • Test configuration loading thoroughly
  • Test error paths (invalid URLs, auth failures, network errors)
  • Test backwards compatibility explicitly
  • Use feature flags for integration tests that need real API keys
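
A sketch of a mocked-provider test under these guidelines, assuming the mockito (1.x) and tokio crates and the OpenAiCompatibleClient from Implementation Path B:

#[tokio::test]
async fn embed_parses_openai_style_response() {
    let mut server = mockito::Server::new_async().await;
    let _mock = server
        .mock("POST", "/embeddings")
        .with_status(200)
        .with_header("content-type", "application/json")
        .with_body(r#"{"data":[{"embedding":[0.1,0.2,0.3]}]}"#)
        .create_async()
        .await;

    // Point the client at the mock instead of a real provider.
    std::env::set_var("OPENAI_API_KEY", "test-key");
    std::env::set_var("OPENAI_API_BASE", server.url());

    let client = OpenAiCompatibleClient::from_env().expect("config loads");
    let embedding = client.embed("hello").await.expect("embed succeeds");
    assert_eq!(embedding, vec![0.1, 0.2, 0.3]);
}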

Documentation Pitfalls

Don't:

  • Claim support for non-OpenAI-compatible providers
  • Assume users know about OpenAI-compatible APIs
  • Copy/paste examples without testing them
  • Forget to document provider-specific requirements

Do:

  • Explain what "OpenAI-compatible" means clearly
  • Provide working examples for each major provider
  • Document known limitations explicitly
  • Include troubleshooting section with common errors
  • Link to provider-specific documentation

Security Pitfalls

Don't:

  • Log API keys or tokens in any form
  • Commit example API keys to version control (even fake ones)
  • Display full API keys in error messages
  • Store API keys in configuration files

Do:

  • Document environment variable approach
  • Consider adding API key masking in error messages (show first/last 4 chars)
  • Warn about using HTTP (non-HTTPS) endpoints
  • Recommend using secret management tools for production
  • Clear API keys from memory when possible
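
A sketch of the first/last-4-chars masking suggested above (illustrative only, not the shipped code):

fn mask_key(key: &str) -> String {
    if key.len() <= 8 {
        return "****".to_string(); // too short to partially reveal
    }
    format!("{}...{}", &key[..4], &key[key.len() - 4..])
}

// mask_key("sk-proj-abcdef123456") == "sk-p...3456"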

Scope Boundaries

In Scope for This PR

  • OpenAI and OpenAI-compatible APIs
  • Providers with /embeddings endpoint matching OpenAI format
  • Configurable base URLs and models
  • Authentication via API keys in headers
  • HTTP/HTTPS endpoints
  • Environment variable configuration
  • Backwards compatibility for existing users
  • Documentation for major providers

Out of Scope (Future Work)

Non-Compatible Providers:

  • Google Vertex AI (OAuth, different request format)
  • AWS Bedrock (SigV4 signing, different API)
  • Anthropic embeddings (if they add it - different format)
  • Azure OpenAI (different auth, might work but untested)

Advanced Features:

  • OAuth authentication flows
  • AWS SigV4 authentication
  • gRPC-based APIs
  • Binary response formats
  • Streaming embeddings
  • Batch optimization beyond what OpenAI API provides
  • Provider-specific features (e.g., Voyage's domain specialization)
  • Configuration files (environment variables only for now)
  • Web UI for provider selection

Where to Add Documentation Comments:

Use idiomatic Rust doc comments to explain architectural constraints:

  1. Trait definition - Use /// doc comment explaining OpenAI-compatible assumptions
/// Embedder trait for generating text embeddings.
///
/// # Compatibility
/// Current implementations assume OpenAI-compatible API format.
pub trait Embedder { }
  2. Request/response structs - Document expected JSON structure with ///
/// Request format for OpenAI-compatible embeddings endpoints.
/// Expects: { "model": "...", "input": [...] }
#[derive(Serialize)]
struct EmbeddingRequest { }
  3. Authentication - Use // NOTE: for inline constraints
// NOTE: Assumes simple API key in headers (Authorization or X-Api-Key).
// OAuth/SigV4 providers need different authentication.
  4. Error handling - Use // NOTE: for format assumptions
// NOTE: Assumes OpenAI error response format.
// Non-compatible providers may use different error structures.
  5. Module (providers.rs) - Use //! module-level doc for scope limitations
//! # Limitations
//! Non-compatible providers (Vertex AI, Bedrock) require different architecture.

Implementation Checklist

  • Run provided test suite: cargo test --test embeddings_tests
  • Evaluate ai crate vs custom implementation
  • Implement provider abstraction with auto-detect auth headers
  • Add configuration loading from env vars
  • All unit tests pass (25 tests)
  • Error handling tests pass
  • Backwards compatibility verification
  • Add HTTP mocking library for integration tests
  • Update README with provider examples
  • Document limitations (OpenAI-compatible only)
  • Add doc comments (///) explaining architectural limitations at extension points

Deliverables

Test Suite:

  • embeddings_tests.rs - 25 comprehensive unit tests covering configuration, provider detection, serialization, and edge cases
  • TESTS_README.md - Documentation for running and extending tests

Implementation Checklist

Use this checklist during implementation:

Phase 1: Research

  • Search crates.io for multi-vendor embeddings libraries
  • Evaluate top 3 candidates against criteria
  • Make build vs buy decision
  • Document decision rationale

Phase 2: Implementation

  • Create/update file structure
  • Implement provider abstraction
  • Add configuration loading
  • Implement auth header detection
  • Add error handling
  • Add doc comments (///) at architecture extension points

Phase 3: Testing

  • Unit tests for configuration
  • Unit tests for providers
  • Mocked HTTP tests
  • Error handling tests
  • Backwards compatibility tests
  • Integration tests

Phase 4: Documentation

  • Update README
  • Add provider setup guides
  • Document environment variables
  • Add troubleshooting section
  • Add examples for each provider
  • Document limitations

Phase 5: Review

  • All tests passing
  • No clippy warnings
  • Code formatted with rustfmt
  • Documentation complete
  • Examples tested
  • Ready for code review

@danielbodnar
Copy link
Owner

@copilot please review these as well:

bkmr-embeddings-abstraction-prd.md
TESTS_README.md

Copilot AI changed the title [WIP] Enhance bkmr to support vendor agnostic embeddings Add vendor-agnostic embeddings support via environment variables Nov 1, 2025
Copilot AI requested a review from danielbodnar November 1, 2025 02:37
Copilot finished work on behalf of danielbodnar November 1, 2025 02:37
…ate defaults

- Changed OPENAI_API_URL to OPENAI_API_BASE (with backward compatibility)
- Updated default model from text-embedding-ada-002 to text-embedding-3-small
- Implemented smart auth header detection (Voyage AI uses X-Api-Key, localhost no auth, others Bearer)
- Added comprehensive documentation about non-compatible providers
- Updated all tests (15 passing) and documentation
- Added test for Voyage AI configuration
- Added test for localhost no-auth requirement

Co-authored-by: danielbodnar <[email protected]>
Author

Copilot AI commented Nov 1, 2025

@copilot

Vendor-Agnostic OpenAI-Compatible Embeddings System for BKMR

...

I've updated the implementation to align with the PRD requirements:

Changes made (commit 29fd571):

  1. Environment Variables: Changed OPENAI_API_URL to OPENAI_API_BASE (with backward compatibility for the legacy name). Updated default model to text-embedding-3-small.

  2. Smart Auth Detection: Implemented URL-based authentication detection:

    • Voyage AI (api.voyageai.com) → X-Api-Key header
    • Localhost/127.0.0.1 → No authentication required
    • All others → Authorization: Bearer header
  3. Non-Compatible Providers: Added comprehensive documentation comments explaining which providers would require a different architecture (gRPC, OAuth, AWS SigV4, streaming-only APIs, etc.).

  4. Testing: Updated all 15 tests + 2 existing tests. Added new tests for:

    • Voyage AI configuration with X-Api-Key auth
    • Localhost no-auth requirement
    • Legacy OPENAI_API_URL backward compatibility
  5. Documentation: Updated semantic-search.md and embeddings-providers.md with new variable names, auth detection explanation, and OpenAI's latest models.

All 17 embeddings-related tests passing. The implementation now follows the OpenAI-compatible REST pattern with smart auth detection as specified in the PRD.

Copilot AI changed the title Add vendor-agnostic embeddings support via environment variables Add vendor-agnostic embeddings support with smart authentication Nov 1, 2025
Copilot finished work on behalf of danielbodnar November 1, 2025 02:49
@danielbodnar
Copy link
Owner

@copilot there is an error in the prd files and this PR's description / instructions:

## Out of Scope (Future Work)
### Advanced Features:
- Configuration files (environment variables only for now)

That is incorrect! This "vendor agnostic embeddings" feature MUST be both fully backwards compatible, AND production-ready, which means configuration files are absolutely in scope and must be supported and implemented in the same way as the original openai-only code was.

Also, please make sure you clean up any redundant files - I don't want to leave ~30 different copies of files that mostly all do the same thing.

(Pretty sure "don't be dumb" is one of the 17 Rules of The Unix Philosophy.)

- Added EmbeddingsOpts to Settings struct for config file support
- Updated apply_env_overrides to handle OPENAI_API_BASE and OPENAI_MODEL
- Modified OpenAiEmbedding::from_config to use Settings instead of just env vars
- Updated service container to pass config to embedder
- Added embeddings_opts section to default_config.toml with examples
- Updated documentation to show config file usage (production-ready)
- Environment variables still override config file for flexibility
- Fixed all test Settings struct initializations
- All 18 embeddings tests passing

Co-authored-by: danielbodnar <[email protected]>
Author

Copilot AI commented Nov 1, 2025

@copilot there is an error in the prd files and this PR's description / instructions:

...

Configuration file support has been added (commit 0e9a402).

Changes:

  1. Settings struct: Added EmbeddingsOpts with api_base and model fields
  2. Config file support: Users can now configure embeddings in ~/.config/bkmr/config.toml
  3. Priority order: Environment variables → Config file → Defaults
  4. default_config.toml: Added complete [embeddings_opts] section with examples for all providers

Example config:

[embeddings_opts]
api_base = "https://api.openai.com/v1"
model = "text-embedding-3-small"

Environment variables (OPENAI_API_BASE, OPENAI_MODEL) still work and override config file settings for flexibility.
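
A sketch of what the override step could look like (the actual apply_env_overrides in config.rs may differ):

fn apply_env_overrides(opts: &mut EmbeddingsOpts) {
    if let Ok(base) = std::env::var("OPENAI_API_BASE") {
        opts.api_base = Some(base);
    }
    if let Ok(model) = std::env::var("OPENAI_MODEL") {
        opts.model = Some(model);
    }
}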

No redundant files - verified the repository contains only the necessary files. All 18 embeddings-related tests passing.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • google.com
    • Triggering command: /home/REDACTED/work/bkmr/bkmr/bkmr/target/debug/deps/bkmr-b5ac4f29378a1486 (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title Add vendor-agnostic embeddings support with smart authentication Add vendor-agnostic embeddings support with config file and smart authentication Nov 1, 2025
Copilot finished work on behalf of danielbodnar November 1, 2025 03:05
@danielbodnar danielbodnar marked this pull request as ready for review November 1, 2025 14:04
Copilot AI review requested due to automatic review settings November 1, 2025 14:04

Copilot AI left a comment


Pull Request Overview

This PR adds comprehensive support for multiple OpenAI-compatible embeddings providers (Ollama, HuggingFace, Voyage AI, and custom endpoints) while maintaining backward compatibility with the existing OpenAI-only implementation. The configuration can be managed through both config files and environment variables, with environment variables taking precedence.

  • Added new EmbeddingsOpts configuration structure for persistent provider settings
  • Enhanced OpenAiEmbedding to support multiple providers with automatic authentication detection
  • Created comprehensive documentation for configuring different embeddings providers

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary of changes per file:

| File | Description |
| --- | --- |
| docs/semantic-search.md | Updated with multi-provider configuration examples and guidance |
| docs/embeddings-providers.md | New comprehensive guide detailing all supported providers and configuration methods |
| bkmr/tests/test_base_path_config.rs | Added embeddings_opts field initialization to test configurations |
| bkmr/tests/infrastructure/mod.rs | Registered new embeddings test module |
| bkmr/tests/infrastructure/embeddings_tests.rs | New comprehensive test suite for embeddings configuration scenarios |
| bkmr/src/infrastructure/embeddings/openai_provider.rs | Enhanced to support multiple providers with automatic auth detection |
| bkmr/src/infrastructure/di/service_container.rs | Updated DI container to use config-based embedder initialization |
| bkmr/src/default_config.toml | Added embeddings configuration section with examples |
| bkmr/src/config.rs | Added EmbeddingsOpts struct and environment variable override logic |

assert!(embedding.is_some());

let vec = embedding.unwrap();
// OpenAI's text-embedding-ada-002 produces 1536-dimensional vectors

Copilot AI Nov 1, 2025


The comment references 'text-embedding-ada-002' but the code now defaults to 'text-embedding-3-small' (as stated in line 60 and elsewhere). This comment should be updated to reflect the current default model, though both models produce 1536-dimensional vectors.

Suggested change
// OpenAI's text-embedding-ada-002 produces 1536-dimensional vectors
// OpenAI's text-embedding-3-small produces 1536-dimensional vectors
