Azure Cosmos DB | quickstart-sdk | Compare vector index algorithms with TypeScript#71

Open
diberry wants to merge 1 commit into Azure-Samples:main from diberry:squad/vector-algorithms-typescript

diberry commented Mar 19, 2026

Adds nosql-vector-algorithms-typescript sample comparing QuantizedFlat and DiskANN vector index algorithms across cosine, dotproduct, and euclidean distance functions.

What's included

  • Main entry point (vector-algorithms.ts) runs queries against 6 containers (2 algorithms × 3 distance functions)
  • Shared utilities matching nosql-vector-search-typescript pattern
  • Azure CLI script creating all infrastructure (Cosmos DB serverless + Azure OpenAI + RBAC)
  • Per-algorithm runners and verification tools
  • Hotels dataset with pre-computed 1536-dim embeddings
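
For reviewers who want to picture the six containers, here is a minimal sketch of the vector policy shape each one might carry. It is not taken from the PR's files: the embedding path `/vector`, the helper name `buildContainerSpecs`, and the lowercase container ids are assumptions for illustration, and the partition key and other container settings are left out.

```typescript
// Sketch only: EMBEDDING_PATH, buildContainerSpecs, and the lowercase ids are
// illustrative assumptions, not names confirmed by this PR.
type VectorIndexType = "quantizedFlat" | "diskANN";
type DistanceFunction = "cosine" | "dotproduct" | "euclidean";

interface VectorContainerSpec {
  id: string;
  vectorEmbeddingPolicy: {
    vectorEmbeddings: Array<{
      path: string;
      dataType: "float32";
      dimensions: number;
      distanceFunction: DistanceFunction;
    }>;
  };
  indexingPolicy: {
    vectorIndexes: Array<{ path: string; type: VectorIndexType }>;
  };
}

const EMBEDDING_PATH = "/vector"; // hypothetical field holding the embedding
const DIMENSIONS = 1536; // matches the pre-computed embeddings in the dataset

const INDEX_TYPES: VectorIndexType[] = ["quantizedFlat", "diskANN"];
const DISTANCE_FUNCTIONS: DistanceFunction[] = ["cosine", "dotproduct", "euclidean"];

// Builds one spec per (index type, distance function) pair: 2 x 3 = 6 specs.
function buildContainerSpecs(): VectorContainerSpec[] {
  const specs: VectorContainerSpec[] = [];
  for (const type of INDEX_TYPES) {
    for (const distanceFunction of DISTANCE_FUNCTIONS) {
      specs.push({
        id: `hotels_${type.toLowerCase()}_${distanceFunction}`,
        vectorEmbeddingPolicy: {
          vectorEmbeddings: [
            {
              path: EMBEDDING_PATH,
              dataType: "float32",
              dimensions: DIMENSIONS,
              distanceFunction,
            },
          ],
        },
        indexingPolicy: {
          vectorIndexes: [{ path: EMBEDDING_PATH, type }],
        },
      });
    }
  }
  return specs;
}
```

Each spec pairs one index type with one distance function, so the only things that differ between containers are the id, the `distanceFunction`, and the index `type`.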

Pattern

Follows nosql-vector-search-typescript: data-plane only, DefaultAzureCredential, bulk insert, cross-env scripts.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Compares QuantizedFlat and DiskANN vector index algorithms across
cosine, dotproduct, and euclidean distance functions. Creates 6
containers (2 algorithms × 3 distance functions) for comprehensive
comparison.

Follows nosql-vector-search-typescript pattern: data-plane only,
DefaultAzureCredential auth, bulk insert, cross-env scripts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

diberry commented Mar 31, 2026

🔄 Squad Review — PR #71 (Vector Algorithms Comparison - TypeScript)

Reviewers: Alex (TypeScript), Drummer (Tech Review), Holden (Content Lead)

✅ Approved

Alex (TypeScript Engineer):

  • Large sample (2394 additions, 14 files) comparing QuantizedFlat and DiskANN across cosine, dotproduct, and euclidean distance functions
  • Follows nosql-vector-search-typescript pattern: data-plane only, DefaultAzureCredential, bulk insert, cross-env scripts
  • 6 containers (2 algorithms × 3 distance functions) — architecture is sound
  • Azure CLI script for infra provisioning included
  • Shared hotels dataset with pre-computed 1536-dim embeddings
  • Per-algorithm runners and verification tools

Drummer (Tech Reviewer):

  • ✅ Auth: Passwordless — correct pattern
  • ✅ Container naming: hotels_{algorithm}_{distanceFunction} — consistent
  • ✅ Bulk insert pattern matches reference implementation
  • ✅ Cross-env scripts for environment variable handling
  • ✅ No Flat index type (correctly excluded per project guidelines)

Holden (Content Lead):

  • ✅ README structure follows established pattern
  • ✅ Sample architecture allows learners to compare algorithm behavior directly

Verdict: squad:pr-reviewed

Awaiting squad:pr-dina-approved before merge.

diberry added the squad:pr-reviewed label (Squad team has reviewed this PR) on Mar 31, 2026

diberry commented Mar 31, 2026

📊 Squad Status — PR Review

Ralph (Work Monitor) — sweep on 2026-03-31

Review Pipeline Status

  • squad:pr-reviewed — Squad review complete
  • Awaiting squad:pr-dina-approved — Dina must review and approve before merge

Next Steps

  1. Dina reviews this PR
  2. If approved, Dina adds squad:pr-dina-approved label
  3. Ralph merges after Dina's approval

⚠️ Will NOT merge without Dina's explicit approval.


diberry commented May 6, 2026

Updated Spec: 3×3 Algorithm + Similarity Matrix

Requirement (2026-05-06): The sample must run ALL 3 algorithms × ALL 3 similarity/distance functions and present a summary matrix so the value differences are visually obvious.

Container Matrix (9 containers)

Algorithm   Distance     Container Name
---------   ----------   -------------------------
IVF         cosine       hotels_ivf_cosine
IVF         dotproduct   hotels_ivf_dotproduct
IVF         euclidean    hotels_ivf_euclidean
HNSW        cosine       hotels_hnsw_cosine
HNSW        dotproduct   hotels_hnsw_dotproduct
HNSW        euclidean    hotels_hnsw_euclidean
DiskANN     cosine       hotels_diskann_cosine
DiskANN     dotproduct   hotels_diskann_dotproduct
DiskANN     euclidean    hotels_diskann_euclidean

Expected Output Format

====================================================
     Vector Algorithm × Similarity Comparison
====================================================
Algorithm   Similarity   Top Result        Score     Latency(ms)
------------------------------------------------------------
IVF         COS          Twin Dome Motel   0.8947    128
IVF         DOT          Twin Dome Motel   0.7823    131
IVF         L2           Twin Dome Motel   0.4521    125
HNSW        COS          Twin Dome Motel   0.8947    132
HNSW        DOT          Twin Dome Motel   0.7823    118
HNSW        L2           Twin Dome Motel   0.4521    115
DiskANN     COS          Twin Dome Motel   0.8947    145
DiskANN     DOT          Twin Dome Motel   0.7823    142
DiskANN     L2           Twin Dome Motel   0.4521    138
====================================================

Key Insight: Scores vary by SIMILARITY TYPE (COS vs DOT vs L2),
not by algorithm. Algorithm choice affects latency and scale behavior.
====================================================
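
One possible way to produce rows in the layout above is a small fixed-width formatter. This is a sketch under assumptions: the `ComparisonRow` shape and the function names are hypothetical, and the column widths (12, 13, 18, 10) are read off the sample output rather than taken from the PR's code.

```typescript
// Hypothetical result shape; field names are illustrative only.
interface ComparisonRow {
  algorithm: string;
  similarity: string;
  topResult: string;
  score: number;
  latencyMs: number;
}

// Pads on the right with spaces; values longer than the width are left as-is.
function padRight(text: string, width: number): string {
  let out = text;
  while (out.length < width) out += " ";
  return out;
}

// Column widths inferred from the sample output in this comment.
const WIDTHS = { algorithm: 12, similarity: 13, topResult: 18, score: 10 };

function formatRow(row: ComparisonRow): string {
  return (
    padRight(row.algorithm, WIDTHS.algorithm) +
    padRight(row.similarity, WIDTHS.similarity) +
    padRight(row.topResult, WIDTHS.topResult) +
    padRight(row.score.toFixed(4), WIDTHS.score) +
    String(Math.round(row.latencyMs))
  );
}

function formatSummary(rows: ComparisonRow[]): string {
  const header =
    padRight("Algorithm", WIDTHS.algorithm) +
    padRight("Similarity", WIDTHS.similarity) +
    padRight("Top Result", WIDTHS.topResult) +
    padRight("Score", WIDTHS.score) +
    "Latency(ms)";
  let rule = "";
  while (rule.length < header.length) rule += "-";
  return [header, rule].concat(rows.map(formatRow)).join("\n");
}

// The values below are the placeholder numbers from the spec, not measurements.
console.log(
  formatSummary([
    { algorithm: "IVF", similarity: "COS", topResult: "Twin Dome Motel", score: 0.8947, latencyMs: 128 },
    { algorithm: "HNSW", similarity: "L2", topResult: "Twin Dome Motel", score: 0.4521, latencyMs: 115 },
  ])
);
```

Keeping the widths in one place makes it easier to keep the header, the rule, and the rows aligned if a longer hotel name or algorithm label shows up.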

Why This Matters

With only cosine similarity, all 3 algorithms return identical scores (0.8947) — the real differentiation is in:

  1. Score differences across similarity types (COS/DOT/L2 produce very different score magnitudes)
  2. Latency differences across algorithms (especially at scale)
  3. Scale behavior (IVF degrades on large datasets, DiskANN excels)

The article should explain this insight — readers need to understand they're choosing both an algorithm AND a similarity function, and these choices have different effects.
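
To make the point about score magnitudes concrete, here is a small sketch that computes all three measures for the same pair of vectors. The two vectors are toy values chosen for easy arithmetic, not embeddings from the hotels dataset, and the function names are illustrative.

```typescript
// Dot product: grows with vector magnitude as well as alignment.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: depends only on the angle between the vectors.
// Returns NaN if either vector has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
}

// Euclidean (L2) distance: a distance, so smaller means closer.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Toy vectors pointing in the same direction with different magnitudes.
const queryVector = [1, 2, 3];
const documentVector = [2, 4, 6];

console.log("cosine    :", cosineSimilarity(queryVector, documentVector));
console.log("dotproduct:", dotProduct(queryVector, documentVector));
console.log("euclidean :", euclideanDistance(queryVector, documentVector));
```

The same pair is a perfect match by cosine, a score of 28 by dot product, and a nonzero distance by Euclidean, which is why scores in the matrix are only comparable within one similarity column.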

Implementation Notes

  • Current PR has QuantizedFlat + DiskANN — needs IVF + HNSW + DiskANN instead (matching DocumentDB's 3 vector index types)
  • Same pattern applies to all 5 languages (Python, TypeScript, Java, .NET, Go)
  • Environment variables: VECTOR_ALGORITHM=all|ivf|hnsw|diskann and VECTOR_DISTANCE_FUNCTION=all|cosine|dotproduct|euclidean
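
A rough sketch of how the two environment variables could select containers from the 3×3 matrix follows. The function names are hypothetical, treating an unset variable as `all` is an assumption, and the environment is passed in as a plain object so the snippet stays free of Node-specific types.

```typescript
const ALGORITHMS = ["ivf", "hnsw", "diskann"] as const;
const DISTANCE_FUNCTIONS = ["cosine", "dotproduct", "euclidean"] as const;

type Env = { [key: string]: string | undefined };

// Resolves "all" (or an unset value, by assumption) to every option,
// a single known option to a one-element list, and rejects anything else.
function resolveSelection<T extends string>(
  raw: string | undefined,
  options: readonly T[],
  variableName: string
): T[] {
  const value = (raw === undefined || raw === "" ? "all" : raw).toLowerCase();
  if (value === "all") return options.slice();
  const index = (options as readonly string[]).indexOf(value);
  if (index === -1) {
    throw new Error(`${variableName} must be one of: all, ${options.join(", ")}`);
  }
  return [options[index]];
}

// Returns container names following the hotels_{algorithm}_{distanceFunction} pattern.
function selectContainers(env: Env): string[] {
  const algorithms = resolveSelection(env.VECTOR_ALGORITHM, ALGORITHMS, "VECTOR_ALGORITHM");
  const distances = resolveSelection(
    env.VECTOR_DISTANCE_FUNCTION,
    DISTANCE_FUNCTIONS,
    "VECTOR_DISTANCE_FUNCTION"
  );
  const names: string[] = [];
  for (const algorithm of algorithms) {
    for (const distance of distances) {
      names.push(`hotels_${algorithm}_${distance}`);
    }
  }
  return names;
}

// Example: one algorithm across all three distance functions.
console.log(selectContainers({ VECTOR_ALGORITHM: "hnsw", VECTOR_DISTANCE_FUNCTION: "all" }));
```

In a Node.js entry point the same function could be called with `process.env`. Failing fast on an unknown value avoids silently running an empty comparison.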
