DataJoint 2.0 Documentation #97

dimitri-yatsenko · 2026-01-08T19:34:42Z

Overview

This PR delivers the complete DataJoint 2.0 documentation, restructured following the Diátaxis framework for technical documentation. The documentation accompanies the DataJoint 2.0 release (datajoint-python).

Documentation Structure

Tutorials (Learning-oriented)

12+ executable Jupyter notebooks organized by complexity:

Core Tutorials:

Tutorial	Topic
01-getting-started	Connection, schema creation, first tables
02-schema-design	Entity relationships, foreign keys, dependencies
03-data-entry	Inserting data, transactions, immutability
04-queries	Query algebra, restrictions, projections, joins
05-computation	AutoPopulate, make methods, error handling
06-object-storage	External storage, `<object@>`, large data

Examples:

Tutorial	Topic
University	Real-world university database example
Fractal Pipeline	Scientific pipeline with computation DAG
Blob Detection	Image processing pipeline
Hotel Reservations	Booking system with date ranges
Languages	Many-to-many relationships

Domain-Specific:

Tutorial	Topic
Calcium Imaging	Two-photon imaging pipeline
Electrophysiology	Neural recordings pipeline
Ephys with Object Storage	NpyCodec for large arrays
Allen CCF	Brain atlas integration

Advanced:

Tutorial	Topic
SQL Comparison	DataJoint vs SQL syntax
JSON Type	JSONB attribute usage

All notebooks are tested with pytest --nbmake.

How-To Guides (Task-oriented)

20+ practical guides for common tasks:

Setup: installation, configure-database, configure-storage
Schema Design: define-tables, design-primary-keys, model-relationships, alter-tables
Data Operations: insert-data, fetch-results, delete-data, query-data
Computation: run-computations, distributed-computing, monitor-progress
Advanced: create-custom-codec, manage-large-data, use-object-storage, staged-insert, handle-errors
Project Management: manage-pipeline-project, backup-restore
Migration: Comprehensive AI-assisted migration guide from 0.x to 2.0

Concepts (Explanation-oriented)

Conceptual documentation organized by topic:

Foundations:

What is DataJoint
Data Pipelines
Data Integrity
Normalization
History

Schema Design:

Entity-Relationship Model
Schema Dimensions
Reading Diagrams

Query System:

Query Operators
Semantic Matching

Data Management:

Object-Augmented Schemas
Custom Codecs
What's New in 2.0
FAQ

Reference (Information-oriented)

Specifications (15 documents):

Spec	Content
table-declaration	Table definition syntax, attribute types, core types
query-algebra	Query operators, semantic matching, SQL transpilation
data-manipulation	Insert, delete, update operations
autopopulate	Populate, make methods, job management
master-part	Part tables, integrity constraints
object-storage	External stores, hash/schema-addressed storage
virtual-schemas	spawn_missing_classes, make_classes
migration	0.x to 2.0 migration phases
errors	Error types and handling
definition-syntax	Complete BNF grammar
operators	Query operator reference
configuration	Config hierarchy, environment variables
url-representation	Table URL format

API Documentation:

Auto-generated from datajoint-python docstrings via mkdocstrings
Covers all public classes and methods

Additional Content

Elements:

DataJoint Elements overview with NIH U24 background
Links to individual Element repositories

Publications:

Comprehensive list of papers using DataJoint (2014-2025)
50+ peer-reviewed publications

About:

Citation guidelines
History
License (CC BY 4.0 for docs, Apache 2.0 for code)

Key Features

Diátaxis Framework

All content classified into exactly one category:

Tutorials: Learning-oriented, step-by-step
How-To: Task-oriented, goal-focused
Concepts: Understanding-oriented, explanatory
Reference: Information-oriented, accurate

DataJoint 2.0 API

All examples use the new 2.0 API:

Modern fetch API (no deprecated .fetch() method)
Core types (int32, float64, varchar, uuid, json)
Codec syntax (<blob>, <blob@store>, <npy@store>, <object@store>)
dj.Top for single-row lookups
staged_insert1 for direct object storage writes
Semantic matching with exclude_nonmatching parameter

Executable Examples

All tutorials are Jupyter notebooks
Tested with pytest --nbmake
Include expected outputs
Use Docker Compose for database setup

Migration Guide

Comprehensive 7-phase migration from 0.x to 2.0:

Code migration (API changes)
Type annotations (core types)
Surrogate key annotations
Lineage table creation
Foreign key conversion
External storage migration
AdaptedTypes → Codecs

Includes AI-assisted migration prompts and safety checks.

Visual Documentation

ER diagrams generated with mermaid-cli
Pipeline diagrams for domain tutorials
Consistent diagram notation documented

Technical Changes

Dependencies Simplified

Removed unused packages from pip_requirements.txt:

mike (version provider removed)
mkdocs-redirects (not configured)
mkdocs-pymdownx-material-extras (not used)
nbconvert (transitive dependency)

Navigation Reorganized

Elements moved under Reference
Concepts organized into logical sections
Specs organized by topic (Schema, Queries, Data, Storage, etc.)

License Updated

Documentation: CC BY 4.0
Code examples: Apache 2.0 (consistent with datajoint-python)

Commits Summary

This PR includes 100+ commits covering:

Complete Diátaxis restructure
15 specification documents
12+ executable tutorials
20+ how-to guides
Comprehensive concept explanations
API documentation pipeline
Migration guides
Visual diagram documentation
Publications list (50+ papers)
Elements with NIH U24 background

- Create new directory structure: explanation/, tutorials/, how-to/, reference/specs/, api/
- Add index pages for each section with content outlines
- Update mkdocs.yaml with new navigation (removed partnerships/publications)
- Add mkdocs-jupyter for notebook support
- Update README with comprehensive project description
- Add about/index.md and about/contributing.md
- Update license references to Apache 2.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Migrated spec documents:
- primary-keys.md - Primary key rules in query operators
- semantic-matching.md - Attribute lineage and join compatibility
- type-system.md - Three-layer type architecture
- codec-api.md - Custom codec implementation
- fetch-api.md - Data retrieval methods
- autopopulate.md - Jobs 2.0 specification
- job-metadata.md - Hidden job tracking columns

Updated specs/index.md with proper categorization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Created explanation pages based on datajoint-book concepts:
- relational-workflow-model.md - Core paradigm, three approaches compared
- entity-integrity.md - Primary keys, three questions framework
- normalization.md - Workflow normalization principle
- query-algebra.md - Five operators with examples
- type-system.md - Three-layer architecture, codecs
- computation-model.md - AutoPopulate, Jobs 2.0

Updated explanation/index.md with grid card layout.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Added explanation/custom-codecs.md covering codec extensibility
- Updated TERMINOLOGY.md with codec extensibility terms
- Updated mkdocs.yaml navigation
- Updated explanation/index.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Added mkdocstrings, gen-files, literate-nav plugins
- Created scripts/gen_api_pages.py for auto-generating API docs
- Updated mkdocs.yaml with API generation configuration
- Created reference pages: configuration.md, definition-syntax.md, errors.md
- Updated api/index.md with module links
- Added pip requirements for doc generation

API docs are auto-generated from datajoint-python/src docstrings using NumPy-style format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Archive elements/ (to be documented separately)
- Archive partnerships/ and projects/ (handled elsewhere)
- Archive support-events.md and additional-resources.md
- Remove redundant about/ files (about.md, contribute.md, datajoint-team.md)
- Update index.md to remove Elements reference
- Update nav to remove Elements section

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Comprehensive spec covering:
- Table tiers and class structure
- Definition string grammar
- Attribute types (core, string, temporal, codec)
- Default values and nullable attributes
- Foreign key references and options
- Index declarations
- Part tables
- Auto-populated tables
- Validation rules
- SQL generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Comprehensive spec covering all query operators:
- Restriction (& and -): condition types, semantic matching
- Projection (.proj): selection, renaming, computed attributes
- Join (*): functional dependencies, PK determination, left join
- Aggregation (.aggr): grouping, aggregate functions, HAVING
- Extension (.extend): left join with A→B requirement
- Union (+): combining entity sets, PK requirements
- Universal sets (dj.U): unique values, global aggregation

Also covers:
- Semantic matching rules and lineage
- Operator precedence
- Subquery generation rules
- Quick reference table

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Comprehensive spec covering insert, update1, and delete operations:
- Workflow normalization philosophy: insert/delete as primary ops
- Updates as surgical corrections (update1 only, by design)
- The recomputation pattern for data corrections

Insert operations:
- insert() with all parameters and input formats
- insert1() convenience method
- staged_insert1 for large objects (Zarr, HDF5)
- Handling duplicates, extra fields, auto-populated tables

Update operations:
- update1() requirements and constraints
- When to use vs when to delete/reinsert
- Why no bulk update (by design)

Delete operations:
- Cascade behavior to dependent tables
- Safe mode and transaction control
- Part table constraints
- delete_quick() for internal use

Also covers validation, transactions, error handling, best practices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Restructured to present DataJoint 2.0 as the status quo:
- Starts with fundamentals: table types, make() method, key_source
- Explains populate() method and operating modes
- Describes per-table jobs system as native feature
- Covers priority, scheduling, distributed computing
- Migration from 1.x moved to brief section at end

Removed problem/solution framing that assumed 1.x knowledge. Now readable as standalone 2.0 documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Added comprehensive coverage of:
- Key source calculation: automatic derivation from FK joins, custom key sources
- The populate process: execution flow, direct mode behavior, return values
- The make() method: basic pattern, requirements, tripartite make (generator and method-based)
- Transaction management: automatic transactions, atomicity, scope diagrams
- Part tables: computed results with parts, transaction behavior, cascading deletes
- Progress monitoring: progress() method, display_progress parameter
- Direct vs distributed mode comparison

Reorganized to present basic populate first, job reservation as an extension. Tripartite make pattern documented with both generator and method approaches.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
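The populate process described in this commit can be pictured with a small, library-free sketch. It is not DataJoint's implementation: `populate`, `done`, and the returned summary dictionary are hypothetical names, the key source is a plain list, and the per-key transaction is only imitated by marking a key as done when `make` returns without raising.

```python
from typing import Callable, Iterable


def populate(key_source: Iterable[tuple], done: set, make: Callable[[tuple], None]) -> dict:
    """Call make(key) once for every key in key_source that is not yet populated.

    Hypothetical stand-in: errors are collected here instead of being raised.
    """
    summary = {"success": 0, "errors": []}
    for key in key_source:
        if key in done:
            continue  # already computed, nothing to do
        try:
            make(key)
        except Exception as exc:
            # A failed make leaves no trace in `done`, imitating a rollback.
            summary["errors"].append((key, str(exc)))
        else:
            done.add(key)
            summary["success"] += 1
    return summary


def make(key: tuple) -> None:
    """Toy computation that fails for one key."""
    if key == (3,):
        raise ValueError("bad input")


sessions = [(1,), (2,), (3,)]  # stands in for the key source
computed = {(1,)}              # keys that already have results
result = populate(sessions, computed, make)
```

Only key `(2,)` is newly computed; `(1,)` is skipped and `(3,)` is reported as an error without being marked as done.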

Tutorials:
- 01-getting-started: Blob detection pipeline example
- 02-schema-design: Table tiers, keys, relationships, core types
- 03-data-entry: Insert, update, delete operations
- 04-queries: Restriction, projection, join, aggregation, fetch
- 05-computation: Computed tables, make(), populate()

Updates:
- Home page: Relational Workflow Model explanation
- Type system: Core types vs native types distinction
- Schema design: Master-part relationships, compositional integrity
- All tutorials use DataJoint 2.0 API (to_arrays, to_dicts, keys)
- Dates updated to January 2026

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Explain OAS: unified architecture for relational + object storage
- Clarify "object" terminology (data objects, not OOP)
- Emphasize that object storage is managed with same rigor as database
- List key OAS features: transparent access, lifecycle, deduplication
- Update Quick Start dates to 2026

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Remove replace=True example, add caveat about breaking immutability
- Introduce master-part with transactions for compositional integrity
- Explain auto-populated tables enforce transactions automatically
- Manual tables need explicit transactions for master-part inserts
- All session+trial inserts now use transactions
- Update best practices to emphasize transaction usage

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
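Why master-part inserts need a transaction can be shown with the standard-library `sqlite3` module. This is an analogy under stated assumptions, not DataJoint code: the `session` and `trial` tables and the `insert_session` helper are hypothetical, and SQLite stands in for the real database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE session (session_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE trial ("
    " session_id INTEGER NOT NULL REFERENCES session(session_id),"
    " trial_id INTEGER NOT NULL,"
    " PRIMARY KEY (session_id, trial_id))"
)


def insert_session(conn, session_id, trial_ids):
    """Insert a master row and its part rows as one atomic unit."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO session VALUES (?)", (session_id,))
        conn.executemany(
            "INSERT INTO trial VALUES (?, ?)",
            [(session_id, trial_id) for trial_id in trial_ids],
        )


insert_session(conn, 1, [1, 2, 3])
try:
    insert_session(conn, 2, [1, 1])  # duplicate trial violates the primary key
except sqlite3.IntegrityError:
    pass  # the whole insert, including session 2, has been rolled back

session_count = conn.execute("SELECT COUNT(*) FROM session").fetchone()[0]
trial_count = conn.execute("SELECT COUNT(*) FROM trial").fetchone()[0]
```

After the failed second insert the database still holds one session with three trials; no session exists without its complete set of trials.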

Fixes:
- 02-schema-design: Add task_params=None for consistent field sets
- 03-data-entry: Fix to_arrays() usage for single column
- 05-computation: Cast numpy bool to Python bool for is_fast

All 5 tutorials now execute successfully with outputs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

API updates:
- Replace safemode parameter with prompt in delete()
- Remove download_path from fetch methods (use config.override instead)
- Update fetch-api spec with config-based download path

All tutorials re-executed and pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

The fetch module was removed in modern-fetch-api merge. Fetch methods are now on QueryExpression directly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

These terms are misnomers - they are restriction operations, not joins.

Replaced with:
- "Restriction by Query Expression"
- "restriction" / "anti-restriction"

Added reference to semantic matching spec for attribute matching.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Explain that semantic matching prevents accidental matches on unrelated attributes that happen to share names. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
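The idea can be illustrated with a toy model in plain Python. This is a sketch under assumptions rather than DataJoint's implementation: `Attribute`, its `lineage` label, and `matching_attributes` are hypothetical, and raising on a name collision is one possible policy (the semantic matching spec defines the actual behavior).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    lineage: str  # hypothetical "schema.table.attribute" origin label


def matching_attributes(left, right):
    """Return the names two operands may be matched on.

    Attributes match only when they share both name and lineage; a shared
    name with different lineage is treated as an error in this sketch.
    """
    right_by_name = {attr.name: attr for attr in right}
    matched = []
    for attr in left:
        other = right_by_name.get(attr.name)
        if other is None:
            continue
        if attr.lineage != other.lineage:
            raise ValueError(
                f"'{attr.name}' has different lineage: {attr.lineage} vs {other.lineage}"
            )
        matched.append(attr.name)
    return matched


session = [Attribute("subject_id", "lab.subject.subject_id"), Attribute("id", "lab.session.id")]
scan = [Attribute("subject_id", "lab.subject.subject_id"), Attribute("scan_idx", "lab.scan.scan_idx")]
device = [Attribute("id", "lab.device.id")]

matched = matching_attributes(session, scan)
try:
    matching_attributes(session, device)
    collision_detected = False
except ValueError:
    collision_detected = True
```

`session` and `scan` match on `subject_id`, which traces to the same origin in both. `session` and `device` both have an attribute called `id`, but the two are unrelated, so the accidental match is rejected.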

- Replace keep_all_rows with exclude_nonmatching (inverted logic)
- Default behavior now keeps all rows (LEFT JOIN)
- Update query-algebra.md and primary-keys.md specs
- Expand queries tutorial with:
  - Join primary key determination via functional dependencies
  - Entity-to-entity aggregation concept
  - Extension operator (.extend())
  - Universal set (dj.U()) for ad-hoc groupings

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
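Because both the parameter name and the default changed, legacy calls need care even when they never passed `keep_all_rows`. The helper below is a hypothetical sketch for rewriting keyword arguments; it assumes the legacy default was `keep_all_rows=False`, which is not stated in this PR and should be checked against the legacy release in use.

```python
def translate_aggr_kwargs(legacy_kwargs: dict) -> dict:
    """Rewrite legacy aggregation kwargs so the 2.0 call keeps the old behavior.

    Assumption: the legacy default was keep_all_rows=False.
    """
    new_kwargs = dict(legacy_kwargs)
    keep_all_rows = new_kwargs.pop("keep_all_rows", False)
    # Inverted logic: keeping all rows means NOT excluding nonmatching rows.
    new_kwargs["exclude_nonmatching"] = not keep_all_rows
    return new_kwargs


explicit = translate_aggr_kwargs({"keep_all_rows": True, "n": "count(*)"})
implicit = translate_aggr_kwargs({"n": "count(*)"})
```

A call that relied on the old default must now pass `exclude_nonmatching=True` explicitly to keep dropping entities without matches.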

- Explain default behavior keeps all entities (even without matches)
- Show count(pk_attr) vs count(*) for correct zero counts
- Add exclude_nonmatching=True example for filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
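The difference between the two counts is a property of SQL left joins and can be reproduced with the standard-library `sqlite3` module. The `student` and `enrollment` tables are hypothetical; the sketch shows the underlying SQL behavior, not the DataJoint call itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE enrollment (
        student_id INTEGER,
        course TEXT,
        PRIMARY KEY (student_id, course)
    );
    INSERT INTO student VALUES (1), (2);
    INSERT INTO enrollment VALUES (1, 'math'), (1, 'art');
    """
)

rows = conn.execute(
    """
    SELECT s.student_id,
           COUNT(*)        AS n_star,  -- counts the NULL-padded row too
           COUNT(e.course) AS n_pk     -- counts only real enrollment rows
    FROM student AS s
    LEFT JOIN enrollment AS e ON s.student_id = e.student_id
    GROUP BY s.student_id
    ORDER BY s.student_id
    """
).fetchall()
```

Student 2 has no enrollments, yet `COUNT(*)` reports 1 because the left join keeps one NULL-padded row. Counting a primary key attribute of the joined table gives the correct 0.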

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Clarify that prompt default is determined by config['safemode']
- Not hardcoded to True or "interactive mode"
- Update best practices section accordingly

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

User-friendly reference covering all query operators:
- Restriction (&) and anti-restriction (-)
- Projection (.proj())
- Join (*)
- Extension (.extend())
- Aggregation (.aggr())
- Union (+)
- Universal set (dj.U())
- Operator precedence
- Semantic matching explanation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

entity-integrity.md:
- Fix surrogate key definition: used inside database, not exposed to users
- Replace auto_increment with UUID (no auto-increment in DataJoint)
- Update all examples to use core DataJoint types (uint32, float32, etc.)
- Use <blob> for blob storage type
- Use datetime(3) for millisecond, datetime(6) for microsecond precision

computation-model.md:
- Add three-part make model for long-running computations
- Explain make_fetch, make_compute, make_insert pattern
- Document re-fetch verification for referential integrity
- Explain when to use standard vs three-part make
- Fix int to uint32 in Segmentation example

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
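The three-part make pattern named above can be sketched without any database. The names `make_fetch`, `make_compute`, and `make_insert` come from the commit message; everything else (the `transaction` context manager, the `log` list, and the plain-function signatures) is a hypothetical stand-in and not DataJoint's actual implementation.

```python
from contextlib import contextmanager


@contextmanager
def transaction(log):
    """Toy transaction that only records begin/commit/rollback events."""
    log.append("begin")
    try:
        yield
    except Exception:
        log.append("rollback")
        raise
    else:
        log.append("commit")


def three_part_make(key, make_fetch, make_compute, make_insert, log):
    fetched = make_fetch(key)            # 1. read inputs, no transaction held
    result = make_compute(key, fetched)  # 2. long computation, no transaction held
    with transaction(log):               # 3. short transaction for the write
        if make_fetch(key) != fetched:   # re-fetch and verify inputs are unchanged
            raise RuntimeError("inputs changed during computation")
        make_insert(key, result)


source = {1: [1.0, 2.0, 3.0]}
results = {}
log = []
three_part_make(
    1,
    make_fetch=lambda key: list(source[key]),
    make_compute=lambda key, values: sum(values) / len(values),
    make_insert=lambda key, value: results.__setitem__(key, value),
    log=log,
)
```

The design point is that the transaction covers only the re-fetch and the insert, so a long computation does not hold a transaction open, while the re-fetch check still guards against inputs that changed in the meantime.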

- installation.md - Install DataJoint and set up environment
- configure-database.md - Database connection with secrets separation
- define-tables.md - Table definitions with core DataJoint types
- insert-data.md - Insert patterns including transactions
- query-data.md - Query operators quick reference
- fetch-results.md - Output methods and formats
- run-computations.md - populate() and three-part make

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Remove unnecessary int() and bool() wrappers around boolean values now that datajoint-python properly handles np.bool_ types. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

Update LICENSE from MIT to Apache 2.0 with copyright: Copyright 2014-2026 DataJoint Inc. and contributors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

Tutorials:
- Add tutorial 06: Object Storage (externals, attachments, file stores)
- Add advanced tutorials: custom codecs, distributed computing, migration
- Fix distributed.ipynb multiprocessing demo (explain module requirement)
- Minor updates to tutorials 01-03 for consistency

How-to guides:
- Add 14 new task-oriented guides covering common operations
- Expand index with full guide listing

Explanation:
- Expand entity integrity section

Config:
- Update mkdocs.yaml navigation for new content
- Add new images for documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>


installation.md:
- Change mysql-connector-python to pymysql
- Update Python requirement to 3.10+
- Add DataJoint.com as recommended managed service

define-tables.md:
- Add Schema creation explanation
- Separate core types from built-in codecs
- Add json as core type (no angle brackets)
- Document built-in codecs: blob, attach, object@store
- Move indexes to end of definition examples
- Clarify tables declared at @schema decorator time
- Add schema.drop() and table.drop() for prototyping
- Use uint16 instead of int in examples

configure-database.md:
- Remove untested multiple connections section
- Add DataJoint.com tip

configure-storage.md:
- Add DataJoint.com tip for pre-configured storage

backup-restore.md:
- Add DataJoint.com tip for automatic backups

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>

Updated mkdocs.yaml navigation:
- Changed 'Migrate from 0.x' to 'Migrate to 2.0'
- Points to new parallel schema migration guide (migrate-to-v20.md)

The new guide provides a safer migration approach:
- Zero production risk during testing
- Unlimited practice runs in _v20 schemas
- Easy rollback at every phase
- Side-by-side validation

Old in-place migration guide (migrate-from-0x.md) remains in repo but is no longer linked in navigation.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Major revision of migration guide to use standard git workflow:

**Key Changes:**

1. Git branch approach:
   - Pin DataJoint 0.14.6 on main branch
   - Create migrate-to-v2 branch for DataJoint 2.0
   - Use _v2 suffix for parallel schemas
2. Agentic code migration (~1 hour):
   - Detailed AI agent prompt for automated migration
   - Schema declarations, fetch API, type syntax
   - Defers external storage migration to Phase 2
3. Flexible data approach:
   - Option A: Fresh data for fast testing
   - Option B: Copy production data with pointer migration
4. Simpler cutover:
   - Merge branch when ready
   - Rename schemas or keep _v2 suffix
   - Standard git revert for rollback

**Advantages over previous plan:**

- Standard git workflow (familiar to developers)
- AI-assisted migration saves hours
- External storage deferred (optional)
- Easy rollback (git checkout main)
- No production risk during testing

Timeline: Small pipeline ~2 days, medium ~1 week (vs ~3-6 weeks before)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

- Added detailed requirements section (Python 3.10+, MySQL 8.0+, license change)
- Documented What's New in 2.0 (3-tier type system, codecs, unified stores)
- Organized into 4 clear phases with detailed timelines
- Phase I: Branch and code migration (~1-4 hours with AI assistance)
  - Pin legacy on main branch
  - Create pre/v2.0 migration branch
  - Configure DataJoint 2.0 and object storage
  - Convert table definitions with AI prompt
  - Convert query/insert code with AI prompt
  - Convert populate methods with AI prompt
- Phase II: Test with sample data (~1-2 days)
- Phase III: Migrate production data (~1-7 days, 3 options)
  - Option A: Copy and rename (recommended)
  - Option B: In-place migration
  - Option C: Gradual with legacy compatibility
- Phase IV: Adopt new features (ongoing)
- Emphasized key principles:
  - Production runs undisturbed through Phase II
  - Git branch workflow for safety
  - External storage deferred to Phase III
  - Agentic (AI-assisted) migration reduces time from weeks to hours
- Added comprehensive examples, troubleshooting, and cross-references
- Total timeline reduced from 3-6 weeks to 1-2 weeks

… III

Critical corrections based on user feedback:

Phase I changes:
- Convert ALL codecs including external storage (blob@, attach@, filepath@)
- Use TEST stores for development
- External storage CODE implemented in Phase I
- Only DATA migration deferred to Phase III

Phase II changes:
- Rename to "Test Compatibility and Equivalence"
- Add Step 5: Compare with Legacy Schema
- Emphasize side-by-side testing of legacy vs v2
- Validate that results are equivalent before touching production

Phase III changes:
- Emphasize this is DATA migration only (code complete from Phase I)
- Add Step 0: Configure Production Stores
- Clarify external storage metadata migration (UUID → JSON)
- No file copying needed (keep in place)

Key principle clarified throughout:
- Phase I: ALL code changes (using test stores)
- Phase II: Equivalence testing
- Phase III: Production data migration only

User feedback: avoid 'external' since it's all integrated

Terminology changes throughout migration guide:
- 'external storage' → 'in-store codecs' or 'in-store'
- 'External Storage Codecs' → 'In-Store Codecs'
- Storage column label: 'External' → 'In-store'
- Consistently use 'in-table' vs 'in-store' distinction

Key concept clarification:
- In-table: Data serialized into MySQL table (<blob>, <attach>)
- In-store: Data stored in object stores (<blob@>, <npy@>, <filepath@>)

Both are integrated into DataJoint - 'external' implied separation which doesn't reflect the unified architecture of DataJoint 2.0
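The in-table versus in-store distinction follows directly from the codec syntax: an `@` inside the angle brackets marks an in-store codec. A minimal sketch of that rule, with a hypothetical `storage_location` helper and a simplified regular expression that is not DataJoint's parser:

```python
import re

# <name> or <name@store>; the store name may be empty, as in <blob@>.
CODEC_PATTERN = re.compile(r"^<(?P<name>\w+)(?P<at>@(?P<store>\w*))?>$")


def storage_location(type_spec: str) -> str:
    """Classify an attribute type as 'in-table', 'in-store', or 'not a codec'."""
    match = CODEC_PATTERN.match(type_spec.strip())
    if match is None:
        return "not a codec"
    return "in-store" if match.group("at") else "in-table"


locations = {
    spec: storage_location(spec)
    for spec in ["<blob>", "<attach>", "<blob@>", "<npy@raw>", "varchar(32)"]
}
```

Here `raw` is only an example store name.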

Schema-addressed storage corrections:
- <npy@> and <object@> are NEW in 2.0 (not migration targets)
- Updated codec table to show 'New in 2.0' for schema-addressed types
- Clarified these are adopted in Phase IV, not migrated in Phase I
- Updated AI agent prompt to distinguish legacy vs new codecs
- Removed suggestion to migrate to <npy@> in example
- Added optional Phase IV adoption example

Bullet list formatting fixes:
- Add blank lines before all bullet lists for proper markdown rendering
- Fixed 'Key principles', 'Timeline', 'End state', 'Prerequisites', 'Options', 'Advantages', 'What this does' sections
- Ensures consistent rendering across markdown parsers

Critical corrections based on user feedback:

1. Codecs Section (What's New):
   - Split into 'Migration: Legacy → 2.0' and 'New in 2.0' sections
   - Clarified 0.14.x had IMPLICIT serialization (longblob auto-serialized)
   - 2.0 makes this EXPLICIT with <blob> codec
   - Added mediumblob → <blob> conversion (was missing)
   - Emphasized <npy@> and <object@> are NEW features, not migration targets
2. Phase I Step 4 (Configure Stores):
   - Added 'Skip this step if' for pipelines without legacy in-store
   - Only list LEGACY in-store formats (external-store, blob@store, etc.)
   - Removed <npy@> and <object@> from 'things to configure'
   - Added background explaining 0.14.x implicit vs 2.0 explicit codecs
3. AI Agent Prompt Updates:
   - Changed scope to 'Convert ALL legacy codecs' (not just 'all codecs')
   - Added explicit instruction: 'Do NOT add new 2.0 codecs'
   - Separated legacy in-store codecs from new 2.0 codecs
   - Added warning: 'IMPORTANT - Do NOT use these during migration'
   - Clarified these have NO legacy equivalent
4. Example Code:
   - Removed confusing RecordingEnhanced example with <npy@>
   - Added clear comment: 'Only convert existing legacy formats'
   - Noted Phase IV adoption is separate from migration

Key insight: 0.14.x did NOT have an explicit codec system. Types like 'longblob' automatically serialized Python objects. 2.0 makes this explicit, but <npy@> and <object@> are entirely NEW capabilities.

User correction: There was NO 'external-store' type in legacy DataJoint.

Legacy in-store types were:
- blob@store (hash-addressed blobs)
- attach@store (hash-addressed attachments)
- filepath@store (filepath references)

Changes:
- Removed 'external-store' from codec migration table
- Removed 'external-raw' from examples (used 'blob@raw' instead)
- Updated Step 4 to list only actual legacy types
- Fixed AI agent prompt to remove external-store conversion
- Updated example code to show correct 0.14.x syntax (blob@raw, not external-raw)
- Fixed Phase III migration helper calls

User feedback: No need to distinguish 0.13 vs 0.14 family versions.

Changes:
- Replaced all '0.14.x' references with 'pre-2.0'
- Replaced all '0.14.6' references with 'pre-2.0' or 'legacy'
- Updated pip install example to use 'datajoint<2.0.0' (valid version constraint)
- Kept 'legacy' where it reads better contextually

This simplifies the guide and avoids confusion about which specific pre-2.0 version the user might be on.

…ime, timestamp)

Added detailed guidance for special core types to match migrate-from-0x.md:

Background Section (User-facing):
- Split type conversions into clear categories:
  - Integer and Float Types
  - String, Date, and Structured Types
  - Codecs
- Added table for string/date/structured types with notes
- Included json, uuid, enum, datetime, timestamp, tinyint(1)
- Added important notes explaining:
  - Datetime/Timestamp: UTC-only in 2.0, convert timestamp → datetime
  - JSON: New core type, optional adoption
  - Enum: Already a core type, no changes needed

AI Agent Prompt (Detailed Instructions):
- Organized core types into logical groups
- Added 'Core Types (String and Date)' section
- Added 'Core Types (Structured Data)' for json and uuid
- Added 'Special Cases' for tinyint(1) and timestamp
- Included detailed 'IMPORTANT' sections for:
  - Datetime and Timestamp (UTC-only, conversion from timestamp)
  - Enum Types (no changes required)
  - JSON Type (optional adoption for JSON-in-blob migrations)
- Provided examples for each special case

This matches the level of detail in migrate-from-0x.md and ensures AI agents properly handle these types during migration.

- Add detailed bool vs uint8 guidance (tinyint(1) ambiguity)
- Emphasize UTC-only datetime standard in DataJoint 2.0
- Clarify timezones handled by frontend, not database
- Fix json/uuid status (both existed in pre-2.0)
- Expand AI agent prompt with specific examples
- Add conversion decision trees for ambiguous types

- Legacy supported bool and boolean types (MySQL stores as tinyint(1))
- Only explicit tinyint(1) declarations need review
- Distinguish between bool (already present) vs tinyint(1) (ambiguous)
- Update table to show bool/boolean as unchanged
- Clarify in AI prompt: only tinyint(1) needs user decision
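The review rule above (convert most types mechanically, leave bool and boolean alone, stop and ask for tinyint(1)) can be sketched as a small helper. The mapping is an assumption built from the core type names used in this PR (int32, uint32, float64, `<blob>`, and so on); it is not the migration guide's authoritative table, and `convert_type` is a hypothetical name.

```python
import re

# Assumed mapping from native MySQL integer types to core types.
INTEGER_TYPES = {"tinyint": "int8", "smallint": "int16", "int": "int32", "bigint": "int64"}
# Assumed mapping for floats and implicitly serialized blobs.
OTHER_TYPES = {"float": "float32", "double": "float64", "longblob": "<blob>", "mediumblob": "<blob>"}


def convert_type(legacy: str) -> str:
    """Suggest a 2.0 type for a legacy declaration, or flag it for review."""
    normalized = " ".join(legacy.lower().split())
    if re.fullmatch(r"tinyint\s*\(\s*1\s*\)", normalized):
        return "REVIEW: bool or uint8?"  # ambiguous, needs a user decision
    match = re.fullmatch(r"(tinyint|smallint|int|bigint)(\s*\(\d+\))?(\s+unsigned)?", normalized)
    if match:
        core = INTEGER_TYPES[match.group(1)]
        return "u" + core if match.group(3) else core
    # bool, boolean, varchar(n), and anything unknown pass through unchanged.
    return OTHER_TYPES.get(normalized, legacy)


suggestions = {
    legacy: convert_type(legacy)
    for legacy in ["int unsigned", "smallint", "tinyint(1)", "longblob", "bool", "varchar(32)"]
}
```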

…one handling

- Add utf8mb4/utf8mb4_bin as server-wide requirements in system requirements table
- Explain character encoding is infrastructure configuration (like timezones)
- Clarify timezones handled by 'application front-ends and client APIs', not just 'frontend'
- Emphasize 'database stores UTC' throughout
- Update all timezone references for consistency

…re types

- Remove text and time from core types list
- Add text and time as native types (discouraged)
  - text: recommend varchar(n) migration, or keep as native
  - time: no core equivalent, keep as native if needed
- Add 'Core vs Native Types' explanation in Important Notes
- Update AI agent prompt with native types guidance
- Clarify json is a core type (was incorrectly called 'native')
- Add warnings that native types will generate warnings in 2.0

…r time type

Timestamp changes:
- ASK USER about timezone convention (don't assume UTC)
- Provide specific questions about timezone and MySQL auto-update behavior
- Invite adoption of UTC throughout pipeline
- Add example conversation showing interactive approach
- Recommend adding data conversion script to Phase III if needed

Time type changes:
- Recommend migrating time → datetime (core type)
- Ask user if date is also relevant before recommending datetime
- Allow keeping time as native type if only time-of-day needed
- Update AI agent prompt with interactive approach for both types

This ensures users understand their timezone conventions and make deliberate decisions about conversion rather than automatic assumptions.
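If the answer to the timezone question is "values were stored in local time", the Phase III data conversion could look roughly like the sketch below. It is a hypothetical helper, not part of the migration guide: `local_to_utc` assumes a single fixed UTC offset, which ignores daylight saving time, so a real script would need proper timezone rules for the site in question.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc(naive_local: datetime, utc_offset_hours: float) -> datetime:
    """Convert a naive local datetime to a naive UTC datetime.

    Assumes one fixed offset for all values (no daylight saving time handling).
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    aware_local = naive_local.replace(tzinfo=local_zone)
    return aware_local.astimezone(timezone.utc).replace(tzinfo=None)


# A value recorded at 13:34:42 in a zone six hours behind UTC.
converted = local_to_utc(datetime(2026, 1, 8, 13, 34, 42), utc_offset_hours=-6)
```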

Fixed multiple instances where bullet lists immediately followed section headers without blank lines, which breaks markdown rendering.

Affected sections:
- Conversion rules (datetime/timestamp and bool)
- 'Only explicit tinyint(1) declarations need review because:'
- 'For text:' and 'For time:' native type guidance
- CONTEXT, SCOPE, VERIFICATION, REPORT sections in AI prompts
- CONVERSIONS NEEDED section

@ operator changes:
- OLD: table1 @ table2 → join(table2, semantic_check=False)
- NEW: table1 @ table2 → table1 * table2 (WITH semantic checks)
- IMPORTANT: @ bypassed semantic checks; * enables them by default
- If semantic checks fail, INVESTIGATE; this may reveal schema/data errors
- Add guidance for .join(x, left=True) → .extend(x)

fetch API changes:
- Add: table.fetch1('KEY') → table.keys()
- Add: table.fetch('KEY', 'a', 'b') → table.to_arrays('a', 'b', include_key=True)
- Update all examples and patterns
- Update VERIFICATION and REPORT sections
- Fix validation script example to use keys()

Rationale: The @ operator was a special case that bypassed semantic checks. DataJoint 2.0 enables semantic checks by default with *, which helps users discover schema errors during migration.

fetch API additions:
- Add: fetch(..., format='frame') → to_pandas()
- Add pattern example for pandas DataFrame conversion

dj.U() pattern removal:
- OLD: dj.U('attr') * table → dj.U('attr') & table
- NEW: dj.U('attr') * table → table (no longer necessary)
- Updated all references: table, background, AI prompt, patterns, REPORT
- Pattern 8 renamed to 'Universal set (REMOVE)'

ERD deprecation:
- Add: dj.ERD(schema) → dj.Diagram(schema)
- ERD is deprecated in DataJoint 2.0
- Added to API comparison table and background section

Checklist updates:
- Add fetch(..., format='frame') check
- Add fetch1('KEY') check
- Add dj.U() * table removal check
- Add dj.ERD() conversion check
- Renumber pattern examples (was duplicate Pattern 5)

CORRECTION: Previous commit incorrectly stated dj.U() * table should be removed entirely. This was wrong.

Correct understanding:
- dj.U('attr') & table → CORRECT pattern, remains unchanged
  Used to project specific attributes (e.g., all unique dates)
  Example: all_dates = dj.U('session_date') & Session
- dj.U('attr') * table → HACK pattern, needs refactoring
  Was used to magically change primary key of table
  Should be flagged and user asked to refactor

Changes:
- Add both patterns to API comparison table
- Split into separate 'Universal Set' section in background
- Update AI agent prompt to distinguish correct from hack
- Update PROCESS to 'identify as hack, ask user to refactor'
- Update VERIFICATION to check both patterns separately
- Update Pattern 10 to show both correct and hack examples
- Update REPORT to count both patterns separately
- Update commit message format
- Update Phase I checklist

This ensures users understand:
1. dj.U() & table is correct and should remain
2. dj.U() * table was a hack and needs attention
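The legacy patterns named in these migration commits lend themselves to a simple source scan before any code is rewritten. The sketch below is a hypothetical linter using only regular expressions; the advice strings paraphrase the commit messages, the patterns are deliberately loose (for example, NumPy matrix multiplication with `@` would also be flagged), and it follows the corrected reading in which `dj.U(...) & table` is left alone.

```python
import re

LEGACY_PATTERNS = [
    (re.compile(r"\.fetch1\(\s*['\"]KEY['\"]\s*\)"), "fetch1('KEY') -> keys()"),
    (re.compile(r"\.fetch\([^)]*format\s*=\s*['\"]frame['\"]"), "fetch(format='frame') -> to_pandas()"),
    (re.compile(r"\bdj\.ERD\("), "dj.ERD(...) -> dj.Diagram(...)"),
    (re.compile(r"\bdj\.U\([^)]*\)\s*\*"), "dj.U(...) * table: flag for manual refactoring"),
    (re.compile(r"\w\s+@\s+\w"), "a @ b -> a * b (semantic checks now apply)"),
]


def scan(source: str):
    """Return (line number, advice) pairs for every legacy pattern found."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, advice in LEGACY_PATTERNS:
            if pattern.search(line):
                findings.append((lineno, advice))
    return findings


sample = """\
keys = Session.fetch1('KEY')
df = Trial.fetch(format='frame')
dj.ERD(schema)
dates = dj.U('session_date') & Session
hack = dj.U('session_date') * Session
joined = Session @ Trial
"""
findings = scan(sample)
flagged_lines = [lineno for lineno, _ in findings]
```

Line 4 of the sample is not flagged because restricting `dj.U(...)` with `&` remains the correct pattern.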

dimitri-yatsenko and others added 30 commits January 4, 2026 13:59

docs: Remove datajoint.fetch from API docs

c177520

The fetch module was removed in modern-fetch-api merge. Fetch methods are now on QueryExpression directly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

docs: Expand semantic matching explanation in tutorial

a39ab9c

Explain that semantic matching prevents accidental matches on unrelated attributes that happen to share names. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

docs: Add extend operator to queries tutorial quick reference

c478028

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

dimitri-yatsenko and others added 20 commits January 14, 2026 02:07

fix: correct object storage tutorial link extension (.ipynb not .md)

7e8b1d3

dimitri-yatsenko requested a review from kavenk January 14, 2026 18:18

dimitri-yatsenko assigned dimitri-yatsenko and kavenk Jan 14, 2026

kavenk approved these changes Jan 14, 2026


dimitri-yatsenko requested review from MilagrosMarin January 14, 2026 18:30

MilagrosMarin approved these changes Jan 14, 2026


dimitri-yatsenko merged commit 0dc461f into main Jan 14, 2026
1 check passed
