DataJoint 2.0 Documentation #97
Merged
- Create new directory structure: explanation/, tutorials/, how-to/, reference/specs/, api/
- Add index pages for each section with content outlines
- Update mkdocs.yaml with new navigation (removed partnerships/publications)
- Add mkdocs-jupyter for notebook support
- Update README with comprehensive project description
- Add about/index.md and about/contributing.md
- Update license references to Apache 2.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Migrated spec documents:
- primary-keys.md - Primary key rules in query operators
- semantic-matching.md - Attribute lineage and join compatibility
- type-system.md - Three-layer type architecture
- codec-api.md - Custom codec implementation
- fetch-api.md - Data retrieval methods
- autopopulate.md - Jobs 2.0 specification
- job-metadata.md - Hidden job tracking columns

Updated specs/index.md with proper categorization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Created explanation pages based on datajoint-book concepts:
- relational-workflow-model.md - Core paradigm, three approaches compared
- entity-integrity.md - Primary keys, three questions framework
- normalization.md - Workflow normalization principle
- query-algebra.md - Five operators with examples
- type-system.md - Three-layer architecture, codecs
- computation-model.md - AutoPopulate, Jobs 2.0

Updated explanation/index.md with grid card layout.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Added explanation/custom-codecs.md covering codec extensibility
- Updated TERMINOLOGY.md with codec extensibility terms
- Updated mkdocs.yaml navigation
- Updated explanation/index.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Added mkdocstrings, gen-files, literate-nav plugins
- Created scripts/gen_api_pages.py for auto-generating API docs
- Updated mkdocs.yaml with API generation configuration
- Created reference pages: configuration.md, definition-syntax.md, errors.md
- Updated api/index.md with module links
- Added pip requirements for doc generation

API docs are auto-generated from datajoint-python/src docstrings using NumPy-style format.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Archive elements/ (to be documented separately)
- Archive partnerships/ and projects/ (handled elsewhere)
- Archive support-events.md and additional-resources.md
- Remove redundant about/ files (about.md, contribute.md, datajoint-team.md)
- Update index.md to remove Elements reference
- Update nav to remove Elements section

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Comprehensive spec covering:
- Table tiers and class structure
- Definition string grammar
- Attribute types (core, string, temporal, codec)
- Default values and nullable attributes
- Foreign key references and options
- Index declarations
- Part tables
- Auto-populated tables
- Validation rules
- SQL generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Comprehensive spec covering all query operators:
- Restriction (& and -): condition types, semantic matching
- Projection (.proj): selection, renaming, computed attributes
- Join (*): functional dependencies, PK determination, left join
- Aggregation (.aggr): grouping, aggregate functions, HAVING
- Extension (.extend): left join with A→B requirement
- Union (+): combining entity sets, PK requirements
- Universal sets (dj.U): unique values, global aggregation

Also covers:
- Semantic matching rules and lineage
- Operator precedence
- Subquery generation rules
- Quick reference table

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Comprehensive spec covering insert, update1, and delete operations:
- Workflow normalization philosophy: insert/delete as primary ops
- Updates as surgical corrections (update1 only, by design)
- The recomputation pattern for data corrections

Insert operations:
- insert() with all parameters and input formats
- insert1() convenience method
- staged_insert1 for large objects (Zarr, HDF5)
- Handling duplicates, extra fields, auto-populated tables

Update operations:
- update1() requirements and constraints
- When to use vs when to delete/reinsert
- Why no bulk update (by design)

Delete operations:
- Cascade behavior to dependent tables
- Safe mode and transaction control
- Part table constraints
- delete_quick() for internal use

Also covers validation, transactions, error handling, best practices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Restructured to present DataJoint 2.0 as the status quo:
- Starts with fundamentals: table types, make() method, key_source
- Explains populate() method and operating modes
- Describes per-table jobs system as native feature
- Covers priority, scheduling, distributed computing
- Migration from 1.x moved to brief section at end

Removed problem/solution framing that assumed 1.x knowledge.
Now readable as standalone 2.0 documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Added comprehensive coverage of:
- Key source calculation: automatic derivation from FK joins, custom key sources
- The populate process: execution flow, direct mode behavior, return values
- The make() method: basic pattern, requirements, tripartite make (generator and method-based)
- Transaction management: automatic transactions, atomicity, scope diagrams
- Part tables: computed results with parts, transaction behavior, cascading deletes
- Progress monitoring: progress() method, display_progress parameter
- Direct vs distributed mode comparison

Reorganized to present basic populate first, job reservation as an extension.
Tripartite make pattern documented with both generator and method approaches.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Tutorials:
- 01-getting-started: Blob detection pipeline example
- 02-schema-design: Table tiers, keys, relationships, core types
- 03-data-entry: Insert, update, delete operations
- 04-queries: Restriction, projection, join, aggregation, fetch
- 05-computation: Computed tables, make(), populate()

Updates:
- Home page: Relational Workflow Model explanation
- Type system: Core types vs native types distinction
- Schema design: Master-part relationships, compositional integrity
- All tutorials use DataJoint 2.0 API (to_arrays, to_dicts, keys)
- Dates updated to January 2026

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Explain OAS: unified architecture for relational + object storage
- Clarify "object" terminology (data objects, not OOP)
- Emphasize that object storage is managed with same rigor as database
- List key OAS features: transparent access, lifecycle, deduplication
- Update Quick Start dates to 2026

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove replace=True example, add caveat about breaking immutability
- Introduce master-part with transactions for compositional integrity
- Explain auto-populated tables enforce transactions automatically
- Manual tables need explicit transactions for master-part inserts
- All session+trial inserts now use transactions
- Update best practices to emphasize transaction usage

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Fixes:
- 02-schema-design: Add task_params=None for consistent field sets
- 03-data-entry: Fix to_arrays() usage for single column
- 05-computation: Cast numpy bool to Python bool for is_fast

All 5 tutorials now execute successfully with outputs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
API updates:
- Replace safemode parameter with prompt in delete()
- Remove download_path from fetch methods (use config.override instead)
- Update fetch-api spec with config-based download path

All tutorials re-executed and pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
The fetch module was removed in modern-fetch-api merge. Fetch methods are now on QueryExpression directly. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
These terms are misnomers: they are restriction operations, not joins.

Replaced with:
- "Restriction by Query Expression"
- "restriction" / "anti-restriction"

Added reference to semantic matching spec for attribute matching.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Explain that semantic matching prevents accidental matches on unrelated attributes that happen to share names. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Replace keep_all_rows with exclude_nonmatching (inverted logic)
- Default behavior now keeps all rows (LEFT JOIN)
- Update query-algebra.md and primary-keys.md specs
- Expand queries tutorial with:
  - Join primary key determination via functional dependencies
  - Entity-to-entity aggregation concept
  - Extension operator (.extend())
  - Universal set (dj.U()) for ad-hoc groupings

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Explain default behavior keeps all entities (even without matches)
- Show count(pk_attr) vs count(*) for correct zero counts
- Add exclude_nonmatching=True example for filtering

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
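The count(pk_attr) vs count(*) distinction can be reproduced in plain SQL. The sketch below uses Python's built-in sqlite3 module rather than DataJoint, and the session and trial tables are hypothetical. It only illustrates why counting a primary-key attribute of the joined table gives a correct zero for entities without matches in a LEFT JOIN.

```python
import sqlite3

# Hypothetical schema: session 1 has two trials, session 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (session_id INTEGER PRIMARY KEY);
    CREATE TABLE trial (
        session_id INTEGER REFERENCES session(session_id),
        trial_id INTEGER,
        PRIMARY KEY (session_id, trial_id)
    );
    INSERT INTO session VALUES (1), (2);
    INSERT INTO trial VALUES (1, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT s.session_id,
           COUNT(*) AS n_star,            -- counts the NULL-padded row too
           COUNT(t.trial_id) AS n_trials  -- ignores NULLs, so unmatched gives 0
    FROM session AS s
    LEFT JOIN trial AS t ON t.session_id = s.session_id
    GROUP BY s.session_id
    ORDER BY s.session_id
""").fetchall()

print(rows)  # [(1, 2, 2), (2, 1, 0)]
```

Session 2 appears in the result because the LEFT JOIN keeps it, but only the count over the trial key reports zero trials.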
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Clarify that prompt default is determined by config['safemode']
- Not hardcoded to True or "interactive mode"
- Update best practices section accordingly

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
User-friendly reference covering all query operators:
- Restriction (&) and anti-restriction (-)
- Projection (.proj())
- Join (*)
- Extension (.extend())
- Aggregation (.aggr())
- Union (+)
- Universal set (dj.U())
- Operator precedence
- Semantic matching explanation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
entity-integrity.md:
- Fix surrogate key definition: used inside database, not exposed to users
- Replace auto_increment with UUID (no auto-increment in DataJoint)
- Update all examples to use core DataJoint types (uint32, float32, etc.)
- Use <blob> for blob storage type
- Use datetime(3) for millisecond, datetime(6) for microsecond precision

computation-model.md:
- Add three-part make model for long-running computations
- Explain make_fetch, make_compute, make_insert pattern
- Document re-fetch verification for referential integrity
- Explain when to use standard vs three-part make
- Fix int to uint32 in Segmentation example

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
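The three-part make pattern with re-fetch verification can be illustrated without a database. The following is a minimal sketch in plain Python, not DataJoint's implementation: the dictionaries standing in for tables, the fingerprint helper, and the three_part_make driver are all hypothetical, and the transaction that would wrap the re-fetch and insert in a real pipeline is omitted.

```python
import hashlib
import pickle

# Hypothetical in-memory stand-ins for an upstream table and a result table.
upstream = {("session", 1): [1.0, 2.0, 3.0]}
results = {}


def make_fetch(key):
    """Read the inputs for one key (no long-running work here)."""
    return list(upstream[key])


def make_compute(key, data):
    """Long-running computation; in this sketch it runs outside any transaction."""
    return sum(data) / len(data)


def make_insert(key, value):
    """Store the computed result."""
    results[key] = value


def fingerprint(obj):
    """Hash of the fetched inputs, used to detect changes between the two fetches."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


def three_part_make(key):
    fetched = make_fetch(key)
    before = fingerprint(fetched)
    value = make_compute(key, fetched)
    # Re-fetch and verify that the inputs did not change while computing.
    if fingerprint(make_fetch(key)) != before:
        raise RuntimeError(f"inputs for {key} changed during computation")
    make_insert(key, value)


three_part_make(("session", 1))
print(results)
```

Separating the steps keeps the expensive computation out of the critical section while the second fetch guards against inputs that changed in the meantime.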
- installation.md: Install DataJoint and set up environment
- configure-database.md: Database connection with secrets separation
- define-tables.md: Table definitions with core DataJoint types
- insert-data.md: Insert patterns including transactions
- query-data.md: Query operators quick reference
- fetch-results.md: Output methods and formats
- run-computations.md: populate() and three-part make

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Remove unnecessary int() and bool() wrappers around boolean values now that datajoint-python properly handles np.bool_ types. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update LICENSE from MIT to Apache 2.0 with copyright: Copyright 2014-2026 DataJoint Inc. and contributors 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Tutorials:
- Add tutorial 06: Object Storage (externals, attachments, file stores)
- Add advanced tutorials: custom codecs, distributed computing, migration
- Fix distributed.ipynb multiprocessing demo (explain module requirement)
- Minor updates to tutorials 01-03 for consistency

How-to guides:
- Add 14 new task-oriented guides covering common operations
- Expand index with full guide listing

Explanation:
- Expand entity integrity section

Config:
- Update mkdocs.yaml navigation for new content
- Add new images for documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
installation.md:
- Change mysql-connector-python to pymysql
- Update Python requirement to 3.10+
- Add DataJoint.com as recommended managed service

define-tables.md:
- Add Schema creation explanation
- Separate core types from built-in codecs
- Add json as core type (no angle brackets)
- Document built-in codecs: blob, attach, object@store
- Move indexes to end of definition examples
- Clarify tables declared at @Schema decorator time
- Add schema.drop() and table.drop() for prototyping
- Use uint16 instead of int in examples

configure-database.md:
- Remove untested multiple connections section
- Add DataJoint.com tip

configure-storage.md:
- Add DataJoint.com tip for pre-configured storage

backup-restore.md:
- Add DataJoint.com tip for automatic backups

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Updated mkdocs.yaml navigation:
- Changed 'Migrate from 0.x' to 'Migrate to 2.0'
- Points to new parallel schema migration guide (migrate-to-v20.md)

The new guide provides a safer migration approach:
- Zero production risk during testing
- Unlimited practice runs in _v20 schemas
- Easy rollback at every phase
- Side-by-side validation

Old in-place migration guide (migrate-from-0x.md) remains in repo but is no longer linked in navigation.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Major revision of migration guide to use standard git workflow:

**Key Changes:**

1. Git branch approach:
   - Pin DataJoint 0.14.6 on main branch
   - Create migrate-to-v2 branch for DataJoint 2.0
   - Use _v2 suffix for parallel schemas
2. Agentic code migration (~1 hour):
   - Detailed AI agent prompt for automated migration
   - Schema declarations, fetch API, type syntax
   - Defers external storage migration to Phase 2
3. Flexible data approach:
   - Option A: Fresh data for fast testing
   - Option B: Copy production data with pointer migration
4. Simpler cutover:
   - Merge branch when ready
   - Rename schemas or keep _v2 suffix
   - Standard git revert for rollback

**Advantages over previous plan:**

- Standard git workflow (familiar to developers)
- AI-assisted migration saves hours
- External storage deferred (optional)
- Easy rollback (git checkout main)
- No production risk during testing

Timeline: Small pipeline ~2 days, medium ~1 week (vs ~3-6 weeks before)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
- Added detailed requirements section (Python 3.10+, MySQL 8.0+, license change)
- Documented What's New in 2.0 (3-tier type system, codecs, unified stores)
- Organized into 4 clear phases with detailed timelines
  - Phase I: Branch and code migration (~1-4 hours with AI assistance)
    - Pin legacy on main branch
    - Create pre/v2.0 migration branch
    - Configure DataJoint 2.0 and object storage
    - Convert table definitions with AI prompt
    - Convert query/insert code with AI prompt
    - Convert populate methods with AI prompt
  - Phase II: Test with sample data (~1-2 days)
  - Phase III: Migrate production data (~1-7 days, 3 options)
    - Option A: Copy and rename (recommended)
    - Option B: In-place migration
    - Option C: Gradual with legacy compatibility
  - Phase IV: Adopt new features (ongoing)
- Emphasized key principles:
  - Production runs undisturbed through Phase II
  - Git branch workflow for safety
  - External storage deferred to Phase III
  - Agentic (AI-assisted) migration reduces time from weeks to hours
- Added comprehensive examples, troubleshooting, and cross-references
- Total timeline reduced from 3-6 weeks to 1-2 weeks

… III

Critical corrections based on user feedback:

Phase I changes:
- Convert ALL codecs including external storage (blob@, attach@, filepath@)
- Use TEST stores for development
- External storage CODE implemented in Phase I
- Only DATA migration deferred to Phase III

Phase II changes:
- Rename to "Test Compatibility and Equivalence"
- Add Step 5: Compare with Legacy Schema
- Emphasize side-by-side testing of legacy vs v2
- Validate that results are equivalent before touching production

Phase III changes:
- Emphasize this is DATA migration only (code complete from Phase I)
- Add Step 0: Configure Production Stores
- Clarify external storage metadata migration (UUID → JSON)
- No file copying needed (keep in place)

Key principle clarified throughout:
- Phase I: ALL code changes (using test stores)
- Phase II: Equivalence testing
- Phase III: Production data migration only

User feedback: avoid 'external' since it's all integrated

Terminology changes throughout migration guide:
- 'external storage' → 'in-store codecs' or 'in-store'
- 'External Storage Codecs' → 'In-Store Codecs'
- Storage column label: 'External' → 'In-store'
- Consistently use 'in-table' vs 'in-store' distinction

Key concept clarification:
- In-table: Data serialized into MySQL table (<blob>, <attach>)
- In-store: Data stored in object stores (<blob@>, <npy@>, <filepath@>)

Both are integrated into DataJoint. 'External' implied separation, which doesn't reflect the unified architecture of DataJoint 2.0.

Schema-addressed storage corrections:
- <npy@> and <object@> are NEW in 2.0 (not migration targets)
- Updated codec table to show 'New in 2.0' for schema-addressed types
- Clarified these are adopted in Phase IV, not migrated in Phase I
- Updated AI agent prompt to distinguish legacy vs new codecs
- Removed suggestion to migrate to <npy@> in example
- Added optional Phase IV adoption example

Bullet list formatting fixes:
- Add blank lines before all bullet lists for proper markdown rendering
- Fixed 'Key principles', 'Timeline', 'End state', 'Prerequisites', 'Options', 'Advantages', 'What this does' sections
- Ensures consistent rendering across markdown parsers

Critical corrections based on user feedback:

1. Codecs Section (What's New):
   - Split into 'Migration: Legacy → 2.0' and 'New in 2.0' sections
   - Clarified 0.14.x had IMPLICIT serialization (longblob auto-serialized)
   - 2.0 makes this EXPLICIT with <blob> codec
   - Added mediumblob → <blob> conversion (was missing)
   - Emphasized <npy@> and <object@> are NEW features, not migration targets
2. Phase I Step 4 (Configure Stores):
   - Added 'Skip this step if' for pipelines without legacy in-store
   - Only list LEGACY in-store formats (external-store, blob@store, etc.)
   - Removed <npy@> and <object@> from 'things to configure'
   - Added background explaining 0.14.x implicit vs 2.0 explicit codecs
3. AI Agent Prompt Updates:
   - Changed scope to 'Convert ALL legacy codecs' (not just 'all codecs')
   - Added explicit instruction: 'Do NOT add new 2.0 codecs'
   - Separated legacy in-store codecs from new 2.0 codecs
   - Added warning: 'IMPORTANT - Do NOT use these during migration'
   - Clarified these have NO legacy equivalent
4. Example Code:
   - Removed confusing RecordingEnhanced example with <npy@>
   - Added clear comment: 'Only convert existing legacy formats'
   - Noted Phase IV adoption is separate from migration

Key insight: 0.14.x did NOT have an explicit codec system. Types like 'longblob' automatically serialized Python objects. 2.0 makes this explicit, but <npy@> and <object@> are entirely NEW capabilities.

User correction: There was NO 'external-store' type in legacy DataJoint.

Legacy in-store types were:
- blob@store (hash-addressed blobs)
- attach@store (hash-addressed attachments)
- filepath@store (filepath references)

Changes:
- Removed 'external-store' from codec migration table
- Removed 'external-raw' from examples (used 'blob@raw' instead)
- Updated Step 4 to list only actual legacy types
- Fixed AI agent prompt to remove external-store conversion
- Updated example code to show correct 0.14.x syntax (blob@raw, not external-raw)
- Fixed Phase III migration helper calls

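The legacy-to-2.0 codec conversions named in these commits (longblob and mediumblob to <blob>, and the legacy in-store types blob@store, attach@store, filepath@store) can be sketched as a small text rewrite. This is an illustrative helper, not part of DataJoint: convert_type is a hypothetical name, and the assumption that each in-store type simply gains angle brackets is inferred from the codec names used above.

```python
import re

# Legacy in-table blob types become the explicit <blob> codec.
IN_TABLE = re.compile(r"\b(longblob|mediumblob)\b")
# Legacy in-store types gain angle brackets: blob@store -> <blob@store>.
# The lookbehind skips types that are already wrapped in angle brackets.
IN_STORE = re.compile(r"(?<!<)\b(blob|attach|filepath)@(\w+)")


def convert_type(legacy_type: str) -> str:
    """Rewrite a legacy attribute type to its assumed 2.0 codec spelling."""
    converted = IN_TABLE.sub("<blob>", legacy_type)
    return IN_STORE.sub(r"<\1@\2>", converted)


for legacy in ["longblob", "mediumblob", "blob@raw", "attach@store", "uint16"]:
    print(f"{legacy} -> {convert_type(legacy)}")
```

New 2.0 codecs such as <npy@> and <object@> are deliberately absent from the mapping, since the commits above state they have no legacy equivalent and are not migration targets.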
User feedback: No need to distinguish 0.13 vs 0.14 family versions.

Changes:
- Replaced all '0.14.x' references with 'pre-2.0'
- Replaced all '0.14.6' references with 'pre-2.0' or 'legacy'
- Updated pip install example to use 'datajoint<2.0.0' (valid version constraint)
- Kept 'legacy' where it reads better contextually

This simplifies the guide and avoids confusion about which specific pre-2.0 version the user might be on.

…ime, timestamp)

Added detailed guidance for special core types to match migrate-from-0x.md:

Background Section (User-facing):
- Split type conversions into clear categories:
  - Integer and Float Types
  - String, Date, and Structured Types
  - Codecs
- Added table for string/date/structured types with notes
- Included json, uuid, enum, datetime, timestamp, tinyint(1)
- Added important notes explaining:
  - Datetime/Timestamp: UTC-only in 2.0, convert timestamp → datetime
  - JSON: New core type, optional adoption
  - Enum: Already a core type, no changes needed

AI Agent Prompt (Detailed Instructions):
- Organized core types into logical groups
- Added 'Core Types (String and Date)' section
- Added 'Core Types (Structured Data)' for json and uuid
- Added 'Special Cases' for tinyint(1) and timestamp
- Included detailed 'IMPORTANT' sections for:
  - Datetime and Timestamp (UTC-only, conversion from timestamp)
  - Enum Types (no changes required)
  - JSON Type (optional adoption for JSON-in-blob migrations)
- Provided examples for each special case

This matches the level of detail in migrate-from-0x.md and ensures AI agents properly handle these types during migration.

- Add detailed bool vs uint8 guidance (tinyint(1) ambiguity)
- Emphasize UTC-only datetime standard in DataJoint 2.0
- Clarify timezones handled by frontend, not database
- Fix json/uuid status (both existed in pre-2.0)
- Expand AI agent prompt with specific examples
- Add conversion decision trees for ambiguous types

- Legacy supported bool and boolean types (MySQL stores as tinyint(1))
- Only explicit tinyint(1) declarations need review
- Distinguish between bool (already present) vs tinyint(1) (ambiguous)
- Update table to show bool/boolean as unchanged
- Clarify in AI prompt: only tinyint(1) needs user decision

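The review rule for boolean-like declarations can be written as a tiny decision function. This is a sketch only: classify_boolean_type and its return strings are hypothetical, and it encodes nothing beyond what the bullets above state (bool and boolean stay as they are, explicit tinyint(1) requires a user decision between bool and uint8).

```python
def classify_boolean_type(declared: str) -> str:
    """Decide how a declared attribute type is handled during migration."""
    normalized = declared.strip().lower().replace(" ", "")
    if normalized in {"bool", "boolean"}:
        # Already supported in legacy code; no change needed.
        return "unchanged"
    if normalized == "tinyint(1)":
        # Ambiguous: could be a flag (bool) or a small integer (uint8).
        return "ask user: bool or uint8"
    return "not a boolean candidate"


for declared in ["bool", "boolean", "tinyint(1)", "tinyint"]:
    print(declared, "->", classify_boolean_type(declared))
```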
…one handling

- Add utf8mb4/utf8mb4_bin as server-wide requirements in system requirements table
- Explain character encoding is infrastructure configuration (like timezones)
- Clarify timezones handled by 'application front-ends and client APIs', not just 'frontend'
- Emphasize 'database stores UTC' throughout
- Update all timezone references for consistency

…re types

- Remove text and time from core types list
- Add text and time as native types (discouraged)
- text: recommend varchar(n) migration, or keep as native
- time: no core equivalent, keep as native if needed
- Add 'Core vs Native Types' explanation in Important Notes
- Update AI agent prompt with native types guidance
- Clarify json is a core type (was incorrectly called 'native')
- Add warnings that native types will generate warnings in 2.0

…r time type

Timestamp changes:
- ASK USER about timezone convention (don't assume UTC)
- Provide specific questions about timezone and MySQL auto-update behavior
- Invite adoption of UTC throughout pipeline
- Add example conversation showing interactive approach
- Recommend adding data conversion script to Phase III if needed

Time type changes:
- Recommend migrating time → datetime (core type)
- Ask user if date is also relevant before recommending datetime
- Allow keeping time as native type if only time-of-day needed
- Update AI agent prompt with interactive approach for both types

This ensures users understand their timezone conventions and make deliberate decisions about conversion rather than automatic assumptions.

Fixed multiple instances where bullet lists immediately followed section headers without blank lines, which breaks markdown rendering.

Affected sections:
- Conversion rules (datetime/timestamp and bool)
- 'Only explicit tinyint(1) declarations need review because:'
- 'For text:' and 'For time:' native type guidance
- CONTEXT, SCOPE, VERIFICATION, REPORT sections in AI prompts
- CONVERSIONS NEEDED section

@ operator changes:
- OLD: table1 @ table2 → join(table2, semantic_check=False)
- NEW: table1 @ table2 → table1 * table2 (WITH semantic checks)
- IMPORTANT: @ bypassed semantic checks; * enables them by default
- If semantic checks fail, INVESTIGATE—may reveal schema/data errors
- Add guidance for .join(x, left=True) → .extend(x)
fetch API changes:
- Add: table.fetch1('KEY') → table.keys()
- Add: table.fetch('KEY', 'a', 'b') → table.to_arrays('a', 'b', include_key=True)
- Update all examples and patterns
- Update VERIFICATION and REPORT sections
- Fix validation script example to use keys()
Rationale: The @ operator was a special case that bypassed semantic
checks. DataJoint 2.0 enables semantic checks by default with *, which
helps users discover schema errors during migration.
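To illustrate why enabling semantic checks can surface schema errors, here is a minimal sketch in plain Python. It is not DataJoint's implementation: the Attribute class, the lineage strings, and join_attributes are hypothetical, and the only rule modeled is that attributes sharing a name are matched only when they also share a lineage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    lineage: str  # hypothetical: where the attribute was originally defined


def join_attributes(left, right, semantic_check=True):
    """Return names shared by both operands, i.e. the attributes a join matches on."""
    right_by_name = {a.name: a for a in right}
    shared = []
    for attr in left:
        other = right_by_name.get(attr.name)
        if other is None:
            continue
        if semantic_check and attr.lineage != other.lineage:
            raise ValueError(
                f"attribute {attr.name!r} has different lineage in the two operands"
            )
        shared.append(attr.name)
    return shared


session = [Attribute("session_id", "Session.session_id"), Attribute("name", "Session.name")]
subject = [Attribute("name", "Subject.name")]

# Without the check, the unrelated "name" attributes are silently matched.
print(join_attributes(session, subject, semantic_check=False))

# With the check, the accidental match is reported instead.
try:
    join_attributes(session, subject)
except ValueError as err:
    print("semantic check failed:", err)
```

A failure of this kind during migration is a prompt to inspect the schema, not something to suppress.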
fetch API additions:
- Add: fetch(..., format='frame') → to_pandas()
- Add pattern example for pandas DataFrame conversion
dj.U() pattern removal:
- OLD: dj.U('attr') * table → dj.U('attr') & table
- NEW: dj.U('attr') * table → table (no longer necessary)
- Updated all references: table, background, AI prompt, patterns, REPORT
- Pattern 8 renamed to 'Universal set (REMOVE)'
ERD deprecation:
- Add: dj.ERD(schema) → dj.Diagram(schema)
- ERD is deprecated in DataJoint 2.0
- Added to API comparison table and background section
Checklist updates:
- Add fetch(..., format='frame') check
- Add fetch1('KEY') check
- Add dj.U() * table removal check
- Add dj.ERD() conversion check
- Renumber pattern examples (was duplicate Pattern 5)
CORRECTION: Previous commit incorrectly stated dj.U() * table should be
removed entirely. This was wrong.
Correct understanding:
- dj.U('attr') & table → CORRECT pattern, remains unchanged
Used to project specific attributes (e.g., all unique dates)
Example: all_dates = dj.U('session_date') & Session
- dj.U('attr') * table → HACK pattern, needs refactoring
Was used to magically change primary key of table
Should be flagged and user asked to refactor
Changes:
- Add both patterns to API comparison table
- Split into separate 'Universal Set' section in background
- Update AI agent prompt to distinguish correct from hack
- Update PROCESS to 'identify as hack, ask user to refactor'
- Update VERIFICATION to check both patterns separately
- Update Pattern 10 to show both correct and hack examples
- Update REPORT to count both patterns separately
- Update commit message format
- Update Phase I checklist
This ensures users understand:
1. dj.U() & table is correct and should remain
2. dj.U() * table was a hack and needs attention
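The checklist items from these commits can be approximated with a small source scanner. This is a rough heuristic sketch, not a tool shipped with DataJoint: LEGACY_PATTERNS and find_legacy_patterns are hypothetical names, the regular expressions will miss calls split across lines, and the @ operator is left out because a regex cannot reliably tell it apart from decorators or matrix multiplication.

```python
import re

# Legacy patterns named in the commit messages, mapped to the suggested action.
LEGACY_PATTERNS = {
    "fetch1('KEY') -> keys()": re.compile(r"\.fetch1\(\s*['\"]KEY['\"]\s*\)"),
    "fetch(..., format='frame') -> to_pandas()": re.compile(
        r"\.fetch\([^)]*format\s*=\s*['\"]frame['\"]"
    ),
    "dj.ERD(...) -> dj.Diagram(...)": re.compile(r"\bdj\.ERD\("),
    "dj.U(...) * table: hack, ask user to refactor": re.compile(
        r"\bdj\.U\([^)]*\)\s*\*"
    ),
}


def find_legacy_patterns(source: str):
    """Return (line_number, message) pairs for each legacy pattern found."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for message, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, message))
    return findings


example = """\
key = Session.fetch1('KEY')
df = Session.fetch(format='frame')
dj.ERD(schema)
dates = dj.U('session_date') & Session
hack = dj.U('session_date') * Session
"""

for lineno, message in find_legacy_patterns(example):
    print(lineno, message)
```

The dj.U('session_date') & Session line is not reported, matching the correction that this pattern is valid and stays unchanged.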
kavenk
approved these changes
Jan 14, 2026
MilagrosMarin
approved these changes
Jan 14, 2026
MilagrosMarin
approved these changes
Jan 14, 2026
MilagrosMarin
approved these changes
Jan 14, 2026
Overview
This PR delivers the complete DataJoint 2.0 documentation, restructured following the Diátaxis framework for technical documentation. The documentation accompanies the DataJoint 2.0 release (datajoint-python).
Documentation Structure
Tutorials (Learning-oriented)
12 executable Jupyter notebooks organized by complexity:
Core Tutorials:
<object@>, large data
Examples:
Domain-Specific:
Advanced:
All notebooks are tested with pytest --nbmake.
How-To Guides (Task-oriented)
20+ practical guides for common tasks:
Concepts (Explanation-oriented)
Conceptual documentation organized by topic:
Foundations:
Schema Design:
Query System:
Data Management:
Reference (Information-oriented)
Specifications (15 documents):
API Documentation:
Additional Content
Elements:
Publications:
About:
Key Features
Diátaxis Framework
All content classified into exactly one category:
DataJoint 2.0 API
All examples use the new 2.0 API:
- .fetch() method)
- int32, float64, varchar, uuid, json)
- <blob>, <blob@store>, <npy@store>, <object@store>)
- dj.Top for single-row lookups
- staged_insert1 for direct object storage writes
- exclude_nonmatching parameter
Executable Examples
pytest --nbmake
Migration Guide
Comprehensive 7-phase migration from 0.x to 2.0:
Includes AI-assisted migration prompts and safety checks.
Visual Documentation
Technical Changes
Dependencies Simplified
Removed unused packages from pip_requirements.txt:
- mike (version provider removed)
- mkdocs-redirects (not configured)
- mkdocs-pymdownx-material-extras (not used)
- nbconvert (transitive dependency)
Navigation Reorganized
License Updated
Commits Summary
This PR includes 100+ commits covering: