Commits
28 commits
088ab03
docs: regenerate LLM files with corrected migration guide links
dimitri-yatsenko Jan 14, 2026
a6fa7f9
term: replace deprecated 'inline' with canonical 'in-table' storage t…
dimitri-yatsenko Jan 14, 2026
782eafb
docs: clarify DataJoint 2.0 pre-release status in installation guide
dimitri-yatsenko Jan 14, 2026
3b89db7
Merge remote-tracking branch 'origin/main'
dimitri-yatsenko Jan 14, 2026
3f509fe
docs: add comprehensive secrets and credentials management guide
dimitri-yatsenko Jan 14, 2026
6e50ad3
docs: add learning paths to tutorials for better navigation
dimitri-yatsenko Jan 14, 2026
83b1153
docs: add comprehensive storage codec decision guide
dimitri-yatsenko Jan 14, 2026
38ca65c
docs: enhance specs index with reading order and cross-references
dimitri-yatsenko Jan 14, 2026
f1f4c63
docs: add comprehensive object storage documentation index
dimitri-yatsenko Jan 14, 2026
6b69fe8
docs: add Jobs 2.0 decision guidance for populate modes
dimitri-yatsenko Jan 14, 2026
0dcdd93
docs: clarify hash-addressed vs schema-addressed storage distinction
dimitri-yatsenko Jan 14, 2026
de0807b
Merge remote-tracking branch 'origin/fix/terminology-inline-to-in-tab…
dimitri-yatsenko Jan 14, 2026
3ccaefb
Merge remote-tracking branch 'origin/fix/installation-version-clarity…
dimitri-yatsenko Jan 14, 2026
f16227e
Merge remote-tracking branch 'origin/docs/comprehensive-secrets-guide…
dimitri-yatsenko Jan 14, 2026
8c3aaea
Merge remote-tracking branch 'origin/docs/add-learning-paths' into pr…
dimitri-yatsenko Jan 14, 2026
9141429
Merge remote-tracking branch 'origin/docs/storage-codec-decision-guid…
dimitri-yatsenko Jan 14, 2026
b5f49e0
Merge remote-tracking branch 'origin/docs/enhance-specs-index' into p…
dimitri-yatsenko Jan 14, 2026
0d19753
Merge: resolve conflicts by including both storage guides
dimitri-yatsenko Jan 14, 2026
8b410f8
Merge remote-tracking branch 'origin/docs/jobs-decision-guidance' int…
dimitri-yatsenko Jan 14, 2026
4719bff
Merge: resolve conflicts by including storage distinction clarification
dimitri-yatsenko Jan 14, 2026
4b4b5d4
docs: correct storage size limits and add technical details
dimitri-yatsenko Jan 14, 2026
9f0e298
docs: emphasize Python object convenience for blob/blob@/npy@
dimitri-yatsenko Jan 14, 2026
c89f0d2
docs: add missing <attach> in-table codec throughout documentation
dimitri-yatsenko Jan 14, 2026
5142151
docs: fix concatenated lists in tutorials index
dimitri-yatsenko Jan 14, 2026
1d64bb7
docs: prominently feature DataJoint Elements as production software
dimitri-yatsenko Jan 14, 2026
0545957
docs: clarify full normalization with intrinsic attributes principle
dimitri-yatsenko Jan 14, 2026
fae289a
docs: simplify dimensions - remove mixed tables complexity
dimitri-yatsenko Jan 14, 2026
70f3172
docs: fix attribute lineage - foreign key attributes trace back
dimitri-yatsenko Jan 14, 2026
69 changes: 32 additions & 37 deletions src/explanation/entity-integrity.md
@@ -172,15 +172,13 @@ referential integrity and workflow dependency.

## Schema Dimensions

A **dimension** is an independent axis of variation in your data, introduced by
a table that defines new primary key attributes. Dimensions are the fundamental
building blocks of schema design.
A **dimension** is an independent axis of variation in your data. The fundamental principle:

### Dimension-Introducing Tables
> **Any table that introduces a new primary key attribute introduces a new dimension.**

A table **introduces a dimension** when it defines primary key attributes that
don't come from a foreign key. In schema diagrams, these tables have
**underlined names**.
This is true whether the table has only new attributes or also inherits attributes from foreign keys. The key is simply: new primary key attribute = new dimension.

### Tables That Introduce Dimensions

```python
@schema
@@ -192,52 +190,49 @@ class Subject(dj.Manual):
"""

@schema
class Modality(dj.Lookup):
class Session(dj.Manual):
definition = """
modality : varchar(32) # NEW dimension: modality
-> Subject # Inherits subject_id
session_idx : uint16 # NEW dimension: session_idx
---
description : varchar(255)
session_date : date
"""
```

Both `Subject` and `Modality` are dimension-introducing tables: they create new
axes along which data varies.

### Dimension-Inheriting Tables

A table **inherits dimensions** when its entire primary key comes from foreign
keys. In schema diagrams, these tables have **non-underlined names**.

```python
@schema
class SubjectProfile(dj.Manual):
class Trial(dj.Manual):
definition = """
-> Subject # Inherits subject_id dimension
-> Session # Inherits subject_id, session_idx
trial_idx : uint16 # NEW dimension: trial_idx
---
weight : float32
outcome : enum('success', 'fail')
"""
```

`SubjectProfile` doesn't introduce a new dimension; it extends the `Subject`
dimension with additional attributes. There's exactly one profile per subject.
**All three tables introduce dimensions:**

- `Subject` introduces the `subject_id` dimension
- `Session` introduces the `session_idx` dimension (even though it also inherits `subject_id`)
- `Trial` introduces the `trial_idx` dimension (even though it also inherits `subject_id` and `session_idx`)

### Mixed Tables
In schema diagrams, tables that introduce at least one new dimension have **underlined names**.

Most tables both inherit and introduce dimensions:
### Tables That Don't Introduce Dimensions

A table introduces **no dimensions** when its entire primary key comes from foreign keys:

```python
@schema
class Session(dj.Manual):
class SubjectProfile(dj.Manual):
definition = """
-> Subject # Inherits subject_id dimension
session_idx : uint16 # NEW dimension within subject
-> Subject # Inherits subject_id only
---
session_date : date
weight : float32
"""
```

`Session` inherits the subject dimension but introduces a new dimension
(`session_idx`) within each subject. This creates a hierarchical structure.
`SubjectProfile` doesn't introduce any new primary key attribute; it extends the `Subject` dimension with additional attributes. There's exactly one profile per subject.

In schema diagrams, these tables have **non-underlined names**.
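
The rule above can be checked mechanically. The following is a minimal sketch in plain Python, not DataJoint's own API: the helper `new_dimensions` and the hand-written `tables` mapping are hypothetical names used only for illustration, and the attribute lists are copied from the example definitions in this section.

```python
def new_dimensions(primary_key, inherited):
    """Return the primary key attributes that are not supplied by a foreign key.

    Hypothetical helper for illustration only; not part of DataJoint.
    """
    inherited = set(inherited)
    return [attr for attr in primary_key if attr not in inherited]


# Table name -> (primary key attributes, attributes inherited via foreign keys).
# Written out by hand from the example definitions above.
tables = {
    "Subject": (["subject_id"], []),
    "Session": (["subject_id", "session_idx"], ["subject_id"]),
    "Trial": (
        ["subject_id", "session_idx", "trial_idx"],
        ["subject_id", "session_idx"],
    ),
    "SubjectProfile": (["subject_id"], ["subject_id"]),
}

for name, (primary_key, inherited) in tables.items():
    dims = new_dimensions(primary_key, inherited)
    label = ", ".join(dims) if dims else "no new dimension"
    print(f"{name}: {label}")
```

Under these assumptions, only `SubjectProfile` yields an empty list, which matches the non-underlined case described above.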

### Computed Tables and Dimensions

@@ -288,15 +283,15 @@ detection.

### Dimensions and Attribute Lineage

Every primary key attribute traces back to the dimension where it was first
Every foreign key attribute traces back to the dimension where it was first
defined. This is called **attribute lineage**:

```
Subject.subject_id → myschema.subject.subject_id (origin)
Session.subject_id → myschema.subject.subject_id (inherited)
Session.subject_id → myschema.subject.subject_id (inherited via foreign key)
Session.session_idx → myschema.session.session_idx (origin)
Trial.subject_id → myschema.subject.subject_id (inherited)
Trial.session_idx → myschema.session.session_idx (inherited)
Trial.subject_id → myschema.subject.subject_id (inherited via foreign key)
Trial.session_idx → myschema.session.session_idx (inherited via foreign key)
Trial.trial_idx → myschema.trial.trial_idx (origin)
```
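
One way to picture the lineage listing above is as a walk along foreign keys until the attribute's origin table is reached. This is a rough sketch under stated assumptions: the `inherited_from` mapping and the `lineage` function are hypothetical and hand-built for this example, and they do not reflect how DataJoint stores or resolves lineage internally.

```python
# (table, attribute) -> parent table the attribute is inherited from,
# or None when the attribute originates in that table.
# Hypothetical, hand-written mapping mirroring the listing above.
inherited_from = {
    ("subject", "subject_id"): None,
    ("session", "subject_id"): "subject",
    ("session", "session_idx"): None,
    ("trial", "subject_id"): "session",
    ("trial", "session_idx"): "session",
    ("trial", "trial_idx"): None,
}


def lineage(table, attribute, schema="myschema"):
    """Follow foreign keys upward until the attribute's origin is found."""
    while inherited_from[(table, attribute)] is not None:
        table = inherited_from[(table, attribute)]
    return f"{schema}.{table}.{attribute}"


for table, attribute in inherited_from:
    print(f"{table}.{attribute} -> {lineage(table, attribute)}")
```

Note that `trial.subject_id` passes through `session` before reaching `subject`, so the origin is the same no matter how many foreign keys the attribute has crossed.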

132 changes: 116 additions & 16 deletions src/explanation/normalization.md
@@ -12,6 +12,16 @@ makes normalization intuitive.

This principle naturally leads to well-normalized schemas.

## The Intrinsic Attributes Principle

> **"Each entity should contain only its intrinsic attributes: properties that are inherent to the entity itself. Relationships, assignments, and events that happen over time belong in separate tables."**

**Full workflow entity normalization** is achieved when:

1. Each row represents a single, well-defined entity
2. Each entity is entered once when first tracked
3. Events that happen at later stages belong in separate tables

## Why Normalization Matters

Without normalization, databases suffer from:
@@ -29,6 +39,8 @@ table structure. DataJoint takes a different approach: design tables around

### Example: Mouse Housing

**Problem: Cage is not intrinsic to a mouse.** A mouse's cage can change over time. The cage assignment is an **event** that happens after the mouse is first tracked.

**Denormalized (problematic):**

```python
@@ -44,7 +56,7 @@ class Mouse(dj.Manual):
"""
```

**Normalized (correct):**
**Partially normalized (better, but not complete):**

```python
@schema
@@ -53,15 +65,47 @@ class Cage(dj.Manual):
cage_id : int32
---
location : varchar(50)
temperature : float32
"""

@schema
class Mouse(dj.Manual):
definition = """
mouse_id : int32
---
-> Cage # Still treats cage as static attribute
"""
```

**Fully normalized (correct):**

```python
@schema
class Cage(dj.Manual):
definition = """
cage_id : int32
---
location : varchar(50)
"""

@schema
class Mouse(dj.Manual):
definition = """
mouse_id : int32
---
date_of_birth : date
sex : enum('M', 'F')
# Note: NO cage reference here!
# Cage is not intrinsic to the mouse
"""

@schema
class CageAssignment(dj.Manual):
definition = """
-> Mouse
assignment_date : date
---
-> Cage
removal_date=null : date
"""

@schema
@@ -74,20 +118,37 @@ class MouseWeight(dj.Manual):
"""
```

This normalized design:
This fully normalized design:

- Stores cage info once (no redundancy)
- Tracks weight history (temporal dimension)
- Allows cage changes without data loss
- **Intrinsic attributes only**: `Mouse` contains only attributes determined at creation (birth date, sex)
- **Cage assignment as event**: `CageAssignment` tracks the temporal relationship between mice and cages
- **Single entity per row**: Each mouse is entered once when first tracked
- **Later events separate**: Cage assignments and weight measurements happen after initial tracking
- **History preserved**: Cage moves can be tracked over time without data loss
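
To make the "history preserved" point concrete without a DataJoint server, here is a minimal sketch that mirrors the same three-table layout in an in-memory SQLite database. The SQL table and column names are assumptions chosen to resemble the definitions above, and the sample rows are invented for the example; this is not what DataJoint generates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE cage (cage_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE mouse (
        mouse_id INTEGER PRIMARY KEY,
        date_of_birth TEXT,
        sex TEXT
    );
    -- The assignment is its own entity, keyed by mouse and date.
    CREATE TABLE cage_assignment (
        mouse_id INTEGER REFERENCES mouse(mouse_id),
        assignment_date TEXT,
        cage_id INTEGER REFERENCES cage(cage_id),
        removal_date TEXT,
        PRIMARY KEY (mouse_id, assignment_date)
    );
    """
)

# Invented sample data: one mouse that moves from cage 1 to cage 2.
con.executemany("INSERT INTO cage VALUES (?, ?)", [(1, "Room A"), (2, "Room B")])
con.execute("INSERT INTO mouse VALUES (1, '2025-01-01', 'F')")
con.executemany(
    "INSERT INTO cage_assignment VALUES (?, ?, ?, ?)",
    [(1, "2025-02-01", 1, "2025-03-01"), (1, "2025-03-01", 2, None)],
)

# The full housing history survives the move.
history = con.execute(
    "SELECT assignment_date, cage_id FROM cage_assignment "
    "WHERE mouse_id = 1 ORDER BY assignment_date"
).fetchall()

# The current cage is the assignment with no removal date.
current = con.execute(
    "SELECT cage_id FROM cage_assignment "
    "WHERE mouse_id = 1 AND removal_date IS NULL"
).fetchone()[0]

# The mouse table itself carries no cage column.
mouse_columns = [row[1] for row in con.execute("PRAGMA table_info(mouse)")]

print(history, current, mouse_columns)
```

Moving the mouse adds a row to `cage_assignment` and never rewrites the `mouse` row, which is the property the list above describes.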

## The Workflow Test

Ask: "At which workflow step is this attribute determined?"
Ask these questions to determine table structure:

- If an attribute is determined at a **different step**, it belongs in a
**different table**
- If an attribute **changes over time**, it needs its own table with a
**temporal key**
### 1. "Is this an intrinsic attribute of the entity?"

An intrinsic attribute is inherent to the entity itself and determined when the entity is first created.

- **Intrinsic:** Mouse's date of birth, sex, genetic strain
- **Not intrinsic:** Mouse's cage (assignment that changes), weight (temporal measurement)

If not intrinsic → separate table for the relationship or event

### 2. "At which workflow step is this attribute determined?"

- If an attribute is determined at a **different step**, it belongs in a **different table**
- If an attribute **changes over time**, it needs its own table with a **temporal key**

### 3. "Is this a relationship or event?"

- **Relationships** (cage assignment, group membership) → association table with temporal keys
- **Events** (measurements, observations) → separate table with event date/time
- **States** (approval status, processing stage) → state transition table

## Common Patterns

@@ -126,7 +187,7 @@ class AnalysisParams(dj.Lookup):

### Temporal Tracking

Track attributes that change over time:
Track measurements or observations over time:

```python
@schema
@@ -139,6 +200,34 @@ class SubjectWeight(dj.Manual):
"""
```

### Temporal Associations

Track relationships or assignments that change over time:

```python
@schema
class GroupAssignment(dj.Manual):
definition = """
-> Subject
assignment_date : date
---
-> ExperimentalGroup
removal_date=null : date
"""

@schema
class HousingAssignment(dj.Manual):
definition = """
-> Animal
move_date : date
---
-> Cage
move_reason : varchar(200)
"""
```

**Key pattern:** The relationship itself (subject-to-group, animal-to-cage) is **not intrinsic** to either entity. It's a temporal event that happens during the workflow.
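
A temporal association like `GroupAssignment` is typically read by asking which assignment was active on a given day. The sketch below shows one way to do that in plain Python. It assumes a half-open interval convention (an assignment is active from `assignment_date` up to, but not including, `removal_date`), which the source does not specify; the `group_on` helper, the `group` field name, and the sample rows are likewise hypothetical.

```python
from datetime import date

# Invented sample rows shaped like GroupAssignment above.
assignments = [
    {
        "subject_id": 1,
        "assignment_date": date(2025, 1, 10),
        "group": "control",
        "removal_date": date(2025, 2, 1),
    },
    {
        "subject_id": 1,
        "assignment_date": date(2025, 2, 1),
        "group": "treatment",
        "removal_date": None,  # still active
    },
]


def group_on(subject_id, day):
    """Return the group active for a subject on a day, or None if unassigned.

    Assumes the interval [assignment_date, removal_date) is the active period.
    """
    for row in assignments:
        if row["subject_id"] != subject_id:
            continue
        started = row["assignment_date"] <= day
        ended = row["removal_date"] is not None and row["removal_date"] <= day
        if started and not ended:
            return row["group"]
    return None


print(group_on(1, date(2025, 1, 15)))
print(group_on(1, date(2025, 2, 1)))
print(group_on(1, date(2025, 1, 1)))
```

Because the date is part of the primary key, the same subject can appear in several rows, and the lookup picks the row whose period covers the requested day.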

## Benefits in DataJoint

1. **Natural from workflow thinking**: Designing around workflow steps
@@ -155,7 +244,18 @@ class SubjectWeight(dj.Manual):

## Summary

- Normalize by designing around **workflow steps**
- Each table = one entity type at one workflow step
- Attributes belong with the step that **determines** them
- Temporal data needs **temporal keys**
**Core principles:**

1. **Intrinsic attributes only**: Each entity contains only properties inherent to itself
2. **One entity, one entry**: Each entity is entered once when first tracked
3. **Events separate**: Relationships, assignments, and measurements that happen later belong in separate tables
4. **Workflow steps**: Design tables around the workflow step that creates each entity
5. **Temporal keys**: Relationships and observations that change over time need temporal keys (dates, timestamps)

**Ask yourself:**

- Is this attribute intrinsic to the entity? (No → separate table)
- Does this attribute change over time? (Yes → temporal table)
- Is this a relationship or event? (Yes → association/event table)

Following these principles achieves **full workflow entity normalization** where each table represents a single, well-defined entity type entered at a specific workflow step.
2 changes: 1 addition & 1 deletion src/explanation/whats-new-2.md
@@ -64,7 +64,7 @@ zarr_array : <object@store> # Path-addressed for Zarr/HDF5
### What Changed

Legacy DataJoint overloaded MySQL types with implicit conversions:
- `longblob` could be blob serialization OR inline attachment
- `longblob` could be blob serialization OR in-table attachment
- `attach` was implicitly converted to longblob
- `uuid` was used internally for external storage
