Commit 6ac2766

Merge pull request #2501 from redis/DOC-6015-string-dt-notebooks
DOC-6015 DOC-6016 DOC-6017 DOC-6018 DOC-6019 DOC-6020 DOC-6025 string data type notebooks
2 parents bdd0060 + d27722c commit 6ac2766

10 files changed: +1022 -11 lines changed

build/jupyterize/SPECIFICATION.md

Lines changed: 226 additions & 3 deletions
@@ -30,6 +30,10 @@ Pitfalls to avoid:
 - Save preamble before the first step and any trailing preamble at end
 - Apply unwrap patterns in listed order; for Java, remove `@Test` before method wrappers
 - Dedent after unwrapping when any unwrap patterns exist for the language
+- **Boilerplate placement is not one-size-fits-all**: Go requires appending to first cell, not separate cell
+  - Check kernel requirements before deciding boilerplate strategy
+  - If kernel needs imports and boilerplate together, use Strategy 2 (append to first cell)
+  - Otherwise, use Strategy 1 (separate boilerplate cell)

 Add a new language (5 steps):
 1) Copy the C# pattern set as a starting point
@@ -1144,6 +1148,56 @@ Notes:
 - Keep patterns intentionally narrow and anchored to reduce false positives.
 - Java's `@Test` annotation pattern should come first to remove it before processing the method declaration

+#### Go Configuration Example (Proposed)
+
+Go has a unique requirement: notebooks MUST have a `func main() {}` wrapper for gophernotes to execute code. Unlike C# and Java where wrappers are removed entirely, Go requires the wrapper to be injected as boilerplate.
+
+```json
+{
+  "go": {
+    "boilerplate": [
+      "func main() {}"
+    ],
+    "unwrap_patterns": [
+      { "type": "package_declaration", "pattern": "^package\\s+\\w+\\s*$", "end_pattern": "^package\\s+\\w+\\s*$", "keep_content": false, "description": "Remove package declaration" },
+      { "type": "func_main_opening", "pattern": "^func\\s+main\\(\\)\\s*\\{\\s*$", "end_pattern": "^\\}\\s*$", "keep_content": false, "description": "Remove func main() wrapper" },
+      { "type": "closing_braces", "pattern": "^\\s*\\}\\s*$", "end_pattern": "^\\s*\\}\\s*$", "keep_content": false, "description": "Remove orphaned closing braces" }
+    ]
+  }
+}
+```
+
+**Key differences from C# and Java**:
+- **Boilerplate is mandatory**: Go notebooks REQUIRE `func main() {}` wrapper (gophernotes requirement)
+- **Package declaration removal**: Go source files have `package main` that must be removed
+- **Wrapper removal**: The original `func main() { ... }` wrapper is removed, replaced by clean boilerplate
+- **Pattern count**: Only 3 patterns needed (simpler than C# or Java)
+- **No annotations or complex method signatures**: Go doesn't use annotations like Java or complex method signatures like C#
+
+**Boilerplate considerations**:
+- **Go**: Requires `func main() {}` wrapper in first cell (appended to imports)
+  - This is NOT standard Go practice (normally you'd have code at package level)
+  - But gophernotes requires all code to be inside a function
+- **Special handling**: For Go, boilerplate is APPENDED to the first cell (imports), not injected as a separate cell
+  - This ensures imports and `func main() {}` are in the same cell, matching gophernotes expectations
+  - The wrapper is injected as boilerplate, then removed from source, then re-injected
+  - This ensures the notebook has a clean wrapper without test framework code
+
+**Pattern complexity comparison** (As Proposed):
+- **C#**: 5 patterns (class/method wrappers + closing braces)
+- **Java**: 8 patterns (annotations + class/method wrappers + static main + closing braces)
+- **Go**: 3 patterns (package declaration + func main wrapper + closing braces)
+  - Go is simpler because it doesn't have annotations or complex class hierarchies
+
+**Critical insight for Go**: The wrapper removal strategy is different from C#/Java:
+1. Remove `package main` declaration (test framework boilerplate)
+2. Remove `func main() { ... }` wrapper (test framework wrapper)
+3. Inject clean `func main() {}` as boilerplate (gophernotes requirement)
+4. **Special handling**: Append boilerplate to first cell (imports), not as separate cell
+5. Result: Notebook has imports and `func main() {}` in same first cell, with only example code inside
+
+**Implementation note**: The `create_cells()` function detects Go language and appends boilerplate to the first non-empty cell instead of creating a separate boilerplate cell. This ensures the notebook structure matches gophernotes expectations.
+
 ### Runtime Order of Operations (within create_cells)

 1) Load `lang_config = load_language_config(language)`
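
The Go unwrapping steps described in this hunk (drop the `package` line, drop the `func main() { ... }` wrapper, then dedent the body) can be sketched in Python. This is a simplified stand-in, not the actual jupyterize unwrap logic: `unwrap_go_source` is a hypothetical helper, and it assumes the wrapper's closing brace is the only unindented `}` line after `func main() {`.

```python
import re
import textwrap

# Same regexes as the proposed Go unwrap_patterns (single-escaped for Python).
PACKAGE_RE = re.compile(r"^package\s+\w+\s*$")
FUNC_MAIN_RE = re.compile(r"^func\s+main\(\)\s*\{\s*$")
CLOSING_BRACE_RE = re.compile(r"^\}\s*$")


def unwrap_go_source(source: str) -> str:
    """Hypothetical helper: strip package line and func main() wrapper, dedent body."""
    kept = []          # lines outside the wrapper (e.g. imports)
    body = []          # lines inside func main() { ... }
    inside_main = False
    for line in source.splitlines():
        if PACKAGE_RE.match(line):
            continue  # step 1: drop the package declaration
        if not inside_main and FUNC_MAIN_RE.match(line):
            inside_main = True  # step 2: drop the wrapper's opening line
            continue
        if inside_main and CLOSING_BRACE_RE.match(line):
            inside_main = False  # drop the wrapper's closing brace
            kept.append(textwrap.dedent("\n".join(body)))
            body = []
            continue
        (body if inside_main else kept).append(line)
    return "\n".join(kept).strip() + "\n"
```

Indented closing braces such as `\t}` do not match `^\}\s*$`, so nested blocks inside the wrapper survive; only the unindented brace that closes `func main()` is dropped.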
@@ -1160,6 +1214,83 @@ Notes:

 This order ensures wrapper removal doesn’t leave code over-indented and avoids generating spurious empty cells.

+### Boilerplate Placement Strategies (Lessons Learned)
+
+**Lesson Learned**: Not all languages should have boilerplate as a separate first cell. The placement strategy depends on kernel requirements and notebook structure expectations.
+
+#### Strategy 1: Separate Boilerplate Cell (Default)
+**Used by**: C#, Java, and most languages
+
+**Characteristics**:
+- Boilerplate is injected as a completely separate first cell
+- Subsequent cells contain only example code
+- Works well for languages where boilerplate (imports, setup) is naturally separate from example code
+
+**Example structure**:
+```
+Cell 0: using StackExchange.Redis; (boilerplate)
+Cell 1: var client = new Client(); (example code)
+Cell 2: client.Set("key", "value"); (example code)
+```
+
+**Implementation**:
+```python
+if boilerplate and not append_boilerplate_to_first_cell:
+    boilerplate_cell = new_code_cell(source=boilerplate_code)
+    cells.append(boilerplate_cell)
+```
+
+#### Strategy 2: Append to First Cell (Go)
+**Used by**: Go (gophernotes kernel)
+
+**Characteristics**:
+- Boilerplate is appended to the first non-empty cell (imports)
+- Ensures imports and `func main() {}` are in the same cell
+- Required by gophernotes kernel expectations
+- Boilerplate appears AFTER imports in the same cell
+
+**Example structure**:
+```
+Cell 0: import (
+            "fmt"
+            "github.com/redis/go-redis/v9"
+        )
+
+        func main() {} (boilerplate appended)
+Cell 1: rdb := redis.NewClient(...) (example code)
+Cell 2: rdb.Set(ctx, "key", "value") (example code)
+```
+
+**Implementation**:
+```python
+append_boilerplate_to_first_cell = language.lower() == 'go'
+
+# Skip separate boilerplate cell for Go
+if boilerplate and not append_boilerplate_to_first_cell:
+    # ... create separate cell
+
+# Later, when processing first cell:
+if append_boilerplate_to_first_cell and not first_cell_processed:
+    if boilerplate_code:
+        code = code + '\n\n' + boilerplate_code
+    first_cell_processed = True
+```
+
+**Why Go needs this**:
+1. gophernotes kernel expects imports and `func main() {}` in the same cell
+2. Separate cells would cause import errors when executing the boilerplate cell
+3. Go's module system requires imports to be at the top of the file/function
+4. The notebook structure must match Go's execution model
+
+**Decision Point for Future Languages**:
+When adding a new language, ask:
+- Does the kernel require boilerplate and code in the same cell?
+- Are there import/dependency issues if boilerplate is separate?
+- Does the language's execution model require a specific cell structure?
+
+If yes to any of these, use Strategy 2 (append to first cell). Otherwise, use Strategy 1 (separate cell).
+
+
 ### Testing Checklist (Language-Specific)

 #### General Tests (All Languages)
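
The two placement strategies in this hunk can be compared side by side with a small sketch. It uses plain strings in place of notebook cells and a hypothetical `place_boilerplate` helper; the real `create_cells()` works on parsed blocks and `new_code_cell` objects, so treat this only as an illustration of the branching.

```python
from typing import List


def place_boilerplate(cells: List[str], boilerplate: List[str],
                      append_to_first: bool) -> List[str]:
    """Hypothetical helper: return cell sources with boilerplate placed per strategy."""
    if not boilerplate:
        return list(cells)
    boilerplate_code = "\n".join(boilerplate)
    if not append_to_first:
        # Strategy 1: a separate boilerplate cell ahead of the example cells.
        return [boilerplate_code] + list(cells)
    # Strategy 2: append to the first non-empty cell, after a blank line.
    result = list(cells)
    for i, code in enumerate(result):
        if code.strip():
            result[i] = code + "\n\n" + boilerplate_code
            break
    return result


# Strategy 1 (e.g. C#): boilerplate becomes its own first cell.
print(place_boilerplate(["var c = 1;"], ["using X;"], append_to_first=False))
# Strategy 2 (Go): boilerplate is appended to the imports cell.
print(place_boilerplate(['import "fmt"', "x := 1"], ["func main() {}"], append_to_first=True))
```

Passing the strategy as a parameter, rather than testing `language.lower() == 'go'`, is a design choice of this sketch only; the commit itself keys the behaviour on the language name.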
@@ -1195,6 +1326,30 @@ This order ensures wrapper removal doesn’t leave code over-indented and avoids
 - Test files with `main()` methods (if present in examples)
 - Verify code inside wrappers is properly dedented

+#### Go-Specific Tests
+- Test with files from `local_examples/client-specific/go/`
+- **Boilerplate placement** (Strategy 2: Append to First Cell):
+  - Verify first cell contains BOTH imports AND `func main() {}` (not separate cells)
+  - Verify imports appear BEFORE `func main() {}` in the same cell
+  - Verify no separate boilerplate cell exists
+  - Verify boilerplate is appended with blank line separator (`\n\n`)
+- **Wrapper removal**:
+  - Verify `package main` declaration is removed
+  - Verify `func main() { ... }` wrapper is removed from source
+  - Verify no orphaned closing brace cells exist
+  - Verify all code inside wrapper is properly dedented
+- **Code preservation**:
+  - Verify import statements are preserved in STEP blocks
+  - Verify all STEP blocks are preserved as separate cells
+  - Verify actual example code is intact and executable
+- **Notebook structure**:
+  - Verify kernel is set to `gophernotes`
+  - Verify cell count matches expected (imports + N steps)
+  - Verify step metadata is preserved on each cell
+- **Real-world testing**:
+  - Test with `landing_examples.go` (real repository file)
+  - Verify generated notebook is executable in gophernotes kernel
+
 ### Edge Cases and Gotchas

 #### General Unwrapping Gotchas
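
Several of the Go checks listed in this hunk can be expressed as assertions over the generated cells. The sketch below assumes cells are available as plain dicts with a `source` string (as in notebook JSON) and uses a hypothetical `check_go_cells` helper; it is not part of the jupyterize test suite and does not cover kernel or metadata checks.

```python
import re


def check_go_cells(cells):
    """Hypothetical checker: return a list of problems found in Go notebook cells."""
    problems = []
    first = cells[0]["source"] if cells else ""
    imp = first.find("import")
    main = first.find("func main() {}")
    if imp == -1 or main == -1:
        problems.append("first cell must contain both imports and func main() {}")
    elif imp > main:
        problems.append("imports must appear before func main() {}")
    elif not first.endswith("\n\nfunc main() {}"):
        problems.append("boilerplate must be appended after a blank line")
    for i, cell in enumerate(cells):
        src = cell["source"]
        if re.search(r"^package\s+\w+\s*$", src, re.MULTILINE):
            problems.append(f"cell {i}: package declaration not removed")
        if i > 0 and src.strip() == "func main() {}":
            problems.append(f"cell {i}: separate boilerplate cell found")
        if re.fullmatch(r"\s*(\}\s*)+", src):
            problems.append(f"cell {i}: contains only closing braces")
    return problems


cells = [
    {"source": 'import (\n\t"fmt"\n)\n\nfunc main() {}'},
    {"source": 'fmt.Println("hi")'},
]
assert check_go_cells(cells) == []
```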
@@ -1221,6 +1376,56 @@ This order ensures wrapper removal doesn’t leave code over-indented and avoids
 - **Empty lines after class declaration**: Java style often has empty line after class opening brace
   - The unwrapping logic should handle this naturally by removing the class line and dedenting

+#### Go-Specific Gotchas
+
+**Critical Requirement**: Go Jupyter notebooks using the gophernotes kernel MUST have all code wrapped in a `func main() {}` block. This is NOT standard Go practice, but it IS required for gophernotes to execute code.
+
+- **`func main() {}` wrapper is mandatory**: Unlike C# and Java where wrappers are removed entirely, Go requires the wrapper to be preserved in the notebook
+  - Solution: Inject `func main() {}` as boilerplate in the first cell
+  - Remove the original `func main() { ... }` wrapper from the source file using unwrap patterns
+  - This ensures the notebook has a clean `func main() {}` wrapper without the test framework code inside
+
+- **Package declaration must be removed**: Go source files start with `package main` (or other package names)
+  - This is test framework boilerplate and should be removed
+  - Pattern: `^package\\s+\\w+\\s*$` (single-line pattern)
+  - The boilerplate `func main() {}` replaces the need for package context
+
+- **Wrapper structure in source files**: Go test files have:
+  - Line 1-2: `// EXAMPLE:` and `// BINDER_ID` markers
+  - Line 3: `package main` declaration
+  - Lines 4+: Import statements (typically in STEP blocks)
+  - Lines N: `func main() { ... }` wrapper containing all example code
+  - Last line: Closing `}` for the main function
+
+- **Unwrap pattern order matters**: Must remove `package` declaration BEFORE removing `func main()` wrapper
+  - If `func main()` is removed first, the package line becomes orphaned
+  - Recommended order: `package_declaration` → `func_main_opening`
+
+- **Closing brace handling**: After removing `func main() { ... }` wrapper, the closing `}` will be removed by the generic closing brace pattern
+  - Ensure the closing brace pattern is included in unwrap_patterns
+  - The boilerplate `func main() {}` provides the wrapper, so orphaned closing braces should be filtered out
+
+- **Indentation inside func main()**: Code inside `func main() { ... }` is typically indented with tabs
+  - After unwrapping, dedent all cells to remove the extra indentation
+  - Go uses tabs by convention, so dedent should handle both tabs and spaces
+
+- **Boilerplate placement is special for Go** (Strategy 2: Append to First Cell):
+  - Unlike C# and Java where boilerplate is a separate first cell, Go appends boilerplate to the first cell
+  - This is because gophernotes expects imports and `func main() {}` in the same cell
+  - If boilerplate were a separate cell, executing it would fail (imports not available)
+  - Implementation: Detect Go language and append boilerplate to first non-empty cell instead of creating separate cell
+  - Result: First cell contains imports + blank line + `func main() {}`
+  - This is a critical difference from other languages and must be tested explicitly
+
+- **Real-world example**: `local_examples/client-specific/go/landing_examples.go`
+  - Source file has `func main() { ... }` wrapper (lines 15-105)
+  - Contains multiple STEP blocks inside the wrapper
+  - After conversion, should produce notebook with:
+    - Cell 1 (boilerplate): `func main() {}`
+    - Cell 2 (import step): Import statements
+    - Cell 3+ (other steps): Example code, properly dedented
+    - No orphaned closing brace cells
+

 ### Recommended Implementation Strategy
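
On the indentation gotcha: Python's `textwrap.dedent` strips whatever leading whitespace is common to every line, so tab-indented Go bodies dedent cleanly, but it treats tabs and spaces as different characters. The sketch below illustrates both cases; whether jupyterize uses `textwrap.dedent` internally is an assumption here, not something this diff shows.

```python
import textwrap

# Tab-indented body, as typically found inside func main() { ... }.
tabbed = "\tctx := context.Background()\n\tif err != nil {\n\t\tpanic(err)\n\t}\n"
dedented = textwrap.dedent(tabbed)
# One level of tab indentation is removed; nested indentation is preserved.
assert dedented == "ctx := context.Background()\nif err != nil {\n\tpanic(err)\n}\n"

# Mixed tabs and spaces share no common leading whitespace,
# so dedent leaves the text unchanged.
mixed = "\tx := 1\n    y := 2\n"
assert textwrap.dedent(mixed) == mixed
```

If source files mix tabs and spaces at the wrapper level, normalizing indentation (for example with `str.expandtabs`) before dedenting would be one option, at the cost of departing from Go's tab convention.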

@@ -1251,10 +1456,28 @@ This order ensures wrapper removal doesn’t leave code over-indented and avoids

 **Critical lesson from Java implementation**: The specification initially estimated 6 patterns would be needed, but examining real files revealed that 8 patterns were required (added `static_main_single_line` and `static_main_opening` after seeing `main()` methods in real files). **Always examine real files BEFORE writing patterns.**

-**Phase 4: Other Languages** (Lower Priority)
+**Phase 4: Structural Unwrapping for Go** (High Priority)
+1. **FIRST**: Examine real Go files to understand wrapper patterns:
+   - Look at `local_examples/client-specific/go/landing_examples.go`
+   - Note: Go files have `package main` declaration and `func main() { ... }` wrapper
+   - Unlike C#/Java, Go REQUIRES a `func main() {}` wrapper in notebooks (gophernotes requirement)
+2. Add Go boilerplate: `func main() {}` as first cell
+3. Add Go unwrap_patterns:
+   - `package_declaration`: Remove `package main` line
+   - `func_main_opening`: Remove `func main() { ... }` wrapper (multi-line pattern)
+   - Ensure closing braces are handled by existing pattern
+4. Test with `local_examples/client-specific/go/landing_examples.go`
+5. Verify:
+   - First cell contains `func main() {}`
+   - No `package main` declaration in notebook
+   - No orphaned closing brace cells
+   - All code properly dedented
+   - All STEP blocks preserved
+
+**Phase 5: Other Languages** (Lower Priority)
 1. Add Node.js configuration (if needed)
-2. Add other languages (Go, PHP, Rust) as needed
-3. Most of these languages don't have the same structural wrapper issues as C#/Java
+2. Add other languages (PHP, Rust) as needed
+3. Most of these languages don't have the same structural wrapper issues as C#/Java/Go

 ### Configuration File Location

build/jupyterize/jupyterize.py

Lines changed: 20 additions & 5 deletions
@@ -525,17 +525,24 @@ def create_cells(parsed_blocks, language):
     # Get language configuration
     lang_config = load_language_config(language)

-    # Add boilerplate cell if defined
+    # Get boilerplate if defined
     boilerplate = lang_config.get('boilerplate', [])
-    if boilerplate:
-        boilerplate_code = '\n'.join(boilerplate)
+    boilerplate_code = '\n'.join(boilerplate) if boilerplate else None
+
+    # For Go, append boilerplate to first cell instead of creating separate cell
+    # This ensures imports and func main() {} are in the same cell
+    append_boilerplate_to_first_cell = language.lower() == 'go'
+
+    # Add boilerplate cell if defined (except for Go, which appends to first cell)
+    if boilerplate and not append_boilerplate_to_first_cell:
         boilerplate_cell = new_code_cell(source=boilerplate_code)
         boilerplate_cell.metadata['cell_type'] = 'boilerplate'
         boilerplate_cell.metadata['language'] = language
         cells.append(boilerplate_cell)
         logging.info(f"Added boilerplate cell for {language} ({len(boilerplate)} lines)")

     # Process regular cells
+    first_cell_processed = False
     for i, block in enumerate(parsed_blocks):
         code = block['code']

@@ -568,11 +575,19 @@ def create_cells(parsed_blocks, language):
             logging.debug(f"Skipping cell {i} (contains only closing braces)")
             continue

+        # For Go: append boilerplate to first cell (imports)
+        if append_boilerplate_to_first_cell and not first_cell_processed:
+            if boilerplate_code:
+                code = code + '\n\n' + boilerplate_code
+                logging.info(f"Appended boilerplate to first cell for {language}")
+            first_cell_processed = True
+
         # Create code cell
         cell = new_code_cell(source=code)

-        # Add step metadata if present
-        if block['step_name']:
+        # Add step metadata if present and enabled for this language
+        add_step_metadata = lang_config.get('add_step_metadata', True)  # Default to True for backward compatibility
+        if block['step_name'] and add_step_metadata:
             cell.metadata['step'] = block['step_name']
             logging.debug(f"Created cell {i} with step '{block['step_name']}'")
         else:
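
The new `add_step_metadata` lookup defaults to `True`, so only languages that set the flag to `false` in the config lose step metadata. A minimal sketch of that behaviour, using an inline config fragment and a hypothetical `step_metadata_for` helper in place of the real cell objects (the `rust` entry here is abbreviated for illustration):

```python
import json

# Illustrative fragment only; the real jupyterize_config.json has more entries.
CONFIG = json.loads("""
{
  "php": {"boilerplate": [], "unwrap_patterns": [], "add_step_metadata": false},
  "rust": {"boilerplate": [], "unwrap_patterns": []}
}
""")


def step_metadata_for(language, step_name):
    """Hypothetical helper: metadata dict a cell would receive for this step."""
    lang_config = CONFIG.get(language, {})
    # Missing key means enabled, matching the backward-compatible default.
    add_step_metadata = lang_config.get("add_step_metadata", True)
    if step_name and add_step_metadata:
        return {"step": step_name}
    return {}


assert step_metadata_for("php", "connect") == {}
assert step_metadata_for("rust", "connect") == {"step": "connect"}
```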

build/jupyterize/jupyterize_config.json

Lines changed: 28 additions & 3 deletions
@@ -44,8 +44,32 @@
     "unwrap_patterns": []
   },
   "go": {
-    "boilerplate": [],
-    "unwrap_patterns": []
+    "boilerplate": [
+      "func main() {}"
+    ],
+    "unwrap_patterns": [
+      {
+        "type": "package_declaration",
+        "pattern": "^package\\s+\\w+\\s*$",
+        "end_pattern": "^package\\s+\\w+\\s*$",
+        "keep_content": false,
+        "description": "Remove package declaration"
+      },
+      {
+        "type": "func_main_opening",
+        "pattern": "^func\\s+main\\(\\)\\s*\\{\\s*$",
+        "end_pattern": "^\\}\\s*$",
+        "keep_content": false,
+        "description": "Remove func main() wrapper"
+      },
+      {
+        "type": "closing_braces",
+        "pattern": "^\\s*\\}\\s*$",
+        "end_pattern": "^\\s*\\}\\s*$",
+        "keep_content": false,
+        "description": "Remove orphaned closing braces"
+      }
+    ]
   },
   "java": {
     "boilerplate": [],
@@ -103,7 +127,8 @@
   },
   "php": {
     "boilerplate": [],
-    "unwrap_patterns": []
+    "unwrap_patterns": [],
+    "add_step_metadata": false
   },
   "rust": {
     "boilerplate": [],
