
feat: jupyter notebook support #1513

Merged
milenkovicm merged 10 commits into apache:main from sandugood:feature/jupyter-notebook-support on Apr 9, 2026


@sandugood
Contributor

Which issue does this PR close?

Refactoring of PR #1430

Rationale for this change

The latest commit to the mentioned PR was in January, so refactoring it seemed worthwhile.

What changes are included in this PR?

  1. Fixed both the @line_magic and @cell_magic decorators by combining them under the @line_cell_magic decorator.
  2. Nicely HTML-formatted cell output, using display() and, where applicable, printing in a loop.

Are there any user-facing changes?

No.

littleKitchen and others added 4 commits January 31, 2026 00:45
…s and examples

This PR implements all items from the checklist in issue apache#1398:

## Implementation Checklist

- [x] Add example .ipynb notebooks to python/examples/
  - getting_started.ipynb - Basic connection and queries
  - dataframe_api.ipynb - DataFrame transformations
  - distributed_queries.ipynb - Multi-stage query examples

- [x] Document notebook support in Python README
  - Added comprehensive Jupyter section with examples

- [x] Create ballista.jupyter module with magic commands
  - Full implementation with BallistaMagics class

- [x] Add %ballista connect/status/tables/schema line magics
  - connect: Connect to Ballista cluster
  - status: Show connection status
  - tables: List registered tables
  - schema: Show table schema
  - disconnect: Disconnect from cluster
  - history: Show query history

- [x] Add %%sql cell magic
  - Line magic for single-line queries
  - Cell magic for multi-line queries
  - Variable assignment support
  - --no-display and --limit options

- [x] Add explain_visual() method for query plan rendering
  - Generates DOT/SVG visualization
  - Supports Jupyter _repr_html_
  - Fallback when graphviz not installed

- [x] Add progress indicator support for long-running queries
  - collect_with_progress() method
  - Callback support for custom progress handling
  - Jupyter-aware display

- [x] Consider JupySQL integration
  - Documented as alternative in README

## Additional Features

- ExecutionPlanVisualization class for plan rendering
- tables() method on BallistaSessionContext
- Optional jupyter dependency in pyproject.toml
- Comprehensive test coverage (45 tests passing)

Closes apache#1398
Comment thread python/python/ballista/jupyter.py Outdated
if not tables:
    return "No tables registered.\n\nUse ctx.register_parquet() or ctx.register_csv() to register tables."
schema_count = len(tables.keys())
table_count = len(tables.values())
Member

Suggested change
- table_count = len(tables.values())
+ table_count = sum(len(v) for v in tables.values())

Contributor Author

Resolved.
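
To illustrate the suggested change above, here is a minimal sketch. It assumes `tables` maps each schema name to a list of table names, which is what the snippet implies; the sample data is made up. Under that assumption, `len(tables.values())` counts schemas rather than tables.

```python
# Hypothetical sample data: schema name -> list of table names.
tables = {
    "public": ["orders", "customers"],
    "staging": ["raw_events"],
}

schema_count = len(tables.keys())

# Counts the dict's values, one per schema, so it always equals schema_count.
wrong_table_count = len(tables.values())

# Sums the length of each schema's table list.
table_count = sum(len(v) for v in tables.values())

print(schema_count, wrong_table_count, table_count)  # prints: 2 2 3
```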

for row in lines:
    print(row)

def _show_help(self) -> Optional[str]:
Member

Is this function supposed to return something?
Currently it just prints and implicitly returns None, but https://github.com/apache/datafusion-ballista/pull/1513/changes#diff-db55df3b39944095b30c0a940790394d25c009baf49be871acd19d9eab2f30b8R90-R93 seems to expect Some

Contributor Author

Instead of returning str from these functions, I changed the return type to Optional[str]. They now call display() if IPython is available and fall back to print() otherwise.
I also changed the test cases to account for that.
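
The display-or-print fallback described in this reply could look roughly like the sketch below. The helper names `render_status_html` and `show_status` are hypothetical and are not taken from the PR; only `IPYTHON_AVAILABLE`, `display`, `HTML`, and `status_lines` appear in the diff.

```python
import html as html_lib
from typing import List

try:
    from IPython.display import HTML, display
    IPYTHON_AVAILABLE = True
except ImportError:
    IPYTHON_AVAILABLE = False


def render_status_html(status_lines: List[str]) -> str:
    """Join lines into simple HTML, escaping special characters."""
    return "<br>".join(html_lib.escape(line) for line in status_lines)


def show_status(status_lines: List[str]) -> None:
    """Render via display() when IPython is importable, else print plain text."""
    if IPYTHON_AVAILABLE:
        display(HTML(render_status_html(status_lines)))
    else:
        print("\n".join(status_lines))


show_status(["Connected: yes", "Tables: 3"])
```

Because the function only produces output as a side effect, tests would check for a `None` return (or capture stdout) rather than compare a returned string.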

if IPYTHON_AVAILABLE:
    display(HTML(html))
else:
    print("\n".join(status_lines))
Member

Contributor Author

Same as previous comment


try:
    # Query the table with LIMIT 0 to get schema without data
    df = self.ctx.sql(f"SELECT * FROM {table_name} LIMIT 0")
Member

Should table_name be sanitized? Currently it could lead to SQL injection.

Contributor Author

I guess we are assuming local development here?
It is the same as:
%sql some SQL code

so it could be any SQL code executed by the data scientist or analyst.
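
For reference, if quoting were wanted despite the reasoning above, one common approach is to double-quote the identifier and double any embedded quotes. This is only a sketch: `quote_identifier` and `build_schema_query` are hypothetical helpers, not part of the PR, and the thread concluded that sanitizing is not needed here.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quotes."""
    if not name:
        raise ValueError("identifier must not be empty")
    return '"' + name.replace('"', '""') + '"'


def build_schema_query(table_name: str) -> str:
    """Build the LIMIT 0 schema probe with the table name quoted."""
    return f"SELECT * FROM {quote_identifier(table_name)} LIMIT 0"


print(build_schema_query("orders"))
print(build_schema_query('orders"; DROP TABLE x; --'))
```

Note that quoting typically makes identifiers case-sensitive and treats a dotted name such as `schema.table` as a single identifier, so this is a trade-off rather than a drop-in fix.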

else:
    def decorator(func):
        return func
    return decorator
Member

No fallback for line_cell_magic. @line_cell_magic will fail if IPython is not available

Contributor Author

Fixed
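
A fallback of the kind requested here might be sketched as follows, assuming the decorators are only ever used in their bare form (`@line_cell_magic` without arguments). `DemoMagics` and its `sql` method are hypothetical and only illustrate that the module stays importable without IPython; the PR's actual fix may differ.

```python
try:
    from IPython.core.magic import Magics, line_cell_magic, magics_class
    IPYTHON_AVAILABLE = True
except ImportError:
    IPYTHON_AVAILABLE = False

    class Magics:  # minimal stand-in base class
        def __init__(self, shell=None):
            self.shell = shell

    def magics_class(cls):
        # No-op replacement: return the class unchanged.
        return cls

    def line_cell_magic(func):
        # No-op replacement: return the method unchanged.
        return func


@magics_class
class DemoMagics(Magics):
    @line_cell_magic
    def sql(self, line, cell=None):
        # Line magic: the query is on the line. Cell magic: it is the cell body.
        query = line if cell is None else cell
        return query.strip()
```

With a combined line/cell magic, `cell` is `None` when the magic is invoked as `%sql ...` and holds the cell body when invoked as `%%sql`.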

Comment thread python/examples/getting_started.ipynb Outdated
"# ctx = BallistaSessionContext(\"df://your-scheduler:50050\")\n",
"\n",
"# host, port = setup_test_cluster()\n",
"host, port = \"localhost\", \"39431\"\n",
Member

Is this a debug leftover? IMO line 79 should be used.

Contributor Author

Yes, debug leftover. Fixed

Comment thread python/python/ballista/extension.py Outdated

def collect_with_progress(
    self,
    callback: Optional[callable] = None,
Member

Suggested change
- callback: Optional[callable] = None,
+ callback: Optional[Callable] = None,

Contributor Author

Fixed. Thank you
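
The reason for this change: the built-in `callable` is a function, not a type, so static type checkers generally reject it in annotations, whereas `typing.Callable` is the annotation meant for this purpose. The sketch below is a standalone stand-in and not the PR's method: the `batches` parameter and the `(done, total)` callback shape are assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical callback shape: (batches_done, total_batches) -> None.
ProgressCallback = Callable[[int, int], None]


def collect_with_progress(
    batches: List[str],
    callback: Optional[ProgressCallback] = None,
) -> List[str]:
    """Collect items one by one, reporting progress after each."""
    collected = []
    for done, batch in enumerate(batches, start=1):
        collected.append(batch)
        if callback is not None:
            callback(done, len(batches))
    return collected


events = []
result = collect_with_progress(
    ["a", "b"], lambda done, total: events.append((done, total))
)
```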

Comment thread python/python/ballista/jupyter.py Outdated
LIMIT 10
"""
for row in help_info.split("\n"):
    print(row)
Member

Suggested change
- print(row)
+ print(row)
+ return help_info

Contributor Author

Fixed

Comment thread python/README.md
## Jupyter Notebook Support

PyBallista provides first-class Jupyter notebook support with SQL magic commands and rich HTML rendering.

Member

Suggested change
Install Jupyter extras first:
```bash
pip install "ballista[jupyter]"
```

Contributor Author

Fixed

Comment thread python/python/ballista/extension.py Outdated
try:
    catalog = self.catalog()
    schema_names = list(catalog.schema_names())
    if schema_names is not None:
Member

A list is never None.

Suggested change
- if schema_names is not None:
+ if schema_names:

Contributor Author

Fixed. Thank you
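
The difference between the two checks can be shown with a short sketch; the function names are hypothetical and exist only to contrast the conditions.

```python
def has_schemas_buggy(names_iterable) -> bool:
    schema_names = list(names_iterable)
    # list() always returns a list, so this is True even when it is empty.
    return schema_names is not None


def has_schemas(names_iterable) -> bool:
    schema_names = list(names_iterable)
    # An empty list is falsy, so this is False when there are no schemas.
    return bool(schema_names)


print(has_schemas_buggy([]), has_schemas([]))  # prints: True False
```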

@sandugood
Contributor Author

My main question was about SQL sanitizing. It seems to me that it is not needed in the current implementation, since it would add another layer of logic. But I'm open to suggestions, and thank you for your review.

@milenkovicm
Contributor

> My main question was about SQL sanitizing. It seems to me that it is not needed in the current implementation, since it would add another layer of logic. But I'm open to suggestions, and thank you for your review.

Hey @sandugood, thanks for the contribution. I agree with you: I don't think this is a security issue, as the user can run whatever they want anyway.

I will have a look at the PR in more detail later.

@milenkovicm
Contributor

Hey @sandugood, there are a few tests failing here. If you wish to progress with this PR, please have a look.

@sandugood marked this pull request as draft on March 29, 2026 10:28
@sandugood
Contributor Author

Converting to draft, going to fix it

@milenkovicm
Contributor

thanks

@sandugood marked this pull request as ready for review on March 30, 2026 18:16
@sandugood
Contributor Author

Fixed the pytest functions and ran the code through ruff.

@milenkovicm
Contributor

thanks @sandugood will have a look asap

Contributor

@milenkovicm left a comment


looks good, thanks @sandugood

@sandugood
Contributor Author

thank you for your review @milenkovicm

@sandugood changed the title from "Feature/jupyter notebook support" to "feat: jupyter notebook support" on Apr 8, 2026
@sandugood
Contributor Author

I think it's ready to merge. Any additional comments, @martin-g?

@milenkovicm merged commit 6091b52 into apache:main on Apr 9, 2026
18 checks passed

4 participants