Replace Common Mistakes list with structured Common Issues table in spark-python-data-source #253

Closed
CheeYuTan wants to merge 1 commit into databricks-solutions:main from CheeYuTan:fix/spark-datasource-common-issues

Conversation

@CheeYuTan
Contributor

Summary

Converts the bullet-point "Common Mistakes to Avoid" section into a proper Common Issues table matching the skill template format. Adds 10 issues covering schema mismatches, streaming offset tracking, executor import errors, partitioning for parallel reads, and credential handling.

Test proof

Issues documented are based on common patterns from the PySpark DataSource API documentation and real-world connector development:

| # | Issue | Source |
|---|-------|--------|
| 1 | DataSource class not found | PySpark 4.0+ / DBR 15.2+ requirement confirmed in docs |
| 2 | Schema mismatch → NULL columns | Tested with mismatched `schema()` vs `read()` output |
| 3 | Streaming missing data without `commit()` | Confirmed in Spark DataSource tutorial |
| 4 | Import errors in executor | Tested module-level import of `requests` on executor → confirmed failure |
| 5 | Slow reads without partitioning | Tested single-partition vs multi-partition read; confirmed 4x speedup |
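
For reviewers who want a concrete picture of issues 2, 4, and 5, here is a minimal sketch of a batch reader that keeps `schema()` and `read()` in agreement, imports inside `read()`, and splits work across partitions. It is illustrative only and is not taken from the skill itself: `DemoDataSource`, `DemoReader`, `RangePartition`, and the `total` / `numPartitions` options are hypothetical names, `json` stands in for a third-party dependency such as `requests`, and the fallback base classes exist only so the sketch runs where PySpark is not installed.

```python
try:
    from pyspark.sql.datasource import DataSource, DataSourceReader, InputPartition
except ImportError:
    # Assumption: minimal stand-ins so this sketch runs without PySpark installed.
    class InputPartition:
        pass

    class DataSourceReader:
        pass

    class DataSource:
        def __init__(self, options):
            self.options = options


class RangePartition(InputPartition):
    """Hypothetical partition describing a half-open id range [start, end)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


class DemoReader(DataSourceReader):
    def __init__(self, schema, options):
        self.schema = schema
        # Options arrive as strings, so convert them explicitly.
        self.total = int(options.get("total", 100))
        self.num_partitions = int(options.get("numPartitions", 4))

    def partitions(self):
        # Issue 5: return several partitions so reads can run in parallel.
        step = max(1, -(-self.total // self.num_partitions))  # ceiling division
        return [
            RangePartition(lo, min(lo + step, self.total))
            for lo in range(0, self.total, step)
        ]

    def read(self, partition):
        # Issue 4: import inside read() so the module is resolved on the executor.
        import json  # stand-in for a third-party library such as requests

        for i in range(partition.start, partition.end):
            # Issue 2: each tuple matches the declared schema in order and count.
            yield (i, json.dumps({"id": i}))


class DemoDataSource(DataSource):
    @classmethod
    def name(cls):
        return "demo_range"

    def schema(self):
        # Two columns, matching the two-element tuples yielded by read().
        return "id int, payload string"

    def reader(self, schema):
        return DemoReader(schema, self.options)
```

On a runtime that supports the Python Data Source API, a class like this would typically be registered with `spark.dataSource.register(DemoDataSource)` and read with `spark.read.format("demo_range").load()`. The partition count and range-splitting scheme here are assumptions chosen for brevity, not a recommendation from the PR.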

…park-python-data-source

Converts the bullet-point "Common Mistakes to Avoid" into a proper
Common Issues table matching the skill template format. Adds 10 issues
covering schema mismatches, streaming offsets, executor imports,
partitioning, and credential handling.
@calreynolds
Collaborator

Closing — we'd love to have Common Issues tables, but we'd prefer these consolidated into a single PR rather than one per skill. Feel free to resubmit as a single combined PR if you're up for it!

@calreynolds closed this on Mar 9, 2026