Replace Common Mistakes list with structured Common Issues table in spark-python-data-source#253
Closed
CheeYuTan wants to merge 1 commit into databricks-solutions:main from
Conversation
Collaborator
Closing — we'd love to have Common Issues tables, but we'd prefer these consolidated into a single PR rather than one per skill. Feel free to resubmit as a single combined PR if you're up for it!
Summary
Converts the bullet-point "Common Mistakes to Avoid" section into a proper Common Issues table matching the skill template format. Adds 10 issues covering schema mismatches, streaming offset tracking, executor import errors, partitioning for parallel reads, and credential handling.
Test proof
Issues documented are based on common patterns from the PySpark DataSource API documentation and real-world connector development:
- `DataSource` class not found
- `schema()` vs `read()` output
- `commit()`
- `requests` on executor → confirmed failure
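As a hedged illustration of the "`schema()` vs `read()` output" issue the PR documents, the sketch below (not part of the PR itself; all names are hypothetical) shows a plain-Python guard that catches the common mismatch where a data source reader yields rows whose field count differs from the declared schema:

```python
def declared_fields(schema_ddl):
    """Parse a simple DDL-style schema string like 'id INT, name STRING'
    into a list of field names (illustrative only, not a full DDL parser)."""
    return [part.strip().split()[0] for part in schema_ddl.split(",")]


def validate_rows(schema_ddl, rows):
    """Raise ValueError if any row's arity differs from the declared schema.

    In a real PySpark DataSource, a mismatch like this typically surfaces
    only at execution time on the executors; checking eagerly makes the
    error obvious and local.
    """
    n = len(declared_fields(schema_ddl))
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError(
                f"row {i} has {len(row)} fields, schema declares {n}"
            )
    return True
```

For example, `validate_rows("id INT, name STRING", [(1, "a")])` passes, while yielding a three-field row against that two-field schema raises immediately instead of failing deep inside a Spark job.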