Open
Labels
enhancement (New feature or request)
Overview
Add basic support for PostScript and Encapsulated PostScript (EPS) files.
Parent Epic
Part of #91 - Document & Office Format Awareness
Description
Parse PostScript files to extract metadata and text strings while skipping binary image data.
Implementation Details
- PostScript is a text-based programming language
- Parse comments for metadata (%%Title, %%Creator, etc.)
- Extract string literals: parenthesized literal strings (...) and hex strings <...>
- Identify DSC (Document Structuring Conventions) comments
- Skip binary image data sections (e.g. %%BeginData/%%EndData and %%BeginBinary/%%EndBinary blocks)
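A minimal sketch of the DSC parsing and binary-skipping steps, assuming the file is read as raw bytes and that binary payloads are delimited by the %%BeginData/%%EndData and %%BeginBinary/%%EndBinary comment pairs. The function name parse_dsc is hypothetical. For simplicity it scans for the closing marker line instead of honouring the byte or line count declared on the Begin comment.

```python
import re
from typing import Dict, List, Optional, Tuple

# A DSC comment is "%%Keyword", optionally followed by ": value".
DSC_RE = re.compile(rb"^%%([A-Za-z][A-Za-z0-9]*)(?::[ \t]*(.*))?$")

# Sections whose payload may be raw binary: begin keyword -> closing line.
BINARY_SECTIONS: Dict[str, bytes] = {
    "BeginData": b"%%EndData",
    "BeginBinary": b"%%EndBinary",
}


def parse_dsc(data: bytes) -> Tuple[List[Tuple[str, Optional[str]]], int]:
    """Collect (keyword, value) DSC comments and count skipped payload lines.

    Hypothetical helper: lines between a Begin comment and its matching
    End comment are never inspected, so binary image data is not scanned.
    """
    comments: List[Tuple[str, Optional[str]]] = []
    skipped = 0
    end_marker: Optional[bytes] = None
    for line in data.splitlines():  # bytes.splitlines breaks on \n, \r, \r\n
        if end_marker is not None:
            if line.rstrip() == end_marker:
                # Record the End comment (keyword without the leading "%%").
                comments.append((end_marker[2:].decode("ascii"), None))
                end_marker = None
            else:
                skipped += 1  # payload line inside a binary section
            continue
        m = DSC_RE.match(line.rstrip())
        if not m:
            continue  # ordinary PostScript code or a non-DSC comment
        key = m.group(1).decode("ascii")
        raw_value = m.group(2)
        value = raw_value.decode("latin-1").strip() if raw_value is not None else None
        comments.append((key, value))
        if key in BINARY_SECTIONS:
            end_marker = BINARY_SECTIONS[key]
    return comments, skipped
```

Decoding values as Latin-1 is a design choice that never fails on arbitrary bytes; a fuller implementation could instead trust the declared byte count of %%BeginBinary, which would also survive a payload that happens to contain the text "%%EndData".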
String Sources
- DSC comments (metadata)
- String literals in code
- Font names
- Resource identifiers
- BoundingBox and page information (%%BoundingBox, %%Pages, %%Page)
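A minimal sketch of extracting string literals and font names from PostScript source that has already been decoded to text (e.g. with Latin-1). It follows the PostScript scanner rules for literal strings (balanced nested parentheses, backslash escapes, \ddd octal codes, backslash-newline continuation) and hex strings, skips % comments, and does not treat the << dictionary or <~ ASCII85 delimiters as strings. The names extract_strings and font_names are hypothetical, and matching font names only through the "/Name findfont" idiom is a simplifying assumption.

```python
import re
from typing import List

# Single-character escapes recognised inside (...) literal strings.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "\\": "\\", "(": "(", ")": ")"}
OCTAL = "01234567"
FONT_RE = re.compile(r"/([^\s/()<>\[\]{}%]+)\s+findfont")


def extract_strings(src: str) -> List[str]:
    """Return decoded (...) and <...> string literals in source order."""
    out: List[str] = []
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c == "%":  # comment: skip to end of line
            while i < n and src[i] not in "\r\n":
                i += 1
        elif c == "(":  # literal string; parentheses may nest
            depth, buf = 1, []
            i += 1
            while i < n:
                ch = src[i]
                if ch == "\\" and i + 1 < n:
                    nxt = src[i + 1]
                    if nxt in ESCAPES:
                        buf.append(ESCAPES[nxt])
                        i += 2
                    elif nxt in OCTAL:  # \d, \dd or \ddd; overflow ignored
                        j = i + 1
                        while j < n and j < i + 4 and src[j] in OCTAL:
                            j += 1
                        buf.append(chr(int(src[i + 1:j], 8) & 0xFF))
                        i = j
                    elif nxt in "\r\n":  # backslash-newline: continuation
                        i += 2
                        if nxt == "\r" and i < n and src[i] == "\n":
                            i += 1
                    else:  # unknown escape: the backslash is ignored
                        i += 1
                    continue
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                buf.append(ch)
                i += 1
            out.append("".join(buf))
        elif src.startswith("<<", i):  # dictionary start, not a string
            i += 2
        elif src.startswith("<~", i):  # ASCII85 data: skip, do not decode
            end = src.find("~>", i + 2)
            i = n if end == -1 else end + 2
        elif c == "<":  # hex string; whitespace ignored, odd length padded
            end = src.find(">", i + 1)
            if end == -1:
                break
            digits = "".join(src[i + 1:end].split())
            if len(digits) % 2:
                digits += "0"
            try:
                out.append(bytes.fromhex(digits).decode("latin-1"))
            except ValueError:
                pass  # not valid hex; ignore
            i = end + 1
        else:
            i += 1
    return out


def font_names(src: str) -> List[str]:
    """Font names used via the common "/Name findfont" idiom."""
    return FONT_RE.findall(src)
```

Scanning character by character rather than with a single regex is deliberate: nested parentheses and escapes are not regular, and a comment containing "(" must not start a string.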
Acceptance Criteria
- Parse DSC comments
- Extract string literals
- Identify binary sections
- Handle both ASCII and binary PS
- Add tests with PS and EPS files
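One possible reading of "binary PS" is the DOS EPS binary format, in which a 30-byte header (magic bytes C5 D0 D3 C6, then little-endian 32-bit offset and length of the PostScript section, followed by WMF/TIFF preview offsets and a checksum) wraps the PostScript text and a binary preview image. The sketch below assumes that interpretation; the Level 2 binary token encoding is not handled. The function name postscript_section is hypothetical.

```python
import struct
from typing import Optional

DOS_EPS_MAGIC = b"\xC5\xD0\xD3\xC6"
DOS_EPS_HEADER_LEN = 30


def postscript_section(data: bytes) -> Optional[bytes]:
    """Return the PostScript text of a PS/EPS file, or None if unrecognised.

    For a DOS EPS binary file, only the PostScript section is returned, so
    the embedded TIFF/WMF preview is never scanned for strings.
    """
    if data.startswith(DOS_EPS_MAGIC):
        if len(data) < DOS_EPS_HEADER_LEN:
            return None  # truncated header
        # Bytes 4-7: PostScript offset, bytes 8-11: PostScript length.
        ps_offset, ps_length = struct.unpack_from("<II", data, 4)
        end = ps_offset + ps_length
        if ps_offset < DOS_EPS_HEADER_LEN or end > len(data):
            return None  # offsets point outside the file
        return data[ps_offset:end]
    if data.startswith(b"%!"):
        return data  # plain ASCII PostScript / EPS
    return None
```

A test suite could build the binary variant synthetically, as in the checks for this sketch, alongside real PS and EPS sample files.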
Related
Project: #76