
Feature: OpenDocument format support #94

@coderabbitai


Overview

Add support for OpenDocument formats (ODT, ODS, ODP) used by LibreOffice and OpenOffice.

Parent Epic

Part of #91 - Document & Office Format Awareness

Description

Parse OpenDocument files (ZIP-based) to extract metadata and text content from XML streams.
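As a rough illustration of the first step, the Python sketch below checks whether a byte buffer is an OpenDocument package by reading its mimetype entry. It uses the standard-library zipfile module as a stand-in for the Phase 2 ZIP parser; the function name detect_odf_format and the short "ODT"/"ODS"/"ODP" labels are hypothetical.

```python
import io
import zipfile
from typing import Optional

# MIME types defined by the ODF specification, mapped to the short names used in this issue.
ODF_MIMETYPES = {
    "application/vnd.oasis.opendocument.text": "ODT",
    "application/vnd.oasis.opendocument.spreadsheet": "ODS",
    "application/vnd.oasis.opendocument.presentation": "ODP",
}


def detect_odf_format(data: bytes) -> Optional[str]:
    """Return "ODT", "ODS" or "ODP" for a supported OpenDocument package, else None."""
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return None  # not a ZIP archive at all
    with zf:
        if "mimetype" not in zf.namelist():
            return None  # an ordinary ZIP, not an ODF package
        # The mimetype entry is a short plain-text stream naming the document type.
        mimetype = zf.read("mimetype").decode("ascii", errors="replace").strip()
    # Other ODF types (e.g. drawings) are not covered by this issue and map to None.
    return ODF_MIMETYPES.get(mimetype)
```

Classifying by the mimetype entry rather than the file extension means a renamed or mislabeled file is still identified by its content.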

Implementation Details

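  • OpenDocument files are ZIP packages containing XML streams plus a mimetype entry that identifies the document type
  • Parse meta.xml for metadata
  • Parse content.xml for document content
  • Parse styles.xml for style names and formatting information
  • Extract metadata for embedded media (e.g., images stored under Pictures/ and listed in META-INF/manifest.xml)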
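A minimal sketch of the meta.xml step, assuming Python's xml.etree.ElementTree. It reads Dublin Core and meta: elements from the office:meta element and keeps user-defined document properties under a "user:" prefix. The field selection in META_FIELDS and the parse_meta_xml name are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace URIs used by ODF meta.xml (Dublin Core for dc:*).
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical selection of fields to extract; extend as needed.
META_FIELDS = {
    "title": "dc:title",
    "subject": "dc:subject",
    "creator": "dc:creator",
    "date": "dc:date",
    "generator": "meta:generator",
    "initial_creator": "meta:initial-creator",
    "creation_date": "meta:creation-date",
}


def parse_meta_xml(xml_bytes: bytes) -> Dict[str, str]:
    """Extract selected metadata strings from an ODF meta.xml stream."""
    root = ET.fromstring(xml_bytes)
    result: Dict[str, str] = {}
    meta = root.find("office:meta", NS)
    if meta is None:
        return result  # no metadata block present
    for key, path in META_FIELDS.items():
        el = meta.find(path, NS)
        if el is not None and el.text and el.text.strip():
            result[key] = el.text.strip()
    # <meta:user-defined meta:name="..."> holds custom document properties.
    name_attr = "{%s}name" % NS["meta"]
    for el in meta.findall("meta:user-defined", NS):
        name = el.get(name_attr)
        if name and el.text:
            result["user:" + name] = el.text.strip()
    return result
```

For untrusted input, a hardened parser such as defusedxml may be preferable to the plain ElementTree parser used here.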

String Sources

  • Document metadata (meta.xml)
  • Text content (content.xml)
  • Styles and formatting names
  • Embedded media filenames
  • Hyperlinks and references
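Several of the string sources above map fairly directly onto ODF elements. The following Python sketch (a hypothetical collect_strings helper) walks content.xml in document order and gathers paragraph and heading text (text:p, text:h), hyperlink targets (xlink:href on text:a), referenced style names (text:style-name), and embedded image paths (xlink:href on draw:image). It is a simplification under stated assumptions: other style attributes such as table:style-name are ignored, and text:s / text:tab are not expanded into whitespace.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def qname(ns: str, local: str) -> str:
    """Build an ElementTree '{namespace}local' name."""
    return "{%s}%s" % (ns, local)


PARAGRAPH_TAGS = {qname(TEXT_NS, "p"), qname(TEXT_NS, "h")}
LINK_TAG = qname(TEXT_NS, "a")
IMAGE_TAG = qname(DRAW_NS, "image")
HREF = qname(XLINK_NS, "href")
STYLE_NAME = qname(TEXT_NS, "style-name")


def collect_strings(content_xml: bytes) -> Dict[str, List[str]]:
    """Group strings found in content.xml by source category."""
    root = ET.fromstring(content_xml)
    out: Dict[str, List[str]] = {"text": [], "links": [], "styles": [], "media": []}
    for el in root.iter():  # visits every element in document order
        if el.tag in PARAGRAPH_TAGS:
            # itertext() joins text from nested spans and links. Caveat: paragraphs
            # inside text boxes are also counted in their enclosing paragraph.
            text = "".join(el.itertext()).strip()
            if text:
                out["text"].append(text)
        elif el.tag == LINK_TAG and el.get(HREF):
            out["links"].append(el.get(HREF))
        elif el.tag == IMAGE_TAG and el.get(HREF):
            out["media"].append(el.get(HREF))
        # Record each referenced text style name once, in first-seen order.
        style = el.get(STYLE_NAME)
        if style and style not in out["styles"]:
            out["styles"].append(style)
    return out
```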

Acceptance Criteria

  • Unzip OpenDocument files
  • Parse meta.xml for metadata
  • Extract text from content.xml
  • Handle ODT, ODS, ODP formats
  • Skip binary embedded objects
  • Add tests using files created with LibreOffice
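One possible way to meet "Skip binary embedded objects" is to parse only package entries whose names end in .xml and record every other entry by name only, which still yields embedded media filenames as strings. The Python sketch below assumes this naming heuristic and a hypothetical MAX_XML_BYTES cap; partition_entries is likewise a hypothetical name, and an in-memory fixture like the one used to exercise it does not replace tests against real LibreOffice-created files.

```python
import io
import zipfile
from typing import Dict, List, Tuple

# Hypothetical safety cap on the declared uncompressed size of a single XML stream.
MAX_XML_BYTES = 64 * 1024 * 1024


def partition_entries(data: bytes) -> Tuple[Dict[str, bytes], List[str]]:
    """Split an ODF package into XML streams to parse and entry names to skip."""
    xml_streams: Dict[str, bytes] = {}
    skipped: List[str] = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            # Directories carry no data; mimetype is handled by format detection.
            if info.is_dir() or info.filename == "mimetype":
                continue
            if info.filename.endswith(".xml") and info.file_size <= MAX_XML_BYTES:
                xml_streams[info.filename] = zf.read(info)
            else:
                # Images, thumbnails, OLE objects, or oversized XML: keep only the name.
                skipped.append(info.filename)
    return xml_streams, skipped
```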

Related

Project: #76
Depends on: Phase 2 ZIP parser
