
@akanshajain231999

Feature: Option to disable timing metadata during ASR transcription

This PR adds a new feature that allows users to disable the printing of timing metadata in ASR (Automatic Speech Recognition) transcription output. By default, timing information like [time: 0.0-2.5] is included in the transcribed text, but users can now opt out of this behavior.

Changes Made:

Core Implementation:

  • Added include_time_metadata: bool = True field to InlineAsrOptions class in docling/datamodel/pipeline_options_asr_model.py
  • Modified _ConversationItem.to_string() method to accept an include_time_metadata parameter
  • Updated both _NativeWhisperModel and _MlxWhisperModel to respect the new setting
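The core change can be pictured with a minimal sketch. `ConversationItemSketch` below is a hypothetical stand-in for the private `_ConversationItem` class, not the actual Docling code; the field names and the `[speaker:...]` prefix format are assumptions made for illustration. Only the `[time: 0.0-2.5]` format and the `include_time_metadata` parameter name come from this PR description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversationItemSketch:
    """Hypothetical stand-in for docling's private _ConversationItem."""

    text: str
    start_time: Optional[float] = None
    end_time: Optional[float] = None
    speaker: Optional[str] = None  # assumed field, for illustration only

    def to_string(self, include_time_metadata: bool = True) -> str:
        parts = []
        # Emit the time prefix only when requested and both bounds are known.
        if (
            include_time_metadata
            and self.start_time is not None
            and self.end_time is not None
        ):
            parts.append(f"[time: {self.start_time}-{self.end_time}]")
        # The speaker prefix (assumed format) is not controlled by the flag.
        if self.speaker is not None:
            parts.append(f"[speaker:{self.speaker}]")
        parts.append(self.text)
        return " ".join(parts)


item = ConversationItemSketch(text="Hello world", start_time=0.0, end_time=2.5)
print(item.to_string())                             # [time: 0.0-2.5] Hello world
print(item.to_string(include_time_metadata=False))  # Hello world
```

Keeping the default at `True` is what preserves the existing output for callers that never pass the new argument.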

CLI Integration:

  • Added --asr-no-timing flag to disable timing metadata via CLI
  • The flag is automatically documented through the CLI's auto-generated documentation

Tests:

  • Added test_asr_pipeline_without_time_metadata() - verifies timing metadata can be disabled
  • Added test_asr_pipeline_with_time_metadata_default() - verifies timing metadata is enabled by default
  • Added test_conversation_item_to_string_with_and_without_time() - unit tests for the to_string() method

Backward Compatibility:

  • Default behavior unchanged: timing metadata is included by default
  • No breaking changes to existing APIs

Usage Examples

Programmatic:

from docling.datamodel import asr_model_specs
from docling.datamodel.pipeline_options import AsrPipelineOptions

pipeline_options = AsrPipelineOptions()
pipeline_options.asr_options = asr_model_specs.WHISPER_TINY.model_copy(deep=True)
pipeline_options.asr_options.include_time_metadata = False  # Disable timing

CLI:

docling audio.mp3 --asr-no-timing

Issue resolved by this Pull Request: Resolves #2564


Screenshot:

Screenshot 2025-11-12 at 10 31 39 PM

Checklist:

- [x] Documentation has been updated
  - CLI documentation auto-generates from code (includes new `--asr-no-timing` flag)
  - Code includes comprehensive docstrings explaining the feature
  
- [x] Examples have been added
  - The feature is straightforward and covered by tests
  - Usage is documented in code comments and test cases
  
- [x] Tests have been added
  - `test_asr_pipeline_without_time_metadata()` - Integration test for disabled timing
  - `test_asr_pipeline_with_time_metadata_default()` - Verifies default behavior
  - `test_conversation_item_to_string_with_and_without_time()` - Unit tests
  - ✅ All tests properly isolated (using `model_copy(deep=True)`)
  - ✅ No compilation/lint errors

@github-actions
Contributor

github-actions bot commented Nov 13, 2025

DCO Check Passed

Thanks @akanshajain231999, all your commits are properly signed off. 🎉

@dosubot

dosubot bot commented Nov 13, 2025

Related Documentation

Checked 3 published document(s) in 1 knowledge base(s). No updates required.


@mergify

mergify bot commented Nov 13, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

….com>

I, akanshajain231999 <[email protected]>, hereby add my Signed-off-by to this commit: b0f6e50

Signed-off-by: akanshajain231999 <[email protected]>
@akanshajain231999
Author

Hey @ceberam, can you please review this PR?

@ceberam ceberam self-requested a review November 13, 2025 07:15
@codecov

codecov bot commented Nov 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


Contributor

@ceberam ceberam left a comment


@akanshajain231999 Thanks a lot for your contribution and detailed documentation and testing of this PR.

In terms of design:

  • This PR is intended to address issue #2564, "Would be nice to have an argument to disable printing the timing metadata." It looks to me like a reasonable request to customize the text serialization of an audio file parsed as a DoclingDocument. One may want to extract the text from the DoclingDocument with or without the time metadata. Note that your implementation does not address a serialization option but rather the parsing of the raw file into a DoclingDocument. If a user sets the pipeline option include_time_metadata = False, the time metadata is lost and the converted DoclingDocument will not hold it. I think it would be more flexible for the DoclingDocument to keep the information, so that the user can then decide whether to extract/export everything or just the text (without time metadata). Some alternatives could be:
    • Keep everything in TextItem.orig (untreated representation) and only the text in text (sanitized representation). Note that text export formats like Markdown use text to serialize. This option would remove the hassle of dealing with pipeline options.
    • We are currently working on a new data model for audio provenance items in docling-core. The metadata would be stored there, which would help manage metadata from ASR and WebVTT files as well as unlock time-dependent chunking. Please see my final remark later on.
  • I have the impression that issue #2564 was about any type of metadata in ASR, i.e., both the time and the speaker metadata. In this PR, the option include_time_metadata = False would still keep the speaker annotation, which may also be annoying for those who just want to process text.
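The first alternative (raw line in `orig`, cleaned line in `text`) could look roughly like the sketch below. `TextItemSketch` and `make_text_item` are hypothetical names, not the docling-core `TextItem` API, and the regular expression assumes metadata appears only as leading `[time: ...]` and `[speaker:...]` prefixes. This is an illustration of the idea rather than a proposed implementation.

```python
import re
from dataclasses import dataclass

# Assumed prefix formats: "[time: 0.0-2.5] " and "[speaker:NAME] ".
_METADATA_PREFIX = re.compile(r"^(?:\[(?:time|speaker):[^\]]*\]\s*)+")


@dataclass
class TextItemSketch:
    """Hypothetical stand-in with the two fields discussed in the review."""

    orig: str  # untreated representation, metadata included
    text: str  # sanitized representation, used by text exports


def make_text_item(raw: str) -> TextItemSketch:
    # Keep the raw ASR line intact and strip leading metadata only from `text`.
    return TextItemSketch(orig=raw, text=_METADATA_PREFIX.sub("", raw))


item = make_text_item("[time: 0.0-2.5] [speaker:A] Hello world")
print(item.orig)  # [time: 0.0-2.5] [speaker:A] Hello world
print(item.text)  # Hello world
```

With this split nothing is lost at parse time: an export that reads `text` yields clean prose, while a consumer that needs the timing can still recover it from `orig`.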

More technically:

  • Name of the CLI option: I find --asr-no-timing a bit confusing, because the explicit False option becomes the double negation --no-asr-no-timing. I would have simply used --asr-metadata and --no-asr-metadata (and included the speaker annotation, as explained above).
  • In test files, please try to avoid converting the same file multiple times within the same test module (e.g., as in test_asr_pipeline_with_time_metadata_default), so as not to make the test suite unnecessarily longer. You can use fixtures with a module scope.
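The naming point can be illustrated with a paired positive/negative flag. The sketch below uses the standard-library `argparse` purely as a stand-in; the Docling CLI may be wired differently, and the `docling-sketch` program name and help text are made up for the example. `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; not the real docling command-line interface.
    parser = argparse.ArgumentParser(prog="docling-sketch")
    parser.add_argument("source", help="path to the audio file")
    # BooleanOptionalAction generates both --asr-metadata and
    # --no-asr-metadata, so the negative form is never a double negation.
    parser.add_argument(
        "--asr-metadata",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="include time and speaker metadata in the ASR text",
    )
    return parser


parser = build_parser()
print(parser.parse_args(["audio.mp3"]).asr_metadata)                       # True
print(parser.parse_args(["audio.mp3", "--no-asr-metadata"]).asr_metadata)  # False
```

Naming the option after the positive behavior keeps the default (`True`) aligned with the current output, and the generated negative form reads naturally.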

Since we do not want to introduce features that may be deprecated in the short term, please allow us a few days (expected next week), and we will get back to you with further suggestions on this PR.

@akanshajain231999
Author

@ceberam Thanks for the detailed review. I will wait for your response until next week.
Meanwhile, do you have any other issues which I can work on?

@ceberam
Contributor

ceberam commented Nov 14, 2025

> @ceberam Thanks for the detailed review. I will wait for your response until next week. Meanwhile, do you have any other issues which I can work on?

@akanshajain231999 you are very welcome to contribute to Docling, this is an open-source collaborative project 🙂
Feel free to pick up an issue and when you are ready to actively work on it, you can set yourself in the Assignees list.
Here is a list of issues that I think could be easy wins. Some may be outdated, so please always consider the latest Docling release. 2626, 2515, 2465, 2487, 2476, 2367, 2298, 2351 (this one more ambitious)


Development

Successfully merging this pull request may close these issues.

Feature request: Option to turn off timing metadata during ASR
