Contributor

@alive4ever alive4ever commented Jun 18, 2025

Add the ability to obtain captions from the innertube get_transcript API instead of the caption baseUrl in the player response.

The feature is behind a new setting, use_innertube_for_captions, which is set to False by default.

The protobuf-encoded params for get_transcript are crafted using the blackboxprotobuf module, chosen because it is lightweight and easy to use.
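For readers unfamiliar with what "protobuf-encoded params" involves, here is a minimal sketch of the protobuf wire format in plain Python, without blackboxprotobuf. The varint and length-delimited encodings follow the protobuf wire format. The field numbers and message layout in `build_params` are placeholders chosen for illustration and are an assumption, not the actual get_transcript schema.

```python
import base64


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2): tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a UTF-8 string as a length-delimited field."""
    return encode_bytes_field(field_number, text.encode("utf-8"))


def build_params(video_id: str, language: str, kind: str = "") -> str:
    """Build a base64 params string.

    Hypothetical layout: the field numbers below are placeholders,
    not the real get_transcript message definition.
    """
    track = encode_string_field(1, kind) + encode_string_field(2, language)
    message = encode_string_field(1, video_id) + encode_bytes_field(2, track)
    return base64.urlsafe_b64encode(message).decode("ascii")
```

A nested message is simply a length-delimited field whose payload is another encoded message, which is why `track` can be passed straight to `encode_bytes_field`.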

Currently, only manual and auto-generated captions are supported. Translated captions are not supported, so a request for a translated caption returns the caption in its original language.

This will hopefully fix #239

Owner

@user234683 user234683 left a comment

Sorry for the late review. I have a few comments.

Add settings to get caption from get_transcript innertube api.
Disabled by default.
Add bbpb (i.e. blackboxprotobuf) as a requirement to encode protobuf for
innertube captions.
Add innertube_caption submodule to enable fetching captions via
innertube get_caption api.

No support for translated captions for now.
Add support to get captions via innertube api.
Add test for vtt_body and retry using footer continuation.
Will use the built-in proto module.
Use built-in proto submodule instead of bbpb to generate innertube
caption request params.
Avoid URL-quoting the params twice.
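To illustrate why double quoting matters, the sketch below uses only the standard library. The `params` value is a made-up base64 string ending in `=` padding, not a real request value, and nothing here is taken from the PR's code.

```python
from urllib.parse import quote, unquote, urlencode

# Placeholder base64 value; the trailing "=" padding is what gets escaped
params = "CgthYmM="

once = quote(params, safe="")   # "=" becomes "%3D"
twice = quote(once, safe="")    # "%" itself becomes "%25"

# Passing an already-quoted value to urlencode() quotes it a second time
query = urlencode({"params": once})

# A single unquote() of the double-quoted value does not restore the original
recovered = unquote(twice)
```

Here `once` is `CgthYmM%3D` and `twice` is `CgthYmM%253D`. A server that decodes once would see `%3D` instead of `=`, so the base64 payload no longer decodes to the intended protobuf bytes.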
Set use_innertube_for_captions to True to get captions working.
Use deep_get from yt_data_extract to access nested dict items to provide
safe extraction of its value.

Also rewrite vtt part construction to use f-string.
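As an illustration of the safe nested access described in this commit, here is a stand-in helper. It is not the project's `deep_get`, whose exact signature and behavior are not shown in this conversation. The name `deep_get_sketch` and the response keys are hypothetical.

```python
def deep_get_sketch(obj, *keys, default=None):
    """Walk nested dicts and lists by key or index.

    Returns `default` as soon as any step is missing, instead of raising.
    """
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Hypothetical response fragment; the key names are placeholders
response = {"actions": [{"panel": {"segments": [{"text": "hello"}]}}]}

text = deep_get_sketch(response, "actions", 0, "panel", "segments", 0, "text")
missing = deep_get_sketch(response, "actions", 3, "panel", default="")
```

The point of this pattern is that a changed or partial API response yields the default value rather than an unhandled `KeyError` or `IndexError` deep inside the extraction code.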
@alive4ever alive4ever force-pushed the feature-innertube-captions branch from 3ccde56 to 9a10217 Compare August 28, 2025 02:12
Set text/vtt as mimetype for get_caption()
Fix missing newline at the end of vtt chunk running text.
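A minimal sketch of building WebVTT cues with f-strings, including the newline that must end each cue's text, might look like the following. The function names and the `(start_ms, end_ms, text)` tuple shape are assumptions for illustration and do not reflect the PR's actual code.

```python
def format_timestamp(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_cue(start_ms: int, end_ms: int, text: str) -> str:
    """Build one cue: timing line, text line, then a blank separator line."""
    timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
    # The text needs its own trailing newline; the second "\n" is the
    # blank line that separates this cue from the next one.
    return f"{timing}\n{text}\n\n"


def build_vtt(cues) -> str:
    """Join cues, given as (start_ms, end_ms, text) tuples, into a document."""
    return "WEBVTT\n\n" + "".join(format_cue(*cue) for cue in cues)
```

Without the newline after the text, the next cue's timing line would run into the previous cue's text, and a player would treat both as a single cue.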
Add a workaround to get non-innertube captions working, i.e. when
use_innertube_for_captions is disabled.
Return 302 redirect to /api/timedtext when accessing /watch/transcript/
endpoint.
Development

Successfully merging this pull request may close these issues.

Captions are not being displayed

2 participants