Add libplacebo GPU module with render and shader filters#1201

Open
D-Ogi wants to merge 6 commits into mltframework:master from D-Ogi:feature/placebo-module


@D-Ogi D-Ogi commented Feb 1, 2026

Summary

New module placebo providing GPU-accelerated video processing via libplacebo:

  • placebo.render — GPU scaling (ewa_lanczos, lanczos, mitchell, etc.), debanding, dithering (blue noise, ordered LUT), and tonemapping (auto, clip, mobius, reinhard, hable, bt.2390, spline) with quality presets (fast/default/high_quality)
  • placebo.shader — Custom mpv-compatible .hook shader support with hot-reload on file change

Architecture

  • Singleton GPU context (gpu_context.c) with thread-safe initialization and render locking
  • Backend priority: D3D11 (Windows) → Vulkan → OpenGL
  • Vulkan loader dynamically loaded on Windows when libplacebo is built without vk-proc-addr support
  • Shader cache persisted to disk for faster subsequent startups
  • Graceful passthrough when no GPU is available
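
A minimal sketch of the singleton-plus-fallback idea described above, not the actual gpu_context.c. The names gpu_state, gpu_acquire, and the try_* probes are hypothetical stand-ins; a real module would create a libplacebo device in each probe.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared GPU state (pl_gpu, pl_renderer, ...). */
typedef struct {
    int backend;    /* 0 = none, 1 = D3D11, 2 = Vulkan, 3 = OpenGL */
    int init_count; /* how many times initialization actually ran */
} gpu_state;

static gpu_state g_state;
static int g_initialized = 0;
static pthread_mutex_t g_init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t g_render_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical probes; here D3D11 is pretended to be unavailable. */
static int try_d3d11(void) { return 0; }
static int try_vulkan(void) { return 1; }
static int try_opengl(void) { return 1; }

/* Walks the backend priority list: D3D11, then Vulkan, then OpenGL. */
static void gpu_init(void)
{
    g_state.init_count++;
    if (try_d3d11())
        g_state.backend = 1;
    else if (try_vulkan())
        g_state.backend = 2;
    else if (try_opengl())
        g_state.backend = 3;
    else
        g_state.backend = 0; /* no GPU: callers pass frames through */
}

/* Returns the shared state, or NULL when no backend is available. */
gpu_state *gpu_acquire(void)
{
    pthread_mutex_lock(&g_init_lock);
    if (!g_initialized) {
        gpu_init();
        g_initialized = 1;
    }
    pthread_mutex_unlock(&g_init_lock);
    return g_state.backend ? &g_state : NULL;
}

/* Serializes rendering across filter instances that share the context. */
void gpu_render_lock(void) { pthread_mutex_lock(&g_render_lock); }
void gpu_render_unlock(void) { pthread_mutex_unlock(&g_render_lock); }
```

A filter instance would call gpu_acquire() once per frame, return the input image untouched when it gets NULL, and otherwise wrap its render call in gpu_render_lock()/gpu_render_unlock().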

Build

Controlled by MOD_PLACEBO CMake option (default ON). Requires libplacebo via pkg-config. Optionally links D3D11/DXGI when PL_HAVE_D3D11 is detected at configure time. MSVC builds link PThreads4W.

Files (10 total)

File                                           Description
CMakeLists.txt                                 +1 line: MOD_PLACEBO option
src/modules/CMakeLists.txt                     +4 lines: placebo subdirectory
src/modules/placebo/CMakeLists.txt             Module build config
src/modules/placebo/factory.c                  Module registration
src/modules/placebo/gpu_context.h              GPU lifecycle API
src/modules/placebo/gpu_context.c              Singleton GPU init (D3D11/Vulkan/OpenGL)
src/modules/placebo/filter_placebo_render.c    Render filter implementation
src/modules/placebo/filter_placebo_render.yml  Render filter metadata
src/modules/placebo/filter_placebo_shader.c    Shader filter implementation
src/modules/placebo/filter_placebo_shader.yml  Shader filter metadata

Testing

Tested on Windows with D3D11 backend via Kdenlive. Verified:

  • GPU initialization and fallback chain
  • Render filter with default/fast/high_quality presets
  • Shader filter with Anime4K and FSRCNNX .hook files
  • Shader hot-reload on file modification
  • Graceful passthrough when GPU is unavailable
  • Shader cache persistence across sessions
  • Thread safety under concurrent filter instances

Member

ddennedy commented Feb 1, 2026

How does this compare with using libplacebo through the existing avfilter?

Author

D-Ogi commented Feb 1, 2026

The main difference is the GPU context lifecycle. The avfilter wrapper creates a new AVFilterGraph per filter instance, and vf_libplacebo initializes its own Vulkan device inside that graph. With multiple filters on a timeline you get multiple GPU contexts. The native module uses a process-wide singleton in gpu_context.c, so there is one pl_gpu, one pl_renderer, and one pl_dispatch shared across all instances.

The frame path is also shorter. The avfilter wrapper does two memcpy round-trips between MLT buffers and AVFrames (line-by-line with linesize conversion), on top of whatever vf_libplacebo does internally for GPU transfer. The native module calls pl_tex_upload/pl_tex_download directly on the MLT image pointer, no intermediate AVFrame.

The most interesting part is the shader filter, which has no avfilter equivalent. It loads mpv .hook files at runtime and checks the file mtime on every frame, so when the file changes on disk it re-parses only the pl_hook object while keeping the GPU context alive. This is useful for iterative shader development, for instance in an NLE where you want to edit a .hook file in a text editor and see the result on the timeline without restarting anything. The avfilter path would need a full graph rebuild to pick up a changed shader_path.
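
To illustrate the per-frame mtime check, here is a rough sketch using stat(). The shader_watch struct and shader_check_reload() are hypothetical names for illustration, and the re-parse step is only a counter here.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical per-filter bookkeeping for one .hook file. */
typedef struct {
    const char *path;
    time_t last_mtime; /* mtime seen at the last (re)load */
    int reload_count;  /* 0 means the file was never loaded */
} shader_watch;

/* Called once per frame.
 * Returns 1 when the file was (re)loaded, 0 when it is unchanged,
 * and -1 when the file cannot be stat'ed. */
int shader_check_reload(shader_watch *w)
{
    struct stat st;
    if (stat(w->path, &st) != 0)
        return -1;
    if (w->reload_count > 0 && st.st_mtime == w->last_mtime)
        return 0;
    /* A real filter would destroy and re-parse only the hook object here,
     * leaving the shared GPU context untouched. */
    w->last_mtime = st.st_mtime;
    w->reload_count++;
    return 1;
}
```

One limitation of this approach: mtime has limited resolution on some filesystems, so two saves in quick succession can look like a single change.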

The trade-off is a direct build dependency on libplacebo versus getting it through FFmpeg. The module could be optional, so it wouldn't affect builds where libplacebo isn't available. Regarding the tests, at this time I don't have solid Linux/macOS environments to test all the dependencies, but from what I can read in the failed tests the root cause seems to be easy to fix.

@D-Ogi D-Ogi force-pushed the feature/placebo-module branch 2 times, most recently from 518b737 to 67d2fa2 Compare February 1, 2026 21:33
New module 'placebo' providing GPU-accelerated video processing via
libplacebo. Includes two filters:

- placebo.render: GPU scaling, debanding, dithering, and tonemapping
  with quality presets (fast/default/high_quality)
- placebo.shader: Custom mpv-compatible .hook shader support

Backend priority: D3D11 (Windows) -> Vulkan -> OpenGL.
Vulkan loader is dynamically loaded on Windows when libplacebo is
built without vk-proc-addr support.

Features:
- Singleton GPU context with thread-safe access
- Shader cache persistence
- Multiple scaling algorithms (ewa_lanczos, lanczos, mitchell, etc.)
- Tone mapping (auto, clip, mobius, reinhard, hable, bt.2390, spline)
- Graceful fallback to passthrough when no GPU is available

The module is enabled by default but skipped automatically when
libplacebo is not installed.
@D-Ogi D-Ogi force-pushed the feature/placebo-module branch from 67d2fa2 to ebe1cae Compare February 1, 2026 21:48
Author

D-Ogi commented Feb 2, 2026

Fixed the MinGW build: %zu is not supported by MSVCRT's printf which is what MinGW uses under the hood. Replaced with %llu + explicit cast.

Member

ddennedy commented Feb 2, 2026

You need to get at least some build workflows to actually build this (not all). For example,

  • .github/workflows/build-distros.yml: Add libplacebo-dev to Ubuntu and Debian, and add libplacebo-devel to Fedora 42 (I do not think one is available for Fedora 38).
  • .github/workflows/build-linux.yml: Add the libplacebo-dev package for the build-cmake job.
  • .github/workflows/build-msys2-mingw64.yml: Add the mingw-w64-x86_64-libplacebo package.

Replaced with %llu + explicit cast.

Our codebase generally prefers the macros %" PRIu64 " and %" PRId64 " from <inttypes.h>.
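
As a small illustration of that convention, a size_t can be cast to uint64_t and printed with PRIu64. The function name and message text below are made up for the example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a size_t without %zu: cast to uint64_t and use the PRIu64 macro,
 * which expands to the correct length modifier for the platform. */
int format_cache_size(char *buf, size_t len, size_t cache_bytes)
{
    return snprintf(buf, len, "shader cache: %" PRIu64 " bytes",
                    (uint64_t) cache_bytes);
}
```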

@D-Ogi D-Ogi force-pushed the feature/placebo-module branch 3 times, most recently from c5189b2 to 8129348 Compare February 2, 2026 23:12
Use PRIu64/PRId64 from <inttypes.h> instead of %zu/%ld for size
logging in the placebo module. Add libplacebo-dev packages to
Ubuntu, Debian, and Fedora 42 CI workflows, and
mingw-w64-x86_64-libplacebo to the MSYS2 MinGW64 workflow.
@D-Ogi D-Ogi force-pushed the feature/placebo-module branch from 8129348 to 35e85eb Compare February 2, 2026 23:25
Author

D-Ogi commented Feb 2, 2026

Done. Added libplacebo packages to the three workflows, switched to PRIu64/PRId64 from <inttypes.h>, and added a minimum version requirement (libplacebo>=5.229) in the module's CMakeLists so older distros like Ubuntu 22.04 (ships v4.192) skip the module instead of failing.

Verified on my fork - all green: MSYS2 MinGW64, Ubuntu 24.04, 22.04, Debian stable/testing/unstable, Fedora 42, 38.

Break long mlt_log_info() call into multi-line format to match
the project's clang-format rules (same style as load_cache above).
Author

D-Ogi commented Feb 4, 2026

Hi Dan, could I ask if you have an estimated timeline for the next round of review? This PR is a key enabler for my downstream work. The shader filter support lets Kdenlive reproduce After Effects preset pipelines and opens the door for the community to write custom GPU shaders within MLT.

Member

ddennedy commented Feb 4, 2026

In about 10 days as I’m on vacation

Member

ddennedy commented Feb 4, 2026

Something for you to comment on or think about until then. I have not looked closely enough. What happens when multiple placebo MLT filters are used on a producer? Does it transfer the image from RAM to GPU and back to RAM for each filter?

Author

D-Ogi commented Feb 4, 2026

Currently each filter does a full RAM -> GPU -> RAM roundtrip per frame. The flow for N chained placebo filters looks like this:

RAM (producer image)
-> GPU upload -> GPU render -> GPU download -> RAM (filter 1)
-> ... -> ... -> ... -> RAM (filter 2)
-> ... -> ... -> ... -> RAM (filter 3)

So with 3 filters that's 6 CPU <-> GPU transfers instead of the ideal 2 (one upload at the start, one download at the end).
The singleton gpu_context shares the pl_gpu, pl_renderer, and pl_dispatch across all filter instances, so there's no redundant GPU initialization. However, each filter creates temporary textures, uploads the RAM buffer, processes, downloads back to RAM, and destroys the textures.

The reason is that MLT's mlt_frame_get_image() contract is fundamentally CPU-buffer-based, and I think there is no way for a filter to pass a GPU texture handle to the next filter in the chain. To eliminate the intermediate transfers, the frame would need to carry a "GPU-resident image" flag and a texture reference that downstream filters can reuse, with only the last filter in the chain (or the consumer) performing the final download. That's a non-trivial change to MLT's image passing architecture.

I could attempt to implement this, for example by attaching a pl_tex to the frame via mlt_properties_set_data and having each placebo filter check for an existing GPU texture before uploading from RAM. The last consumer or non-placebo filter would trigger the download. But I'd rather hear your thoughts on the right approach before going down that path, since it touches assumptions about frame ownership and lifetime that you know much better than I do.

When multiple placebo filters are stacked on one clip, each filter
previously did a full RAM→GPU upload and GPU→RAM download. The
intermediate uploads are redundant because the next placebo filter
would re-upload the same pixels immediately.

Each filter now attaches its output texture to the mlt_frame via
placebo_frame_put_tex(). The next placebo filter calls
placebo_frame_take_tex() to grab it directly as source, skipping
the upload. The download to RAM still happens every time (MLT
expects the image buffer to be current for non-GPU filters).

Staleness detection: put_tex records the RAM buffer pointer,
take_tex compares it against the current pointer. If a CPU filter
ran in between and requested a writable buffer (triggering a copy
and new allocation), the pointers differ and take_tex returns NULL,
falling back to a fresh upload.

Also replaces internal ticket-style comments (C1/W2/etc.) with
descriptions of the actual logic and pitfalls.
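
The pointer-comparison staleness check from this commit message could be sketched roughly as follows. fake_frame, fake_tex, frame_put_tex, and frame_take_tex are hypothetical stand-ins for mlt_frame, pl_tex, and the placebo_frame_put_tex()/placebo_frame_take_tex() helpers; this is not the code from the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a GPU texture handle. */
typedef struct {
    int id;
} fake_tex;

/* Hypothetical stand-in for a frame carrying a RAM image. */
typedef struct {
    uint8_t *image;            /* current RAM image buffer */
    fake_tex *cached_tex;      /* texture left by the previous GPU filter */
    const uint8_t *cached_for; /* RAM pointer recorded at put time */
} fake_frame;

/* Attaches a texture and remembers which RAM buffer it corresponds to. */
void frame_put_tex(fake_frame *f, fake_tex *tex)
{
    f->cached_tex = tex;
    f->cached_for = f->image;
}

/* Returns the cached texture only if the RAM pointer is unchanged.
 * The cache entry is cleared either way; a real implementation would
 * also destroy a stale texture here. */
fake_tex *frame_take_tex(fake_frame *f)
{
    fake_tex *tex = f->cached_tex;
    int fresh = tex != NULL && f->cached_for == f->image;
    f->cached_tex = NULL;
    f->cached_for = NULL;
    return fresh ? tex : NULL;
}
```

Note that the check compares pointers only: a filter that modifies the buffer in place without reallocating it leaves the pointer unchanged, so the cached texture would still be treated as fresh.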
@D-Ogi D-Ogi force-pushed the feature/placebo-module branch from 84ee7e4 to 32f8a47 Compare February 4, 2026 23:01
Author

D-Ogi commented Feb 4, 2026

[Image: benchmark_4k_results]

D-Ogi added 2 commits February 5, 2026 00:33
Add apply_shader_params() to override pl_hook DYNAMIC parameters from
MLT animated properties (shader_param.* prefix) on every frame.  Uses
mlt_properties_anim_get_double/int to correctly resolve keyframe strings
("0=200;50=100") at the current frame position.

Add base64 decoding for shader_text values prefixed with "base64:" to
support inline shaders with characters that are problematic in MLT
property strings.

Run clang-format-14 (matching CI) on filter_placebo_shader.c and
gpu_context.c to fix designated initializer spacing, ternary line
breaks, and long argument lists.
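
A rough sketch of how a "base64:" prefix on shader_text might be handled. shader_text_decode() is a hypothetical helper, not the function from the PR; it uses the standard base64 alphabet and stops at the first '=' padding character.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Maps one base64 character to its 6-bit value, or -1 if invalid. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Returns a newly allocated, NUL-terminated copy of the shader text.
 * Values starting with "base64:" are decoded; anything else is copied as-is.
 * Returns NULL on invalid input or allocation failure. Caller frees. */
char *shader_text_decode(const char *value)
{
    static const char prefix[] = "base64:";
    const size_t plen = sizeof(prefix) - 1;

    if (strncmp(value, prefix, plen) != 0) {
        size_t n = strlen(value);
        char *copy = malloc(n + 1);
        if (copy)
            memcpy(copy, value, n + 1);
        return copy;
    }

    const char *in = value + plen;
    size_t n = strlen(in);
    char *out = malloc(n / 4 * 3 + 4); /* upper bound plus terminator */
    if (!out)
        return NULL;

    size_t o = 0;
    unsigned acc = 0; /* bit accumulator; only the low bits are used */
    int bits = 0;     /* number of pending bits in acc */
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '=')
            break; /* padding ends the payload */
        int v = b64_value((unsigned char) in[i]);
        if (v < 0) {
            free(out);
            return NULL;
        }
        acc = (acc << 6) | (unsigned) v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[o++] = (char) ((acc >> bits) & 0xFF);
        }
    }
    out[o] = '\0';
    return out;
}
```

Because the result is NUL-terminated text, this sketch only suits shader source, not arbitrary binary payloads.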
@bmatherly
Member

The last consumer or non-placebo filter would trigger the download. But I'd rather hear your thoughts on the right approach before going down that path, since it touches assumptions about frame ownership and lifetime that you know much better than I do.

I wonder if we could treat this similar to the movit frame type. I have actually also been thinking about this for the ffmpeg based filters since we suffer multiple unnecessary format conversions when stacking avfilter instances on top of each other.

Here is an idea: we could define a series of image types as "private":

  • movit
  • placebo
  • libav frame

If a frame has an image of one of these private types, then it must also contain a function that can convert it to a public image type. If a service receives the frame and recognizes the private type, then it can use it as-is. But if it does not recognize it, then it can convert it to a public type.


load_cache();

atexit(placebo_gpu_release);
Member

I do not like atexit() as we have had problems with it in the past causing false negatives or simply a crash on something very trivial. What happens if the process crashes and placebo_gpu_release() is not called? Will the OS kernel clean up after the process?
Change this to use mlt_factory_register_for_clean_up() and then applications can call mlt_factory_close() if they choose to.

Comment on lines +106 to +120
#ifdef _WIN32
    char appdata[MAX_PATH] = {0};
    if (SUCCEEDED(SHGetFolderPathA(NULL, CSIDL_APPDATA, NULL, 0, appdata))) {
        snprintf(buf, len, "%s\\mlt\\placebo_shader_cache.bin", appdata);
    } else {
        buf[0] = '\0';
    }
#else
    const char *home = getenv("HOME");
    if (home) {
        snprintf(buf, len, "%s/.local/share/mlt/placebo_shader_cache.bin", home);
    } else {
        buf[0] = '\0';
    }
#endif
Member

It is bad practice for libraries to make policy decisions for applications and choose on their own where to write files. I think you need to add a property somewhere somehow to override this.

Member

Maybe a property on a filter is awkward, add an environment variable like MLT_PLACEBO_CACHE_PATH to allow an application to override this location.
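
One possible shape for that override, as a sketch only: the environment variable wins when it is set and non-empty, otherwise the platform default is used. cache_path_choose() and cache_path_resolve() are hypothetical helper names; only the variable name MLT_PLACEBO_CACHE_PATH comes from the suggestion above.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Copies the override into buf if it is set and non-empty, otherwise the
 * default. Returns 1 on success; returns 0 and leaves buf empty when no
 * path is available or the path does not fit. */
int cache_path_choose(char *buf, size_t len, const char *override_path,
                      const char *default_path)
{
    const char *chosen =
        (override_path && override_path[0]) ? override_path : default_path;
    if (len == 0)
        return 0;
    buf[0] = '\0';
    if (!chosen || !chosen[0])
        return 0;
    int n = snprintf(buf, len, "%s", chosen);
    if (n < 0 || (size_t) n >= len) {
        buf[0] = '\0'; /* treat a truncated path as no path at all */
        return 0;
    }
    return 1;
}

/* Reads the override from the environment, then falls back to the default. */
int cache_path_resolve(char *buf, size_t len, const char *default_path)
{
    return cache_path_choose(buf, len, getenv("MLT_PLACEBO_CACHE_PATH"),
                             default_path);
}
```

An empty result would then mean the same thing as the existing buf[0] = '\0' branches: no cache path is available.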

Comment on lines +392 to +394
* falls back to a fresh upload. This is safe because MLT's standard
* mechanism for a filter to modify frame data is to request a writable
* buffer, which always produces a new allocation.
Member

In reality, I cannot find where writable is actually used by anything! It is basically a legacy artifact in the API. A filter may update an image buffer in-place or replace it, and there is no "dirty" flag. I think this assumption will not work reliably for the filters that do not change the image pointer. Unfortunately, I do not have a simple change to recommend at this time.

This relates to @bmatherly's comment about a private image type. However, I am not sure about the mechanics of that idea. One idea I have is to learn from movit.convert, which hard-codes the names of the CPU converters, e.g. "avcolor_space", and uses them where needed. Similar to how mlt_frame_get_image() uses mlt_frame.stack_image, maybe there needs to be a non-destructive list of converters. By "non-destructive" I mean there is no stack popping. Rather, whenever a conversion is needed, mlt_frame iterates the list until one succeeds, and the function pointers that modules put into that list return an error for types they do not handle. Unfortunately, there is no private part to struct mlt_frame, but I think we can define one in mlt_frame.c and store it in a data property. Next, instead of a filter doing mlt_frame->convert_image = my_convert_image, it needs to call a new mlt_frame function (and mlt_frame->convert_image will be initialized to a new public function for backwards compatibility).

I am not yet sure how a module can express the priority of the convert_image function pointer it sets on the frame. I think it is clear that "avcolor_space" is last in the list and "movit.convert" is near the beginning, and the order of placebo and movit probably does not matter as they are mutually exclusive. I am thinking about changing producer_loader.c to handle a special line from loader.ini keyed on "convert_image". But instead of stopping at the first filter that exists and initializes successfully, it tries to add all of them in order. The order of the filters determines the order of the list of function pointers.

That still leaves something to be done about mlt_image_format. We can add mlt_image_placebo like the one we have for movit, but that is not exactly extensible. OpenFX might need one eventually.
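
To make the "non-destructive list of converters" idea a bit more concrete, here is a loose sketch. Nothing in it is existing MLT API: img_format, converter_list, and the two converters are hypothetical, and the real design would live behind mlt_frame.

```c
#include <stddef.h>

/* Hypothetical image formats: two public, two private. */
typedef enum { IMG_RGBA, IMG_YUV422, IMG_MOVIT, IMG_PLACEBO } img_format;

typedef struct {
    img_format format;
} image;

/* Returns 0 on success, non-zero when the converter does not handle it. */
typedef int (*convert_fn)(image *img, img_format target);

#define MAX_CONVERTERS 8

typedef struct {
    convert_fn fns[MAX_CONVERTERS];
    size_t count;
} converter_list;

/* Appends a converter; list order is the priority order. */
int converter_list_add(converter_list *l, convert_fn fn)
{
    if (l->count >= MAX_CONVERTERS)
        return 1;
    l->fns[l->count++] = fn;
    return 0;
}

/* Tries each converter in order until one succeeds. Nothing is popped,
 * so the same list serves every later conversion request. */
int converter_list_convert(const converter_list *l, image *img,
                           img_format target)
{
    if (img->format == target)
        return 0;
    for (size_t i = 0; i < l->count; i++) {
        if (l->fns[i](img, target) == 0)
            return 0;
    }
    return 1;
}

/* Hypothetical GPU converter: only understands its own private type. */
int placebo_convert(image *img, img_format target)
{
    if (img->format != IMG_PLACEBO || target == IMG_MOVIT)
        return 1;
    img->format = target; /* stands in for a GPU download plus conversion */
    return 0;
}

/* Hypothetical CPU converter: public types only, registered last. */
int cpu_convert(image *img, img_format target)
{
    if (img->format == IMG_MOVIT || img->format == IMG_PLACEBO)
        return 1;
    if (target == IMG_MOVIT || target == IMG_PLACEBO)
        return 1;
    img->format = target;
    return 0;
}
```

In this sketch priority is simply insertion order, which matches the idea of deriving it from the order of filters on a loader.ini line.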

Member

To resolve this issue and move the PR along, I suggest reverting 32f8a47 (Reuse GPU textures between chained placebo filters) for now. Then Brian and I can work on the changes we have in mind, and we can optimize chaining placebo filters afterwards.
