Conversation

@Tolriq (Member) commented Jul 15, 2025

New extension to have proper transcoding solution in OpenAPI.

This is a WIP to start the many discussions that this will bring.

netlify bot commented Jul 15, 2025

Deploy Preview for opensubsonic ready!

Name Link
🔨 Latest commit 15ed53e
🔍 Latest deploy log https://app.netlify.com/projects/opensubsonic/deploys/690f5452c0f91200084f0980
😎 Deploy Preview https://deploy-preview-168--opensubsonic.netlify.app

@Tolriq Tolriq marked this pull request as draft July 15, 2025 09:03
@kgarner7 (Contributor) left a comment


Only reviewed the markdown. General thoughts:

  1. It's extremely important to describe the behavior of what happens when you specify multiple transcoding profiles. Which one will the server return? (It can only return one.)
  2. I'm a bit worried that the limitations would be a bit too heavy for a server. Not entirely sure.
  3. Some parts (codec, container) could probably be made more explicit on the format

All things said, I do think this is a good first pass. I'm worried that it's a bit heavy for the server (especially if a large number of profiles/codecs/limitations is provided), but it should be good especially for mobile clients.

@Tolriq (Member Author) commented Jul 20, 2025

It's extremely important to describe the behavior of what happens when you specify multiple transcoding profiles. Which one will the server return? (It can only return one.)

The transcoding profiles are in order of preference; the server returns the first it can, and the details are present in the transcode decision answer.
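To make that ordering concrete (the field names below are only an illustration, not the extension's actual schema), a client preference list might look like:

```json
{
  "transcodingProfiles": [
    { "codec": "flac", "maxSampleRate": 48000, "maxBitDepth": 24 },
    { "codec": "opus", "maxAudioBitrate": 320 }
  ]
}
```

The server evaluates the array top to bottom and answers with the first profile it can actually produce; if none fits, the decision reports why.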

I'm a bit worried that the limitations would be a bit too heavy for a server. Not entirely sure.

Limitations are the core of the decision process; Plex, Emby, and Jellyfin all work with the same concepts, and it's relatively easy to manage server side.

Some parts (codec, container) could probably be made more explicit on the format

There was some discussion some time ago about returning details about the media, and we did not reach a consensus.

I agree a consensus would be better, as it would also be used for the detailed returns for tracks; let's hope people can accept something.

@lachlan-00 (Member):

This seems really complicated.

Is this more helpful for video servers like jellyfin where you can select various stream outputs from a list of available options?

@Tolriq (Member Author) commented Jul 23, 2025

This is not complicated ;)
This is necessary to have proper transcoding support for proper audio quality.

There's a lot of details in the discussion part.

Clients should have control on what quality they want. If I can't play some format like DSD, I do not want to receive low quality MP3, I want hi res FLAC.

When I cast to a Sonos device that does not support my FLAC 24/96, I want to receive FLAC 24/48 and not low quality MP3 or Opus.

Same when casting to chromecast and a million other cases.

All major media providers have such an API and this is the main missing part of Subsonic for audiophiles and casting.

@lachlan-00 (Member):

That explains why I'm not following it well; I listen to whatever source I'm given. But that makes sense now.

@epoupon epoupon mentioned this pull request Aug 15, 2025
@Tolriq (Member Author) commented Sep 5, 2025

@opensubsonic/servers So holidays are now mostly over :) Any objections or remarks on the draft would be welcome before the final polish, to avoid having to drop / rewrite everything.

@sentriz (Member) commented Sep 10, 2025

+1 on the complicated topic. I don't understand why we couldn't get by with extending the current stream.view?format= param

this param has been around for years, but it's been ambiguous for servers outside the original subsonic server

but I see it as just a key for the format the client is requesting, if we had something like

getFormats.view which returned say

{
"format_1": {"codec": ..., "bitrate": ..., "sampleRate": ..., "mime": ..., "etc": ...},
"format_2": {"codec": ..., "bitrate": ..., "sampleRate": ..., "mime": ..., "etc": ...},
"format_3": {"codec": ..., "bitrate": ..., "sampleRate": ..., "mime": ..., "etc": ...}
}

these represent the possible formats a server can transcode to

the client can choose to use a format or not on its own without the server's knowledge

the song's original bitrate/sampleRate/etc is already known from the Child response:

(screenshot of the Child response fields)

So the client sees there is a transcode option resulting in a bitrate lower than the song's bitrate, and in a codec it can play. It can choose a format, format_1 for example,

then request it: getStream.view?format=format_1

this is also backwards compatible with the original subsonic server and only a small extension


So I wonder: which problem does this solution fail to address?

@sentriz (Member) commented Sep 10, 2025

Also, as a side note: how do these changes interact with the transcodedContentType and transcodedSuffix which clients use?

@lachlan-00 (Member):

Also, as a side note: how do these changes interact with the transcodedContentType and transcodedSuffix which clients use?

I've always treated these as the default output when running stream/download without additional options. A user-selectable format wouldn't affect the default outputs in that case.

@Tolriq (Member Author) commented Sep 11, 2025

I've already given a dozen examples about the need, and why all the other media providers provide such an API, because it is necessary to address a lot of cases.

Transcoding is not about a couple of predefined server lists; it's about having control of the result, for the best result for the user.

Again, if I want to cast my hi-res FLAC 24/96 to my Sonos device, I want hi-res FLAC 24/48 to have the best sound. I do not want to have to choose between MP3 and Opus because those are the only 2 default values the server has.

I also do not want my DSD files transcoded to MP3, or to have to force a bitrate; I want a format that I support.

xHE AAC, ... and so many different needs depending on the player and the cast target. When I cast to the phone I want 2 channels, when I cast to my hi end AVR I want to keep the 6 channels.

Each device, UPnP renderer, Chromecast, ... will have a unique list of supported combinations of parameters, this can't be handled with 3 pre defined profiles on the server.

And the details from Child are not precise enough; mime and suffix are more about the container than actual detailed codec information.

Also, as a side note: how do these changes interact with the transcodedContentType and transcodedSuffix which clients use?

This does not change anything about the fact that they are random values, as servers already supported multiple profiles ;) Most servers report them as the default transcoded result for when the user does not request a transcode but the server forces it.

Something that is also an issue currently: if a server decides to transcode on its own due to its internal settings, users are not really aware, and we can't properly use the seek extension to seek in those transcodes.

TL;DR: The current solution is ultra limited, and while it may fit some basic needs, it's not a proper solution for the mature streaming platform that OpenSubsonic needs to be to compete with the rest of the ecosystem.

@sentriz (Member) commented Sep 11, 2025

I've already given a dozen examples about the need, and why all the other media providers provide such an API, because it is necessary to address a lot of cases.

Transcoding is not about a couple of predefined server lists; it's about having control of the result, for the best result for the user.

Again, if I want to cast my hi-res FLAC 24/96 to my Sonos device, I want hi-res FLAC 24/48 to have the best sound. I do not want to have to choose between MP3 and Opus because those are the only 2 default values the server has.

I also do not want my DSD files transcoded to MP3, or to have to force a bitrate; I want a format that I support.

I'm not talking about 3 formats. There could be tens or hundreds of them: all the possible codecs, sample rates, channels, bit depths. The server has the control here to not show combinations of parameters which aren't possible or don't make sense.

So if you want FLAC 24/48, you choose that option. If that would be upsampling, you don't choose it

for example an incomplete list of formats:

{ "name": "flac_24_48k", "codec": "flac", "bitDepth": 24, "sampleRate": 48000 },
{ "name": "flac_16_44k", "codec": "flac", "bitDepth": 16, "sampleRate": 44100 },
{ "name": "opus_192",    "codec": "ogg",  "bitRate": 192 }, // lossy, no bit depth or sample rate
{ "name": "opus_128",    "codec": "ogg",  "bitRate": 128 }  // lossy, no bit depth or sample rate

Note how we don't show sampleRates and bitDepths for lossy formats. That's something the server needs to control

xHE AAC, ... and so many different needs depending on the player and the cast target. When I cast to the phone I want 2 channels, when I cast to my hi end AVR I want to keep the 6 channels.

Each device, UPnP renderer, Chromecast, ... will have a unique list of supported combinations of parameters, this can't be handled with 3 pre defined profiles on the server.

This can still be supported, with the above stuff

This proposal has the benefit of actually being feasible to implement, for servers.

And the details from Child are not precise enough mime and suffix are more about container than actual detailed codec information.

Then we can enhance this information, if it's not enough, and in a backwards compatible way. This info would be needed for this "format=" approach so that the client can correctly choose the format it wants by comparing these values to the getFormats values.

@Tolriq (Member Author) commented Sep 11, 2025

All the possible codecs, sample rates, channels, bitdepths.

This is not 10 or 100; this is multiple thousands of combinations: protocol, codecs, subCodec, containers, bit depth, sample rate, channels. Without even talking about bitrate.

This proposal has the benefit of actually being feasible to implement, for servers.

So if this proposal, which is actually implemented by Plex, Emby and Jellyfin, is not possible to implement, how did they do it?
This proposal is actually not hard to implement, and if you don't think you can do it, then do not implement the extension.

I'm sorry, but listing thousands of combinations makes absolutely no sense. Either we implement a proper transcoding engine or we don't. But what you propose is not a solution to the need of the users and the clients.

If the server is able to automatically generate the list of the thousands of combinations, then it can easily implement this feature as proposed in a proper way. If it's not capable and you need to manually enter them, then this server will not be able to fit the users' needs either.

@gravelld (Member):

getTranscodeDecision is a little clumsy for a term. How about *[T|t]ranscodeDecision -> *[T|t]ranscodeStreams? i.e. getTranscodeStreams returns a transcodeStream object

In the case of Plex et al, is it implemented this way because there's an implied control over the client? i.e. is knowledge about the client embedded in the server side code that creates the decision? One example: if there are multiple competing equivalent codecs specified, say flac and alac, how is it decided which is returned? A client may want to override that decision if the alternatives are essentially equivalent to the decision strategy. If it's to do with ordering in the query, this needs documenting.

I guess I'm not clear on the sort of control that can be exerted by the client.

@Tolriq (Member Author) commented Sep 15, 2025

100% of the control is done by the client: it gives a list of everything it can directly play and a list of wanted transcode profiles IN ORDER. If the media fits a direct play profile, then the server says you can direct play; else it takes the transcoding profiles in order, finds the first it can do, and returns the necessary data for it.
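As a sketch of that exchange (all field names and values here are illustrative, not the final spec), the client might post:

```json
{
  "directPlayProfiles": [
    { "container": "flac", "maxSampleRate": 96000, "maxAudioChannels": 6 }
  ],
  "transcodingProfiles": [
    { "codec": "flac", "maxSampleRate": 48000 },
    { "codec": "opus", "maxAudioBitrate": 320 }
  ]
}
```

For a FLAC 24/192 file, the server would fail the direct play profile, walk the ordered transcoding profiles, and answer that it will serve FLAC resampled to 48 kHz, together with the parameters to pass to the transcode stream endpoint.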

The terms are related to the function. The first one is asking for a decision that can contain a transcode, and the second is there to actually get the transcoded content, like stream.view; it does not return an object.

So IMO getTranscodeDecision is coherent with all the other get endpoints, and transcodeStream can be renamed to match the current stream; both make sense (like we have getLyrics or getCaptions to extract data from a track).

@epoupon (Member) commented Oct 27, 2025

In order to better understand this PR, I spent a couple of hours implementing it.
At first, I found it a bit overcomplicated, but it eventually made sense to me.
So I think it would really be a nice addition to the OS API, and it is not that complicated to implement on the server side.

Here’s what I noted:

  • Mismatch between songId in getTranscodeDecision and trackID in getTranscodeStream.
  • "maxAudioChannels" could possibly be shifted from DirectPlayProfile / TranscodingProfile to ClientInfo directly (seems to have the same usage as for the max bitrate).
  • Make it clear we don’t expect multiple values in the CodecProfile structs.
  • It looks like Jellyfin may mix up container names with file extensions; "opus" would be a valid container, considered to be "ogg". Not sure we want this.
  • For maxAudioBitrate and maxTranscodingAudioBitrate, I guess no value means no limit (should be written down)? Or make it mandatory but 0 means no limit?
  • Not sure about offset in the getTranscodeDecision endpoint since we can also set it in getTranscodeStream. Is the latter an offset to apply on top of the first one? An override? I’d just remove offset from getTranscodeDecision (it’s not part of the decision anyway).
  • "transcodeReasons" is an array, but it’s not clear which reason applies to which direct play profile or codec profile.
    Looks like we also need AudioBitdepthNotSupported, which is currently missing.

@Tolriq (Member Author) commented Oct 28, 2025

Mismatch between songId in getTranscodeDecision and trackID in getTranscodeStream.

Yes.

"maxAudioChannels" could possibly be shifted from DirectPlayProfile / TranscodingProfile to ClientInfo directly (seems to have the same usage as for the max bitrate).

As explained, it's not at the top for optimisation reasons during playback. Your audio engine on the phone can convert 6 channels to stereo during playback, so you can support 6 channels in direct play profiles; but directly converting to 2 channels when there's transcoding lowers CPU usage on the client. Since there will already be some transcoding, it's better to have the server do the work than the phone.
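A sketch of that asymmetry (field names illustrative, not the spec): the client advertises 6 channels for direct play but caps transcodes at stereo, so the server only downmixes when it is already transcoding anyway:

```json
{
  "directPlayProfiles":  [ { "container": "flac", "maxAudioChannels": 6 } ],
  "transcodingProfiles": [ { "codec": "opus", "maxAudioChannels": 2 } ]
}
```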

Make it clear we don’t expect multiple values in the CodecProfile structs.

Yes

It looks like Jellyfin may mix up container names with file extensions; "opus" would be a valid container, considered to be "ogg". Not sure we want this.

The values are normally the ffmpeg values; if a client sends invalid or unknown values, they should just be ignored.

For maxAudioBitrate and maxTranscodingAudioBitrate, I guess no value means no limit (should be written down)? Or make it mandatory but 0 means no limit?

0 means no limit, as some clients will always encode the fields. We can either make them mandatory or not, as people prefer.

Not sure about offset in the getTranscodeDecision endpoint since we can also set it in getTranscodeStream. Is the latter an offset to apply on top of the first one? An override? I’d just remove offset from getTranscodeDecision (it’s not part of the decision anyway).

Yes, it's a leftover from before moving to just transcodeParams and not a full URL.

"transcodeReasons" is an array, but it’s not clear which reason applies to which direct play profile or codec profile.
Looks like we also need AudioBitdepthNotSupported, which is currently missing.

Yes, there are probably some errors missing; IMO a raw string is enough, as in all cases this will be more for the dev than for exposing nice messages to users.

@epoupon (Member) left a comment


Thanks for the update!

epoupon previously approved these changes Nov 8, 2025
New extension to have proper transcoding solution in OpenAPI.
@Tolriq Tolriq marked this pull request as ready for review November 8, 2025 14:34
@Tolriq Tolriq requested a review from kgarner7 November 8, 2025 19:06
@Tolriq (Member Author) commented Nov 8, 2025

@opensubsonic/servers @opensubsonic/clients The proposal is updated. It's present in an LMS build and proven working to address this important missing part for OS.

@lachlan-00 (Member) commented Nov 11, 2025

Should there be references in the extension saying that formPost is required?

The client sends a JSON payload to the server, which means you need to support formPost as well as this extension?

Actually, I just read that it's x-www-form-urlencoded in that extension, but does that mean we should also extend that extension to support JSON?

@Tolriq (Member Author) commented Nov 12, 2025

This is not formPost, which allows query params in the body.

This is a normal HTTP POST; it's 100% unrelated.
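For illustration only (the paths and body fields below are placeholders, not the documented schema), the difference between the two is roughly:

```
# formPost extension: query parameters moved into an urlencoded body
POST /rest/ping.view
Content-Type: application/x-www-form-urlencoded

u=user&t=...&c=client

# this extension: a plain HTTP POST with a JSON document as the body
POST /rest/getTranscodeDecision
Content-Type: application/json

{ "clientInfo": { ... }, "directPlayProfiles": [ ... ] }
```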

@lachlan-00 (Member):

How does a client send the clientinfo json body for getTranscodeDecision to the server?

This is the only request that requires the client to send something outside parameters to the server, as far as I know.

@dweymouth (Member):

How does a client send the clientinfo json body for getTranscodeDecision to the server?

This is the only request that requires the client send something outside parameters to the server as far as i know

It is, though I suppose the idea is for new endpoints we don't have to be bound to the old, pre-REST-era API design decisions to use GET with query parameters for everything. Of course it makes sense to follow this for simple things that are related to existing endpoints (e.g. if we were to add a getSongs endpoint)

@lachlan-00 (Member):

How does a client send the clientinfo json body for getTranscodeDecision to the server?
This is the only request that requires the client send something outside parameters to the server as far as i know

It is, though I suppose the idea is for new endpoints we don't have to be bound to the old, pre-REST-era API design decisions to use GET with query parameters for everything. Of course it makes sense to follow this for simple things that are related to existing endpoints (e.g. if we were to add a getSongs endpoint)

Okay at least that makes sense.

But what I'm saying is that it's not written down as an extension, just in the endpoint doc.

I think that's what was so confusing for me, because I didn't see that part, so it needs to be more prominent to say
"you must update your server to do something we haven't done before."

@lachlan-00 (Member) commented Nov 14, 2025

Adding in a jsonPost extension lets you say something like

v1 - support json post body requests (required for transcoding extension)

Then any future changes can be updated there and i can also just support json body for all methods as well

This also lets me do getTranscodeDecision requests without a JSON body, and then the server can return the default based on other preferences.

So a client can say: on wifi use stream URLs, and on WAN use getTranscodeDecision URLs, without having to build a whole client list.

@lachlan-00 (Member) left a comment


  • Add jsonPost extension and update getTranscodeDecision to reflect that this extension is required for full support
  • Allow no JSON body from client and return default transcode response using server config/preferences if available
  • Allow rejecting no JSON body requests. (if you support jsonPost you must require json request body)

Now that I understand where this comes from, I think these changes would make it much easier to follow, and I will approve that.
#168 (comment)

@Tolriq (Member Author) commented Nov 14, 2025

Why on earth do you want a second extension that has no purpose?
The transcoding extension says the endpoint uses JSON post. Having another extension does not bring anything to the table.

What would a client do with that information? The existing extension for post is necessary to indicate a change to old endpoints.
A new extension would not indicate anything; no old endpoint would support those. So clients will just look at what extensions they need, and transcoding is self-sufficient, as it has all the needed information.

No JSON in that query also makes no sense; clients should use the old endpoint if they want the default server config.

And that endpoint already returns an error if there's no posted JSON.

Again, an extension for a new feature is self-contained; there's no need to add another extension that gives no more information.

@lachlan-00 (Member):

The purpose is that you don't have it anywhere else in the API.

It's not in the original Subsonic API, so it's an extension.

@Tolriq (Member Author) commented Nov 14, 2025

Yes, an extension named transcoding......

A jsonPost extension would add what information?
We have endpoints that may require JSON post, but we can't tell which ones, since the ones that will use it will require another extension.

The new extensions that need JSON post carry the information where it's useful.

A jsonPost extension gives absolutely 0 information about anything that is not already available in the proper other extensions.

@lachlan-00 (Member):

This is all it needs to be, same as formPost.

Version 1

This extension requires that the server support passing API arguments via POST with a text body following the application/json format.

Then it can be referenced from getTranscodeDecision to inform the reader.

It's a change that needs to be referenced outside of just the endpoint, as I missed this entirely when I first saw it.

If I couldn't easily read this change and didn't fully understand what was happening, it will happen to other people. This will make it clear.

At a minimum, that's what I want to see.

@Tolriq (Member Author) commented Nov 14, 2025

If you did not see it, then the doc of the transcoding extension needs to be updated so you can see it. Adding something unrelated is not the solution.

Your proposal says the server needs to support JSON post, but not where, so it serves no purpose as an extension.....

To have it as a proper extension, it would require that v1 says: the endpoint gettranscode must support JSON.
When we add another one, then we have v2, then v3....
This makes no sense, as it brings 0 useful information; it just duplicates things and forces maintenance....

@epoupon (Member) commented Nov 16, 2025

Hello!
@lachlan-00 Indeed, a new extension also seems overkill to me. If you missed it while reading the PR, it seems to me that if we clearly add that the client info must be in the body of the POST request for getTranscodingDecision, that does the job?

@lachlan-00 (Member):

It just needs to be more visible in some way; it's semi-hidden, not clear, and will be missed by other people.

I'll not debate what is or isn't too much, but it's not enough for me.

@Tolriq (Member Author) commented Nov 19, 2025

The thing is that I'm not you, so I can't guess "some way". Can you try to explain what more details / sentences you need in the endpoint?

Currently it says, in big:

Request Body
The request body must be a JSON object containing the client’s capabilities.

To me, "Request Body" in big in the header carries all the necessary information, so I really don't know what would make this clearer.

@dweymouth (Member):

@lachlan-00 I think you'll have to come up with another suggestion for making it more clear without an OpenSubsonic extension, because it's not confusing to me either. Seems like most are satisfied with the Request Body section of the documentation for the endpoint

@paulijar (Member):

Could there maybe be some kind of overall explanation of the feature on the extension page https://deploy-preview-168--opensubsonic.netlify.app/docs/extensions/transcoding/? Something explaining the intended use flow, possibly including a sequence diagram? I'm not sure if finding the bit about HTTP POST from there would be any easier, but at least it would help in understanding the big picture before diving into the details.

@Tolriq (Member Author) commented Nov 19, 2025

The picture is not really big: you tell the server your config, it tells you if you can direct play or must transcode, and you pass the arguments to the new transcode endpoint.

I can add more text, but starting to handle diagrams seems too much, especially as we'd need to define a standard and everything about them so they can be reused, rather than putting random, inconsistent PNG files in each new endpoint.
