@bcheidemann bcheidemann commented Mar 17, 2025

This is a POC to accompany RFC #238

Important

This PR is not up to date with the RFC.

@bcheidemann bcheidemann marked this pull request as draft March 17, 2025 19:07
@bcheidemann bcheidemann changed the title from "Feat/add language java" to "POC: Add language java" Mar 17, 2025
@bcheidemann bcheidemann mentioned this pull request Mar 17, 2025
@Lucretiel
Contributor

Definitely keep this draft open so that the source can stay available, but be aware that in an effort to reduce the maintenance burden on the typeshare owners, we're pausing accepting PRs for new languages while we work on a major refactor (being worked on in the typeshare2 branch) that will allow language implementations to exist as separate crates that can be owned and maintained independently from this repo.

@bcheidemann
Author

@Lucretiel thanks for the info! This is very exciting 🙂

Coincidentally, I actually did some work along the same lines locally. My motivation for this exploratory work was threefold:

  1. Reduced burden on the typeshare owners (as you mentioned)
  2. Allowing 3rd parties to quickly add the languages they need
  3. Finding a way to run untrusted 3rd party languages securely

For this reason, my POC focused on implementing 3rd party language plugins using WASM. It ended up looking something like this:

struct MyLanguagePlugin;

#[typeshare::plugin]
impl typeshare::Language for MyLanguagePlugin {
  // ...
}

This generates extern functions for __alloc and __dealloc, plus one for each of the functions in the Language trait.
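As a rough illustration of the memory-management half of that interface, the pair of functions might look something like the sketch below. The signatures, the byte alignment, and the null-on-zero behaviour are all assumptions made for this example, not what the POC macro actually emits; a real WASM plugin would also mark these with a no_mangle attribute so the host can find them by name.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical: reserve `len` bytes inside the plugin so the host can copy
/// serialized input into the plugin's memory. Returns null for `len == 0`.
pub extern "C" fn __alloc(len: usize) -> *mut u8 {
    if len == 0 {
        return std::ptr::null_mut();
    }
    let layout = Layout::from_size_align(len, 1).expect("invalid layout");
    // SAFETY: the layout has a non-zero size.
    unsafe { alloc(layout) }
}

/// Hypothetical: release a buffer previously returned by `__alloc`.
///
/// # Safety
/// `ptr` must come from `__alloc(len)` with the same `len`, and must not be
/// freed twice.
pub unsafe extern "C" fn __dealloc(ptr: *mut u8, len: usize) {
    if ptr.is_null() || len == 0 {
        return;
    }
    let layout = Layout::from_size_align(len, 1).expect("invalid layout");
    // SAFETY: guaranteed by the caller contract documented above.
    unsafe { dealloc(ptr, layout) }
}
```

The host would call __alloc, write its input into the returned buffer, invoke one of the Language functions with the pointer and length, and finally call __dealloc.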

My vision for this was that you could then run:

typeshare --lang https://.../my_language.wasm ...
typeshare --lang file:///path/to/my_language.wasm ...

The main advantage of executing WASM, rather than native code, is that it's sandboxed. This means consumers only need to audit the code which the plugin outputs, rather than the source code of the plugin.

If you're interested, I'd be happy to share my code for reference 🙂

@Lucretiel
Contributor

Hey! Excited to report that the new crates for 2.0 are published and ready to use. I'm working this week on an announcement and a set of tutorials and so on, but all of the new crates for independent typeshare implementations are published and documented. In particular:

  • typeshare-model: This crate has the Language trait; in a future typeshare-java crate that you write, you'd have something resembling impl<'config> Language<'config> for Java
  • typeshare-driver: This crate has typeshare_binary, a macro that creates an fn main for your language(s). You simply write typeshare_binary! { Java } and it takes care of the rest.

I also wanted to address the plugin model you proposed; thank you for the idea! We explored something very similar back in November when this project was first being workshopped. While we understood the appeal of a plugin model, our conclusion was that, because the maintenance burden on the language author is the same either way (they still have to produce a compiled artifact with cargo build), the benefits of a plugin model (ease of auditing) didn't outweigh the substantial complexity it would add. Instead, you just build your own standalone typeshare-java binary and use it however you like.

@Lucretiel
Contributor

I'll also throw in some thoughts about one-type-per-file mode. Currently, typeshare has two modes: single-file (all output goes to a single large file) and multi-file (one output file per crate). Probably the sensible thing to do is rename the latter to file-per-crate mode and introduce a third, file-per-type mode. The logic looks largely the same in both cases: typeshare calls begin-file and end-file for each file, and write_imports with the set of computed dependencies for each file. The Language trait would gain the following methods:

  • output_filename_for_type
  • output_dirname_for_crate
  • write_additional_files_for_types, which mirrors write_additional_files (we might even deprecate write_additional_files)
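To make the first two proposed hooks concrete, here is a small standalone sketch. The trait name FilePerType, the method signatures, the output_path helper, and the hyphen-to-underscore rule are all assumptions for illustration; none of this is the typeshare-model API.

```rust
use std::path::PathBuf;

/// Hypothetical hooks for a file-per-type mode; the signatures are assumptions.
trait FilePerType {
    /// Directory that holds every type generated from one crate.
    fn output_dirname_for_crate(&self, crate_name: &str) -> PathBuf;

    /// File name for a single generated type.
    fn output_filename_for_type(&self, type_name: &str) -> String;

    /// Relative output path: <crate dir>/<type file>.
    fn output_path(&self, crate_name: &str, type_name: &str) -> PathBuf {
        self.output_dirname_for_crate(crate_name)
            .join(self.output_filename_for_type(type_name))
    }
}

struct Java;

impl FilePerType for Java {
    fn output_dirname_for_crate(&self, crate_name: &str) -> PathBuf {
        // Java package segments cannot contain hyphens, so swap them out.
        PathBuf::from(crate_name.replace('-', "_"))
    }

    fn output_filename_for_type(&self, type_name: &str) -> String {
        // Java expects one public top-level type per file, named after it.
        format!("{type_name}.java")
    }
}
```

With these definitions, Java.output_path("my-crate", "MyTypeName") yields the relative path my_crate/MyTypeName.java.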

@bcheidemann
Author

Hey @Lucretiel, that's really exciting news! I've been working from my half-baked PoC branch for a little while now, so I can't wait to tidy up the code and publish it for others to use :)

Regarding the file-per-type mode, it's no longer a huge priority for me since namespacing works well enough:

public class MyCrateName {
  public record MyTypeName(...) {}
}
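For readers unfamiliar with the approach, the namespacing output above could be produced by something like the following sketch. The helper names pascal_case and wrap_in_namespace, and the rule for deriving the class name from the crate name, are assumptions for illustration and not necessarily how typeshare-java does it.

```rust
/// Hypothetical: turn a crate name such as `my_crate_name` or `my-crate-name`
/// into a Java class name such as `MyCrateName`.
fn pascal_case(crate_name: &str) -> String {
    crate_name
        .split(|c: char| c == '-' || c == '_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Upper-case the first character and keep the rest as-is.
                Some(first) => first.to_uppercase().chain(chars).collect::<String>(),
                None => String::new(),
            }
        })
        .collect()
}

/// Hypothetical: wrap already-rendered Java declarations in one outer class
/// that acts as a namespace for the crate.
fn wrap_in_namespace(crate_name: &str, declarations: &[&str]) -> String {
    let mut out = format!("public class {} {{\n", pascal_case(crate_name));
    for declaration in declarations {
        out.push_str("  ");
        out.push_str(declaration);
        out.push('\n');
    }
    out.push_str("}\n");
    out
}
```

All types from one crate then live in a single file, nested under a class named after that crate.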

However, I'd love to support both namespacing and file-per-type mode in typeshare-java if possible. Would you be interested in a PR to this repo implementing file-per-type mode as you've described it above?

@Lucretiel
Contributor

I would, but I'd ask you to hold off for now. I'm primarily in the midst of taking care of all the work of actually releasing and publicizing this, and really don't have the space to be fielding large new feature work before the current thing has formally been released. If possible, I'd focus on the namespacing solution for now.

@bcheidemann
Author

Of course 🙂 let me know when you'd be ready for that PR and I'll be happy to put the work in

@bcheidemann
Author

bcheidemann commented Apr 24, 2025

@Lucretiel in case you're interested, I published https://crates.io/crates/typeshare-java (excuse the lack of README - I'll fix this!). It was quite straightforward to port the code from this PR, and the new typeshare_binary! macro works great!

I know it's probably too early for feedback but I found 2 limitations of the new typeshare_binary! macro:

  1. Comments from JavaConfig are not picked up as descriptions in the auto-generated CLI flag
  2. The --generate-config flag seems to be missing in the generated CLI

Once you're ready for contributions, I'd be happy to look into either or both of these :)

@Lucretiel
Contributor

  • generate-config is a known regression, I'll restore that functionality as soon as possible. More than happy to accept a contribution here; there's a large pile of commented code in typeshare-engine that's relevant to this task.
  • Keeping the doc comments in the CLI --help output is a good catch. I'm actually not totally sure how I'd go about fixing that, since currently the way the CLI args work is that we use a custom serde serializer to detect the config fields' types and names and thereby populate the clap::Parser object. Probably some kind of different, from-scratch solution would be needed to resolve this. Perhaps there's a crate out there with a derive macro that makes rustdoc content available at runtime?

@bcheidemann
Author

Hey @Lucretiel, I've been thinking about the doc comments. Although I did find some crates which implement reflection via macros, I wasn't able to find any which expose doc comments at runtime. Unless I missed something, I think you're right that a from-scratch solution may be needed (unless ... see point 3). Here are some thoughts on how that could work:

1. #[derive(Config)]

It would be fairly straightforward to implement a derive macro to replace the compute_args_set function. This could be used like this:

#[derive(Config, Deserialize, Serialize)]
#[serde(default)]
pub struct JavaConfig {
    /// Name of the Java package
    pub package: Option<String>,
}

This could generate the following code:

#[doc(hidden)]
const _: () = {
    const CLI_ARGS: [(&'static str, CliArg); 1] = [(
        "package",
        CliArg {
            full_key: "java-package",
            ty: ArgType::Bool,
            rustdoc: "Name of the Java package",
        },
    )];

    #[automatically_derived]
    impl Config<'_> for JavaConfig {
        fn argset() -> CliArgsSet {
            CliArgsSet {
                args: HashMap::from(CLI_ARGS),
            }
        }
    }
};

This also allows getting rid of some (admittedly very clever) "crimes against serde" 😉

    let empty_config = L::Config::deserialize(EmptyDeserializer).context(
        "failed to create empty config; \
        did you forget `#[serde(default)]`?",
    )?;

    let args_set = empty_config
        .serialize(ArgsSetSerializer::new(L::NAME))
        .context("failed to compute CLI arguments from language configuration type")?;
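To show how the doc comment would become available at runtime under option 1, here is a dependency-free sketch of what the derive output amounts to. CliArg, ArgType, CliArgsSet, and Config below are simplified stand-ins declared locally (not the real typeshare types), the ArgType::Value variant is an assumption, and help_lines is a hypothetical consumer.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real typeshare types (assumptions).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgType {
    Bool,
    Value,
}

#[derive(Debug, Clone, Copy)]
struct CliArg {
    full_key: &'static str,
    ty: ArgType,
    rustdoc: &'static str,
}

struct CliArgsSet {
    args: HashMap<&'static str, CliArg>,
}

trait Config {
    fn argset() -> CliArgsSet;
}

#[allow(dead_code)]
struct JavaConfig {
    /// Name of the Java package
    package: Option<String>,
}

// Written by hand here; a derive macro would generate this from the struct,
// copying each field's doc comment into `rustdoc`.
impl Config for JavaConfig {
    fn argset() -> CliArgsSet {
        const CLI_ARGS: [(&str, CliArg); 1] = [(
            "package",
            CliArg {
                full_key: "java-package",
                ty: ArgType::Value,
                rustdoc: "Name of the Java package",
            },
        )];
        CliArgsSet { args: HashMap::from(CLI_ARGS) }
    }
}

/// Hypothetical consumer: render one help line per argument.
fn help_lines<C: Config>() -> Vec<String> {
    let mut lines: Vec<String> = C::argset()
        .args
        .values()
        .map(|arg| format!("--{}  {}", arg.full_key, arg.rustdoc))
        .collect();
    lines.sort(); // HashMap iteration order is unspecified.
    lines
}
```

Because the doc text is captured at compile time as a string constant, the CLI builder can read it without any serializer tricks.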

2. #[config] proc macro

We can take this a step further if we give up on using a derive macro.

Currently, developers need to implement serde Serialize and Deserialize, plus remember to add #[serde(default)]. These are all implementation details of typeshare which developers implementing language support don't necessarily care about. They may also be subject to change in future versions of typeshare.

Perhaps the config code could be simplified by replacing the derive macros with a single proc macro. This could be used as follows:

#[config]
pub struct JavaConfig {
    /// Name of the Java package
    pub package: Option<String>,
}

And would generate:

#[derive(Deserialize, Serialize)]
#[serde(default)]
pub struct JavaConfig {
    /// Name of the Java package
    pub package: Option<String>,
}

#[doc(hidden)]
const _: () = {
    const CLI_ARGS: [(&'static str, CliArg); 1] = [(
        "package",
        CliArg {
            full_key: "java-package",
            ty: ArgType::Bool,
            rustdoc: "Name of the Java package",
        },
    )];

    #[automatically_derived]
    impl Config<'_> for JavaConfig {
        fn argset() -> CliArgsSet {
            CliArgsSet {
                args: HashMap::from(CLI_ARGS),
            }
        }
    }
};

3. Give language crates the full power of Clap

I can imagine scenarios where developers of language crates may want more control over argument parsing. For example, to implement argument groups or mutually exclusive arguments. To avoid these becoming feature requests for typeshare, you could just give developers full control over argument parsing using Clap - this also solves the doc comments.

It could look something like this:

#[derive(Parser)]
struct JavaArgs {
    /// Name of the Java package
    #[arg(short, long)]
    pub package: Option<String>,
}

Clap has several ways to "combine" multiple sets of arguments:

let mut command = JavaArgs::command();
command = StandardArgs::augment_args(command);

// Or

let mut command = StandardArgs::command();
let java_command = JavaArgs::command();
command = command.args(java_command.get_arguments());
command = command.subcommands(java_command.get_subcommands());

// Or...

#[derive(Parser)]
#[command(version, about, long_about = None)]
struct StandardArgs {
  // ...
  
  #[command(flatten)]
  extra: JavaArgs,
}

The last one would require significant changes, but I think the first one would be quite straightforward.

As for handling config...

Either JavaArgs can derive Serialize + Deserialize (though I'm not sure how to handle values from config overriding default values), or argument parsing and config could be treated as separate concerns:

pub trait Language<'config> {
    type Args: Parser;
    type Config: Serialize + Deserialize<'config> + From<Self::Args>; // Note the From impl which ensures config can be generated automatically from args

    // ...
}
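One possible way to handle the precedence question, sketched without clap or serde so that it stands alone: keep every argument optional and let a CLI value win over a config-file value only when it was actually supplied. JavaArgs and JavaConfig here are plain stand-in structs, and merged_with_args is a hypothetical helper, not part of typeshare.

```rust
/// Stand-in for the clap-parsed arguments (assumption: every field is optional).
#[derive(Debug, Default, Clone, PartialEq)]
struct JavaArgs {
    package: Option<String>,
}

/// Stand-in for the config loaded from a file.
#[derive(Debug, Default, Clone, PartialEq)]
struct JavaConfig {
    package: Option<String>,
}

/// With no config file, the config is derived entirely from the args.
impl From<JavaArgs> for JavaConfig {
    fn from(args: JavaArgs) -> Self {
        JavaConfig { package: args.package }
    }
}

impl JavaConfig {
    /// Hypothetical merge: a value given on the command line overrides the
    /// one from the config file; otherwise the file value is kept.
    fn merged_with_args(self, args: JavaArgs) -> JavaConfig {
        JavaConfig {
            package: args.package.or(self.package),
        }
    }
}
```

The From impl covers the case where no config file exists, while merged_with_args covers layering the CLI on top of a file.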

@bcheidemann
Author

bcheidemann commented Jun 5, 2025

I wasn't able to find any [crates] which expose doc comments at runtime

Since posting the above comment, I have become aware of facet:

use facet::Facet;

#[derive(Facet)]
struct FooBar {
    /// Hello world
    test_field: u32,
}

fn main() {
    let facet::Type::User(facet::UserType::Struct(ty)) = FooBar::SHAPE.ty else {
        unreachable!()
    };

    println!("{:?}", ty.fields[0].doc);
}
$ cargo run
[" Hello world"]
