POC: Add language java #239

bcheidemann · 2025-03-17T19:07:52Z

This is a POC to accompany RFC #238

Important

This PR is not up to date with the RFC.

Lucretiel · 2025-03-28T22:04:33Z

Definitely keep this draft open so that the source can stay available, but be aware that in an effort to reduce the maintenance burden on the typeshare owners, we're pausing accepting PRs for new languages while we work on a major refactor (being worked on in the typeshare2 branch) that will allow language implementations to exist as separate crates that can be owned and maintained independently from this repo.

bcheidemann · 2025-03-29T14:03:04Z

@Lucretiel thanks for the info! This is very exciting 🙂

Coincidentally, I actually did some work along the same lines on my local. My motivation for this exploratory work was three-fold:

Reduced burden on the typeshare owners (as you mentioned)
Allowing 3rd parties to quickly add the languages they need
Finding a way to run untrusted 3rd party languages securely

For this reason, my POC focused on implementing 3rd party language plugins using WASM. It ended up looking something like this:

struct MyLanguagePlugin;

#[typeshare::plugin]
impl typeshare::Language for MyLanguagePlugin {
  // ...
}

This generates extern functions for __alloc, __dealloc, and each of the functions in the Language trait.
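To make the memory-management half of that concrete, here is a minimal sketch of what an `__alloc` / `__dealloc` pair could look like. The signatures (a byte length in, a raw pointer out) are an assumption; the macro's actual output is not shown in this thread. The `round_trip` helper is hypothetical and only plays the role of the host so the example runs natively.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates `len` bytes inside the plugin and returns a pointer the host can write to.
/// Assumption: a real WASM build would also export this symbol unmangled.
pub extern "C" fn __alloc(len: usize) -> *mut u8 {
    if len == 0 {
        return std::ptr::null_mut();
    }
    let layout = Layout::from_size_align(len, 1).expect("length too large for a layout");
    // SAFETY: `layout` has a non-zero size.
    unsafe { alloc(layout) }
}

/// Frees a buffer previously returned by `__alloc` with the same `len`.
///
/// # Safety
/// `ptr` must come from `__alloc(len)` and must not be used afterwards.
pub unsafe extern "C" fn __dealloc(ptr: *mut u8, len: usize) {
    if ptr.is_null() || len == 0 {
        return;
    }
    let layout = Layout::from_size_align(len, 1).expect("length too large for a layout");
    // SAFETY: guaranteed by the caller contract documented above.
    unsafe { dealloc(ptr, layout) }
}

/// Hypothetical host side: copy bytes into plugin memory, read them back, free the buffer.
fn round_trip(input: &[u8]) -> Vec<u8> {
    let ptr = __alloc(input.len());
    if ptr.is_null() {
        return Vec::new();
    }
    // SAFETY: `ptr` points to `input.len()` freshly allocated bytes.
    let copy = unsafe {
        std::ptr::copy_nonoverlapping(input.as_ptr(), ptr, input.len());
        std::slice::from_raw_parts(ptr, input.len()).to_vec()
    };
    // SAFETY: `ptr` came from `__alloc` with the same length.
    unsafe { __dealloc(ptr, input.len()) };
    copy
}

fn main() {
    let bytes = round_trip(b"struct Foo {}");
    println!("{}", String::from_utf8_lossy(&bytes));
}
```

The host needs these two functions because it cannot pass Rust values across the WASM boundary directly; it can only copy bytes into memory the plugin owns and hand over a pointer and length.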

My vision for this was that you could then run:

typeshare --lang https://.../my_language.wasm ...
typeshare --lang file:///path/to/my_language.wasm ...

The main advantage of executing WASM, rather than native code, is that it's sandboxed. This means consumers only need to audit the code which the plugin outputs, rather than the source code of the plugin.

If you're interested, I'd be happy to share my code for reference 🙂

Lucretiel · 2025-04-22T20:02:58Z

Hey! Excited to report that the new crates for 2.0 are published and ready to use. I'm working this week on an announcement and a set of tutorials and so on, but all of the new crates for independent typeshare implementations are published and documented. In particular:

typeshare-model: This crate has the Language trait; in a future typeshare-java crate that you write, you'd have something resembling impl<'config> Language<'config> for Java
typeshare-driver: This crate has typeshare_binary, a macro that creates an fn main for your language(s). You simply write typeshare_binary! { Java } and it takes care of the rest.

I also wanted to address the plugin model you proposed; thank you for the idea! We explored something very similar back in November when this project was first being workshopped. While we understood the appeal of a plugin model, our conclusion was that, because the maintenance burden on the language author is the same (either way they have to produce a compiled artifact with cargo build), the benefits of a plugin model (ease of auditing) didn't outweigh the substantial complexity it would add. Instead, you just build your own standalone typeshare-java binary and use it however you like.

Lucretiel · 2025-04-22T23:43:39Z

I'll also throw in some thoughts about one-type-per-file mode. Currently, typeshare has two modes: single-file (all output goes to a single large file) and multi-file (one output file per crate). Probably the sensible thing to do is rename the latter to file-per-crate mode and introduce a third, file-per-type mode. The logic looks largely the same in both cases: typeshare calls begin_file and end_file for each file, and write_imports with the set of computed dependencies for each file. It would gain the following methods:

output_filename_for_type
output_dirname_for_crate
write_additional_files_for_types, which mirrors write_additional_files (we might even deprecate write_additional_files)
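As a rough illustration of how a file-per-type driver loop could use these methods, here is a self-contained sketch. The `FilePerTypeLanguage` trait, all of its signatures, the `write_type` method, and the `generate` function are hypothetical stand-ins; they are not the typeshare API, and only the method names taken from the list above come from this thread.

```rust
use std::collections::BTreeMap;

/// Hypothetical subset of a language backend. Method names follow the
/// proposal above; every signature here is an assumption.
trait FilePerTypeLanguage {
    fn output_dirname_for_crate(&self, crate_name: &str) -> String;
    fn output_filename_for_type(&self, type_name: &str) -> String;
    fn begin_file(&self, out: &mut String);
    fn write_imports(&self, out: &mut String, deps: &[&str]);
    fn write_type(&self, out: &mut String, type_name: &str);
    fn end_file(&self, out: &mut String);
}

struct Java;

impl FilePerTypeLanguage for Java {
    fn output_dirname_for_crate(&self, crate_name: &str) -> String {
        crate_name.replace('-', "_")
    }
    fn output_filename_for_type(&self, type_name: &str) -> String {
        format!("{type_name}.java")
    }
    fn begin_file(&self, out: &mut String) {
        out.push_str("// Generated file\n");
    }
    fn write_imports(&self, out: &mut String, deps: &[&str]) {
        for dep in deps {
            out.push_str(&format!("import {dep};\n"));
        }
    }
    fn write_type(&self, out: &mut String, type_name: &str) {
        out.push_str(&format!("public record {type_name}() {{}}\n"));
    }
    fn end_file(&self, _out: &mut String) {}
}

/// Emits one file per type: begin_file, write_imports with that type's
/// dependencies, the type itself, then end_file.
fn generate<L: FilePerTypeLanguage>(
    lang: &L,
    crate_name: &str,
    types: &[(&str, Vec<&str>)],
) -> BTreeMap<String, String> {
    let dir = lang.output_dirname_for_crate(crate_name);
    let mut files = BTreeMap::new();
    for (name, deps) in types {
        let mut out = String::new();
        lang.begin_file(&mut out);
        lang.write_imports(&mut out, deps);
        lang.write_type(&mut out, name);
        lang.end_file(&mut out);
        files.insert(format!("{dir}/{}", lang.output_filename_for_type(name)), out);
    }
    files
}

fn main() {
    let types = [("User", vec!["com.example.Address"]), ("Address", vec![])];
    for (path, contents) in generate(&Java, "my-crate", &types) {
        println!("--- {path}\n{contents}");
    }
}
```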

bcheidemann · 2025-04-22T23:48:29Z

Hey @Lucretiel, that's really exciting news! I've been working from my half baked PoC branch for a little while now so I can't wait to tidy up the code and publish it for others to use :)

Regarding the file-per-type mode, it's no longer a huge priority for me since namespacing works well enough:

public class MyCrateName {
  public record MyTypeName(...) {}
}

However, I'd love to support both namespacing and file-per-type mode in typeshare-java if possible. Would you be interested in a PR to this repo implementing file-per-type mode as you've described it above?

Lucretiel · 2025-04-22T23:50:20Z

I would, but I'd ask you to hold off for now. I'm in the midst of taking care of all the work of actually releasing and publicizing this, and really don't have the space to be fielding large new feature work before the current thing has formally released. If possible, I'd focus on the namespacing solution for now.

bcheidemann · 2025-04-23T08:03:12Z

Of course 🙂 let me know when you'd be ready for that PR and I'll be happy to put the work in

bcheidemann · 2025-04-24T23:16:33Z

@Lucretiel in case you're interested, I published https://crates.io/crates/typeshare-java (excuse the lack of README - I'll fix this!). It was quite straightforward to port the code from this PR and the new typeshare_binary! macro works great!

I know it's probably too early for feedback but I found 2 limitations of the new typeshare_binary! macro:

Comments from JavaConfig are not picked up as descriptions in the auto-generated CLI flag
The --generate-config flag seems to be missing in the generated CLI

Once you're ready for contributions, I'd be happy to look into either or both of these :)

Lucretiel · 2025-04-25T00:02:27Z

generate-config is a known regression, I'll restore that functionality as soon as possible. More than happy to accept a contribution here; there's a large pile of commented code in typeshare-engine that's relevant to this task.
Keeping the doc comments in the CLI --help output is a good catch. I'm actually not totally sure how I'd go about fixing that, since currently the way the CLI args work is that we use a custom serde serializer to detect the config fields' types and names and thereby populate the clap::Parser object. Probably some kind of different, from-scratch solution would be needed to resolve this. Perhaps there's a crate out there with a derive macro that makes rustdoc content available at runtime?

bcheidemann · 2025-04-26T13:24:22Z

Hey @Lucretiel, I've been thinking about the doc comments. Although I did find some crates which implement reflection via macros, I wasn't able to find any which expose doc comments at runtime. Unless I missed something, I think you're right that a from-scratch solution may be needed (unless ... see point 3). Here are some thoughts on how that could work:

1. `#[derive(Config)]`

It would be fairly straightforward to implement a derive macro to replace the compute_args_set function. This could be used like this:

#[derive(Config, Deserialize, Serialize)]
#[serde(default)]
pub struct JavaConfig {
    /// Name of the Java package
    pub package: Option<String>,
}

This could generate the following code:

#[doc(hidden)]
const _: () = {
    const CLI_ARGS: [(&'static str, CliArg); 1] = [(
        "package",
        CliArg {
            full_key: "java-package",
            ty: ArgType::Bool,
            rustdoc: "Name of the Java package",
        },
    )];

    #[automatically_derived]
    impl Config<'_> for JavaConfig {
        fn argset() -> CliArgsSet {
            CliArgsSet {
                args: HashMap::from(CLI_ARGS),
            }
        }
    }
};
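To show how the captured doc comment could end up in `--help` output, here is a self-contained sketch built on stand-in types. The definitions of `ArgType`, `CliArg`, `CliArgsSet`, and `Config` below are assumptions modelled on the generated code above, not the real typeshare-engine types; the `ArgType::Value` variant and the `render_help` function are hypothetical. The `Config` impl is written by hand where the derive would generate it.

```rust
use std::collections::HashMap;

// Stand-in types; assumed shapes, not the real typeshare-engine definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
#[allow(dead_code)]
enum ArgType {
    Bool,
    Value, // hypothetical variant for flags that take a value
}

#[derive(Debug, Clone, Copy)]
struct CliArg {
    full_key: &'static str,
    ty: ArgType,
    rustdoc: &'static str,
}

struct CliArgsSet {
    args: HashMap<&'static str, CliArg>,
}

trait Config {
    fn argset() -> CliArgsSet;
}

#[allow(dead_code)]
struct JavaConfig {
    /// Name of the Java package
    package: Option<String>,
}

// Hand-written equivalent of what the derive macro could emit.
impl Config for JavaConfig {
    fn argset() -> CliArgsSet {
        CliArgsSet {
            args: HashMap::from([(
                "package",
                CliArg {
                    full_key: "java-package",
                    ty: ArgType::Value,
                    rustdoc: "Name of the Java package",
                },
            )]),
        }
    }
}

/// Renders one help line per argument, sorted by field name for stable output.
fn render_help<C: Config>() -> String {
    let set = C::argset();
    let mut keys: Vec<&'static str> = set.args.keys().copied().collect();
    keys.sort();
    let mut out = String::new();
    for key in keys {
        let arg = &set.args[key];
        let placeholder = match arg.ty {
            ArgType::Bool => "",
            ArgType::Value => " <VALUE>",
        };
        out.push_str(&format!("--{}{}  {}\n", arg.full_key, placeholder, arg.rustdoc));
    }
    out
}

fn main() {
    print!("{}", render_help::<JavaConfig>());
}
```

In the real CLI the description would be passed to clap rather than formatted by hand; the point is only that the `rustdoc` string is available at runtime once the macro has captured it.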

This also allows getting rid of some (admittedly very clever) "crimes against serde" 😉

    let empty_config = L::Config::deserialize(EmptyDeserializer).context(
        "failed to create empty config; \
        did you forget `#[serde(default)]`?",
    )?;

    let args_set = empty_config
        .serialize(ArgsSetSerializer::new(L::NAME))
        .context("failed to compute CLI arguments from language configuration type")?;

2. `#[config]` proc macro

We can take this a step further if we give up on using a derive macro.

Currently, developers need to implement serde Serialize and Deserialize, plus remember to add #[serde(default)]. These are all implementation details of typeshare which developers implementing language support don't necessarily care about. They may also be subject to change in future versions of typeshare.

Perhaps the config code could be simplified by replacing the derive macros with a single attribute macro. This could be used as follows:

#[config]
pub struct JavaConfig {
    /// Name of the Java package
    pub package: Option<String>,
}

And would generate:

#[derive(Deserialize, Serialize)]
#[serde(default)]
pub struct JavaConfig {
    /// Name of the Java package
    pub package: Option<String>,
}

#[doc(hidden)]
const _: () = {
    const CLI_ARGS: [(&'static str, CliArg); 1] = [(
        "package",
        CliArg {
            full_key: "java-package",
            ty: ArgType::Bool,
            rustdoc: "Name of the Java package",
        },
    )];

    #[automatically_derived]
    impl Config<'_> for JavaConfig {
        fn argset() -> CliArgsSet {
            CliArgsSet {
                args: HashMap::from(CLI_ARGS),
            }
        }
    }
};

3. Give language crates the full power of Clap

I can imagine scenarios where developers of language crates may want more control over argument parsing. For example, to implement argument groups or mutually exclusive arguments. To avoid these becoming feature requests for typeshare, you could just give developers full control over argument parsing using Clap - this also solves the doc comments.

It could look something like this:

#[derive(Parser)]
struct JavaArgs {
    /// Name of the Java package
    #[arg(short, long)]
    pub package: Option<String>,
}

Clap has several ways to "combine" multiple sets of arguments:

let mut command = JavaArgs::command();
command = StandardArgs::augment_args(command);

// Or

let mut command = StandardArgs::command();
let java_command = JavaArgs::command();
command = command.args(java_command.get_arguments());
command = command.subcommands(java_command.get_subcommands());

// Or...

#[derive(Parser)]
#[command(version, about, long_about = None)]
struct StandardArgs {
  // ...
  
  #[command(flatten)]
  extra: JavaArgs,
}

The last one would require significant changes, but I think the first one would be quite straightforward.

As for handling config...

Either JavaArgs can derive Serialize + Deserialize (but I'm not sure how to handle values from config overriding default values), or argument parsing and config could be treated as separate concerns:

pub trait Language<'config> {
    type Args: Parser;
    type Config: Serialize + Deserialize<'config> + From<Self::Args>; // Note the From impl which ensures config can be generated automatically from args

    // ...
}
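A self-contained sketch of the separate-concerns variant follows, using plain structs in place of the clap and serde derives so it runs without dependencies. The `Parser`, `Serialize`, and `Deserialize` bounds are dropped here for that reason. The precedence rule (a CLI value wins over a config-file value) is an assumption chosen for illustration, and `or_else_from` and `resolve_config` are hypothetical helpers.

```rust
/// Stand-in for the clap-derived argument struct (assumption: no clap here).
#[derive(Debug, Clone, Default, PartialEq)]
struct JavaArgs {
    package: Option<String>,
}

/// Stand-in for the serde-derived config struct (assumption: no serde here).
#[derive(Debug, Clone, Default, PartialEq)]
struct JavaConfig {
    package: Option<String>,
}

// The `From` impl the trait bound asks for: args convert directly into config.
impl From<JavaArgs> for JavaConfig {
    fn from(args: JavaArgs) -> Self {
        JavaConfig { package: args.package }
    }
}

impl JavaConfig {
    /// Hypothetical merge: values already set on `self` win; gaps are
    /// filled from `fallback`.
    fn or_else_from(self, fallback: JavaConfig) -> JavaConfig {
        JavaConfig {
            package: self.package.or(fallback.package),
        }
    }
}

/// Reduced version of the trait above, keeping only the `From` relationship.
trait Language {
    type Args;
    type Config: From<Self::Args>;
}

struct Java;

impl Language for Java {
    type Args = JavaArgs;
    type Config = JavaConfig;
}

/// Builds the effective config: CLI arguments first, config file as fallback.
fn resolve_config(args: JavaArgs, file: JavaConfig) -> JavaConfig {
    let from_cli: <Java as Language>::Config = args.into();
    from_cli.or_else_from(file)
}

fn main() {
    let file = JavaConfig { package: Some("com.file".to_string()) };
    let args = JavaArgs { package: Some("com.cli".to_string()) };
    println!("{:?}", resolve_config(args, file));
}
```

Keeping every field optional on both sides is what makes the merge unambiguous: an absent CLI flag is `None` rather than a default value, so it never masks what the config file says.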

bcheidemann · 2025-06-05T20:10:02Z

I wasn't able to find any [crates] which expose doc comments at runtime

Since posting the above comment, I have become aware of facet:

use facet::Facet;

#[derive(Facet)]
struct FooBar {
    /// Hello world
    test_field: u32,
}

fn main() {
    let facet::Type::User(facet::UserType::Struct(ty)) = FooBar::SHAPE.ty else {
        unreachable!()
    };

    println!("{:?}", ty.fields[0].doc);
}

$ cargo run
[" Hello world"]

bcheidemann added 2 commits March 16, 2025 19:46

feat: initial implementation for java language support

a1026ac

feat: hook up command line flags

ffbafc2

bcheidemann marked this pull request as draft March 17, 2025 19:07

bcheidemann changed the title ~~Feat/add language java~~ POC: Add language java Mar 17, 2025

bcheidemann mentioned this pull request Mar 17, 2025

docs: Java RFC #238

Draft

use namespace class

df792f6
