Describe the usage question you have. Please include as many useful details as possible.
I have ~20 KB objects that I need to write to Parquet efficiently from Java.
In C++, C#, and Python there's a direct/bulk Arrow-Parquet write (e.g. WriteTable / write_table) that avoids row-by-row iteration, but in Java I only see row-by-row paths via RecordConsumer or internal/unstable column writers.
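For reference, the row-by-row path I mean looks roughly like this (a minimal sketch using parquet-avro, with a placeholder schema and output path; AvroParquetWriter drives RecordConsumer internally):

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class RowByRowSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder schema standing in for my ~20 KB objects.
    Schema schema = SchemaBuilder.record("Obj").fields()
        .requiredInt("id")
        .requiredString("payload")
        .endRecord();

    try (ParquetWriter<GenericRecord> writer =
             AvroParquetWriter.<GenericRecord>builder(new Path("/tmp/out.parquet"))
                 .withSchema(schema)
                 .build()) {
      for (int i = 0; i < 1_000; i++) {
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("id", i);
        rec.put("payload", "value-" + i);
        // Each write(...) walks the object field by field through
        // RecordConsumer; this is the per-row overhead I'd like to avoid.
        writer.write(rec);
      }
    }
  }
}
```

The same ParquetWriter.Builder exposes withRowGroupSize, withPageSize, withDictionaryEncoding, and withCompressionCodec, which is where the tuning question below comes from.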
Questions:
- Is there a supported bulk/columnar Arrow-to-Parquet write API in Java (e.g., VectorSchemaRoot → Parquet) that avoids row-by-row calls?
- If not, why is Java limited to row-by-row writes today? Any roadmap for feature parity with C++/Python/C#?
- For now, what's the recommended optimization path to write ~20 KB objects at high throughput from Java without JNI, or is the JNI-backed Dataset writer the recommended route? (See the sketch after this list.)
- Any best practices (batch sizing, encodings, writer settings) to mitigate the row-by-row overhead?
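For the JNI/Dataset option, this is the shape I'm considering (a sketch assuming the arrow-dataset JNI module; the IPC stream round-trip is just one way to get an ArrowReader over an in-memory VectorSchemaRoot, and the output URI is a placeholder):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.util.Collections;

import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class DatasetWriteSketch {
  public static void main(String[] args) throws Exception {
    try (BufferAllocator allocator = new RootAllocator()) {
      // Toy single-column schema; my real schema is much wider.
      Schema schema = new Schema(Collections.singletonList(
          Field.nullable("id", new ArrowType.Int(32, true))));
      try (VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
        IntVector id = (IntVector) root.getVector("id");
        id.allocateNew(3);
        for (int i = 0; i < 3; i++) {
          id.set(i, i);
        }
        root.setRowCount(3);

        // Round-trip through the IPC stream format to obtain the
        // ArrowReader that DatasetFileWriter.write consumes.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ArrowStreamWriter ipcWriter =
                 new ArrowStreamWriter(root, null, Channels.newChannel(out))) {
          ipcWriter.start();
          ipcWriter.writeBatch();
          ipcWriter.end();
        }
        try (ArrowStreamReader reader = new ArrowStreamReader(
                 new ByteArrayInputStream(out.toByteArray()), allocator)) {
          // JNI-backed: hands whole record batches to the C++ Parquet
          // writer instead of assembling rows one at a time.
          DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
              "file:///tmp/arrow-out");
        }
      }
    }
  }
}
```

If this is indeed the intended route, confirmation would be helpful, since it pulls in the JNI-based dataset module and I'd still like to know whether a pure-Java equivalent is planned.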
Component(s)
Java