Skip to content

Conversation

@gkatzioura
Copy link
Contributor

@gkatzioura gkatzioura commented Nov 6, 2025

Purpose

Linked issue: close #1311

Add File System support for Azure Blob Storage.

Brief change log

Tests

API and Format

Documentation

@gkatzioura gkatzioura marked this pull request as draft November 6, 2025 08:46
@polyzos
Copy link
Contributor

polyzos commented Nov 6, 2025

@gkatzioura, you mean -> Add File System support for Azure Blob Storage?

@polyzos polyzos changed the title 1311 Azure Blob Storage Support 1311 Nov 6, 2025
@gkatzioura
Copy link
Contributor Author

@gkatzioura, you mean -> Add File System support for Azure Blob Storage?

Fixed

@luoyuxia
Copy link
Contributor

luoyuxia commented Nov 6, 2025

Just as #993 , obtainSecurityToken method is required for client to access Azure Blob Storage when read from remote

@gkatzioura
Copy link
Contributor Author

Just as #993 , obtainSecurityToken method is required for client to access Azure Blob Storage when read from remote

checking the options

@gkatzioura gkatzioura marked this pull request as ready for review December 1, 2025 07:40
@polyzos
Copy link
Contributor

polyzos commented Dec 1, 2025

@gkatzioura, we should make sure it works for all prefixes, like
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/azure/#azure-blob-storage

@polyzos
Copy link
Contributor

polyzos commented Dec 3, 2025

abfs:// works fine during my tests.

Screenshot 2025-12-03 at 11 28 12 AM

I pushed one commit to disable tests, because we can't meet the test coverage here, and I'm figuring out the issue with the license.

@gkatzioura
Copy link
Contributor Author

gkatzioura commented Dec 3, 2025

abfs:// works fine during my tests.

Screenshot 2025-12-03 at 11 28 12 AM I pushed one commit to disable tests, because we can't meet the test coverage here, and I'm figuring out the issue with the license.

Yes forgot to mention that I added the support for all formats

@polyzos
Copy link
Contributor

polyzos commented Dec 4, 2025

LicenseChecker fixed, and I just finished testing with all schemes. Everything works great 💪 🚀
@luoyuxia @michaelkoepf Let me know if you have any comments on this PR, before merging

@gkatzioura
Copy link
Contributor Author

Hi @luoyuxia
some observations on the obtainSecurityToken functionality.

I do receive the tokens when I read fluss data as a client

try (LogScanner logScanner = table.newScan().createLogScanner()) {
            logScanner.subscribeFromBeginning(1);
            var record = logScanner.poll(Duration.ofSeconds(10));
            record.forEach(r -> _ );
}

Once log scanner is started the tokens start to be sent.
However I could not see the tokens being shared above fluss components (tablet-server coordinator server).
Similar to what happens with Flink JobManager tokens delegated to the TaskManager.
My assumption was that the token is shared among the fluss components then it is used with a
hadoop custom provider type (Depends on the hadoop driver).

@luoyuxia
Copy link
Contributor

Hi @luoyuxia some observations on the obtainSecurityToken functionality.

I do receive the tokens when I read fluss data as a client

try (LogScanner logScanner = table.newScan().createLogScanner()) {
            logScanner.subscribeFromBeginning(1);
            var record = logScanner.poll(Duration.ofSeconds(10));
            record.forEach(r -> _ );
}

Once log scanner is started the tokens start to be sent. However I could not see the tokens being shared above fluss components (tablet-server coordinator server). Similar to what happens with Flink JobManager tokens delegated to the TaskManager. My assumption was that the token is shared among the fluss components then it is used with a hadoop custom provider type (Depends on the hadoop driver).

Hi, the security token is not shared above fluss components. Currently, it's only used by fluss-client, fluss-client require a token to read from remote directly.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gkatzioura Thanks for the greate work! Left minor comment

@gkatzioura gkatzioura requested a review from luoyuxia January 6, 2026 08:32
@polyzos
Copy link
Contributor

polyzos commented Jan 12, 2026

@gkatzioura is there anything else pending here?

@gkatzioura gkatzioura marked this pull request as draft January 14, 2026 07:50
@gkatzioura gkatzioura marked this pull request as ready for review January 19, 2026 08:34
@gkatzioura
Copy link
Contributor Author

@polyzos & @luoyuxia @wuchong fyi this is ready for the release

Copy link
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Hope you don't mind me reviewing, left a few comments. PTAL!

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gkatzioura Thanks for the contribution! The implementation looks good to me. I only have some concerns about the naming conventions used in this PR:

1. Module/Artifact Name

The current fluss-fs-abfs uses a specific scheme name ("abfs") rather than the storage service name. Since this module supports multiple schemes (abfs, abfss, wasb, wasbs), the module should be named after the storage service: fluss-fs-azure. azure is more friendly than abfs for users when picking up the filesystem jars. Flink and Hadoop also use the same naming for the modules.

2. Package Name

Same issue applies to the package org.apache.fluss.fs.abfs. It should be: org.apache.fluss.fs.azure.

3. Class Names

It's weird that WasbFileSystemPlugin extends AbfsFileSystemPlugin. We should extract the common logic of AbfsFileSystemPlugin into an abstract class AzureFileSystemPlugin. And making specific azure scheme filesystem plugins extends the AzureFileSystemPlugin. Adding more detailed comment for the abstract class

/**
 * Abstract factory for creating Azure Blob Storage file systems. Supports multiple URI schemes
 * (abfs, abfss, wasb, wasbs) based on Azure HDFS support in the hadoop-azure module.
 */
abstract class AzureFileSystemPlugin implements FileSystemPlugin {

Comment on lines +52 to +73
This project bundles the following dependencies under the BSD license. See bundled license files for details.

- dnsjava:dnsjava:3.4.0
- org.codehaus.woodstox:stax2-api:4.2.1

This project bundles the following dependencies under the Go License (https://golang.org/LICENSE). See bundled license files for details.

- com.google.re2j:re2j:1.1

This project bundles the following dependencies under dual license CDDL 1.1 / GPLv2 with Classpath Exception.
- CDDL 1.1: https://oss.oracle.com/licenses/CDDL-1.1
- GPLv2 + CPE: https://openjdk.org/legal/gplv2+ce.html

- javax.xml.bind:jaxb-api:2.2.2
- javax.xml.bind:jaxb-api:2.3.1
- javax.activation:activation:1.1
- com.sun.xml.bind:jaxb-impl:2.2.3-1
- com.github.pjfanning:jersey-json:1.20

This project bundles the following dependencies under the Eclipse Distribution License (EDL) 1.0 (https://www.eclipse.org/org/documents/edl-v10.php). See bundled license files for details.

- jakarta.activation:jakarta.activation-api:1.2.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces many Category B dependencies. According to Apache Software Foundation policy, all Category B dependencies require a corresponding LICENSE-xxx notice in the resource folder. These dependencies are likely pulled in because of a direct dependency on hadoop-common.

To avoid this, please try replacing the direct hadoop-common dependency with fluss-fs-hadoop-shaded. This shaded artifact should encapsulate those transitive dependencies and may eliminate the need to declare them explicitly in the NOTICE file.

@wuchong
Copy link
Member

wuchong commented Jan 25, 2026

Also, since the branch is far behind main, could you please rebase it onto the latest main branch? This will make the CI results more reliable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[file system] Add Azure Blob Storage

6 participants