CASSSIDECAR-226 Adding endpoint for verifying files post data copy during live migration#309

Open
nvharikrishna wants to merge 3 commits into apache:trunk from nvharikrishna:226-lm-file-digests-trunk

Conversation

@nvharikrishna (Contributor) commented Jan 25, 2026

CASSSIDECAR-226 Adding an endpoint for verifying files between source and destination post data copy.

This implementation uses a two-task approach (data copy + file verification) rather than inline digest verification during data copy (as originally proposed in CEP-40). This design choice is motivated by:

  1. Performance Efficiency: The data copy task executes multiple iterations internally. Even with successThreshold=1.0, the task requires at least two internal iterations (iteration 0: download → DOWNLOAD_COMPLETE; iteration 1: verify threshold → SUCCESS). Inline digest verification would calculate digests twice per file (once in each iteration), doubling the I/O cost. With separate tasks, digests are calculated once, after the data stabilizes.
  2. Code Simplicity: Separating digest verification from file copying provides clear separation of concerns, making each task easier to understand, test, and maintain.
  3. Operational Flexibility: Users can run verification independently, repeat it if needed, or skip it for non-critical migrations. Inline verification would make this mandatory overhead.

Here are the endpoint details:

Sample files verification task submission request:

curl -X POST http://dest-host.example.com:9043/api/v1/live-migration/files-verification-tasks \
  -H "Content-Type: application/json" \
  -d '{
    "maxConcurrency": 10,
    "digestAlgorithm": "MD5"
  }'

The endpoint also supports the XXHash32 algorithm, with an optional seed as an additional input in the payload.
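For example, mirroring the MD5 sample above (the field values here are illustrative):

```shell
# Same endpoint, selecting XXHash32 and passing an explicit seed in the payload.
curl -X POST http://dest-host.example.com:9043/api/v1/live-migration/files-verification-tasks \
  -H "Content-Type: application/json" \
  -d '{
    "maxConcurrency": 10,
    "digestAlgorithm": "XXHash32",
    "seed": 0
  }'
```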

Sample response:

{
  "taskId": "b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g",
  "statusUrl": "/api/v1/live-migration/files-verification-tasks/b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g"
}

Fetching files verification task status

curl -X GET http://dest-host.example.com:9043/api/v1/live-migration/files-verification-tasks/b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g

Sample response:

{
  "id": "b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g ",
  "digestAlgorithm": "md5",
  "seed": null,
  "state": "COMPLETED",
  "source": "localhost1",
  "port": 9043,
  "filesNotFoundAtSource": 0,
  "filesNotFoundAtDestination": 0,
  "metadataMatched": 379,
  "metadataMismatches": 0,
  "digestMismatches": 0,
  "digestVerificationFailures": 0,
  "filesMatched": 323
}

Also made additional changes to ensure that only one of the data copy task or the file verification task can be executed at any point in time.

Comment on lines +64 to +65
String fullURI = seed != null
? String.format("%s?%s=%s&%s=%d", requestURI, DIGEST_ALGORITHM_PARAM, digestAlgorithm, SEED_PARAM, seed)
@yifan-c (Contributor) commented Feb 15, 2026

One learning from the RestoreJob work is that a custom seed does not provide any benefit for data integrity validation, but only adds code complexity. I would drop support for the custom seed to simplify the implementation and use the fixed seed 0, which also makes the client-server communication simpler.
Not strongly opinionated about removing seed support, but it feels ideal to do so.

@nvharikrishna (Contributor, Author)

I agree with the points about code complexity and simplifying the code. I can remove seed support for live migration.

@nvharikrishna force-pushed the 226-lm-file-digests-trunk branch from 9e9da17 to 39cb915 on February 20, 2026 at 19:16
Comment on lines +98 to +103
if (request.maxConcurrency() > liveMigrationConfiguration.maxConcurrentFileRequests())
{
    throw new IllegalArgumentException("Invalid maxConcurrency " + request.maxConcurrency() +
                                       ". It cannot be greater than " +
                                       liveMigrationConfiguration.maxConcurrentFileRequests());
}
Contributor:

In FilesVerificationTaskManager, maxConcurrency is handled differently. Can you address the inconsistency or the duplication? Having a single validation seems sufficient.

        if (request.maxConcurrency() > maxPossibleConcurrency)
        {
            return Future.failedFuture(
            new LiveMigrationInvalidRequestException("max concurrency can not be more than " + maxPossibleConcurrency));
        }

{
    LOGGER.error("Cannot start a new files verification task for host {} " +
                 "while another live migration task is in progress.", host);
    context.fail(wrapHttpException(FORBIDDEN, throwable.getMessage(), throwable));
Contributor:

Should the status code be 409 Conflict instead of 403 Forbidden?
Forbidden typically means the caller lacks permission to perform an action, which is not the cause here.

* executed asynchronously to validate file integrity between source and destination nodes.
*/
@Singleton
public class FilesVerificationTaskManager
Contributor:

Should FilesVerificationTaskManager and DataCopyTaskManager have a common base class? There are several almost identical methods, e.g. getAllTasks(), getTask() and cancelTask().
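A minimal sketch of the shape such a base class could take, with the task type as a generic parameter; the names below mirror the methods called out as near-identical, but everything else is a stand-in, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical common base: FilesVerificationTaskManager and
// DataCopyTaskManager would extend this and supply their own task type,
// so the shared lookup/cancel logic lives in one place.
abstract class AbstractTaskManager<T>
{
    protected final Map<String, T> tasksById = new ConcurrentHashMap<>();

    List<T> getAllTasks()
    {
        return new ArrayList<>(tasksById.values());
    }

    T getTask(String taskId)
    {
        return tasksById.get(taskId);
    }

    boolean cancelTask(String taskId)
    {
        return tasksById.remove(taskId) != null;
    }
}
```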

{
    return Collections.emptyList();
}
return Collections.singletonList(currentTasks.get(localInstance.id()));
Contributor:

localInstance.id() could be removed from the map at this step due to a race condition. Instead, get the value at line#99 and return based on whether the value is null or not.
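The suggested fix, as a self-contained sketch (currentTasks and the task type are stand-ins for the PR's fields):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Read the map exactly once and branch on the captured value: a concurrent
// removal between a presence check and a later get() can then no longer
// produce a singleton list containing null.
final class CurrentTaskLookup
{
    final Map<Integer, String> currentTasks = new ConcurrentHashMap<>();

    List<String> tasksFor(int instanceId)
    {
        String task = currentTasks.get(instanceId); // single read
        return task == null ? Collections.emptyList()
                            : Collections.singletonList(task);
    }
}
```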

logPrefix, source, port, cause));
}

private @NotNull Future<List<InstanceFileInfo>> compareFilesMeta(List<InstanceFileInfo> localFiles,
Contributor:

nit: prefer compareFilesMetadata

* <p>The verification process consists of three stages:
* <ol>
* <li>Fetch file lists from both source and destination instances concurrently</li>
* <li>Compare file metadata (size, type, modification time) - fails fast on mismatches</li>
Contributor:

I do not think compareFilesMeta() (again, prefer compareFilesMetadata()) fails fast. It loops through all files and collects all the errors. Please update the Javadoc to reflect the actual implementation.

Comment on lines +331 to +338
if (digestAlgorithm.equalsIgnoreCase(MD5Digest.MD5_ALGORITHM))
{
    return Future.succeededFuture(new MD5Digest(digestResponse.digest));
}
else if (digestAlgorithm.equalsIgnoreCase(XXHash32Digest.XXHASH_32_ALGORITHM))
{
    return Future.succeededFuture(new XXHash32Digest(digestResponse.digest));
}
Contributor:

Should it be in DigestAlgorithmFactory?
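One possible shape for pushing the name dispatch into a factory, sketched with stub digest types standing in for the PR's MD5Digest/XXHash32Digest (all names here are illustrative):

```java
// Stub digest hierarchy; the real classes live in the PR.
interface Digest
{
    String algorithm();
}

final class MD5Stub implements Digest
{
    public String algorithm() { return "MD5"; }
}

final class XXHash32Stub implements Digest
{
    public String algorithm() { return "XXHash32"; }
}

final class DigestAlgorithmFactoryStub
{
    private DigestAlgorithmFactoryStub()
    {
    }

    // Centralizes the algorithm-name branching so toDigest() (and any other
    // caller) does not repeat the if/else chain.
    static Digest create(String algorithm)
    {
        if ("MD5".equalsIgnoreCase(algorithm))
        {
            return new MD5Stub();
        }
        if ("XXHash32".equalsIgnoreCase(algorithm))
        {
            return new XXHash32Stub();
        }
        throw new IllegalArgumentException("Unsupported digest algorithm: " + algorithm);
    }
}
```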

Comment on lines +110 to +120
private Future<LiveMigrationTask<LiveMigrationFilesVerificationResponse>> createVerifier(LiveMigrationFilesVerificationRequest request,
String source,
InstanceMetadata localInstanceMetadata)
{
String timeUuid = UUIDs.timeBased().toString();
return Future.succeededFuture(taskFactory.create(timeUuid,
source,
sidecarConfiguration.serviceConfiguration().port(),
request,
localInstanceMetadata));
}
Contributor:

It is a private method and synchronous; the only reason it returns a Future is to chain with compose. I think you can avoid wrapping in a Future and just return the LiveMigrationTask.
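The pattern being suggested, sketched with java.util.concurrent.CompletableFuture standing in for Vert.x's Future (an assumption made only so the example is self-contained; names are illustrative):

```java
import java.util.concurrent.CompletableFuture;

final class VerifierCreation
{
    private VerifierCreation()
    {
    }

    // Synchronous private helper: returns the value directly, no wrapping...
    static String createVerifier(String taskId)
    {
        return "verifier-" + taskId; // stand-in for taskFactory.create(...)
    }

    // ...and the async chain adapts it where composition is needed
    // (thenApply here, map in Vert.x), instead of the helper pre-wrapping
    // its result in an already-completed future.
    static CompletableFuture<String> submit(String taskId)
    {
        return CompletableFuture.completedFuture(taskId)
                                .thenApply(VerifierCreation::createVerifier);
    }
}
```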

else
{
return Future.failedFuture(new LiveMigrationTaskInProgressException(
"Another files digests verification is in progress for instance=" + localInstanceMetadata.id()));
Contributor:

Suggested change
"Another files digests verification is in progress for instance=" + localInstanceMetadata.id()));
"Another files digest verification is in progress for instance=" + localInstanceMetadata.id()));

Comment on lines +305 to +326
Future<String> verifyDigest(InstanceFileInfo fileInfo)
{
return getSourceFileDigest(fileInfo)
.compose(digest -> {
String path = localPath(fileInfo.fileUrl, instanceMetadata).toAbsolutePath().toString();
return digestVerifierFactory.verifier(MultiMap.caseInsensitiveMultiMap().addAll(digest.headers()))
.verify(path)
.compose(verified -> Future.succeededFuture(path))
.recover(cause -> Future.failedFuture(
new DigestMismatchException(path, fileInfo.fileUrl, cause)));
})
.onSuccess(filePath -> LOGGER.debug("{} Verified file {}", logPrefix, fileInfo.fileUrl))
.onFailure(cause -> LOGGER.error("{} Failed to verify file {}", logPrefix, fileInfo.fileUrl, cause));
}

private Future<Digest> getSourceFileDigest(InstanceFileInfo fileInfo)
{
return Future.fromCompletionStage(sidecarClient.liveMigrationFileDigestAsync(new SidecarInstanceImpl(source, port),
fileInfo.fileUrl,
request.digestAlgorithm()))
.compose(this::toDigest);
}
Contributor:

It makes one HTTP request to the source per file to get the digest. According to LiveMigrationConcurrencyLimitHandler, TOO_MANY_REQUESTS can be thrown. There is no retry implemented to handle it, due to SingleInstanceSelectionPolicy + the default retry policy. I think you want to add a custom retry policy for the applicable requests.

Besides the missing retry and the silent failure, one request per file already seems to guarantee slowness. Maybe we should revisit this design decision later.
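A hedged sketch of the retry behavior being asked for: retry only on 429 TOO_MANY_REQUESTS, with capped exponential backoff. The sidecar client's actual retry-policy API is not modeled here; this shows only the bare control flow, with an illustrative exception type:

```java
import java.util.concurrent.Callable;

final class TooManyRequestsRetry
{
    private TooManyRequestsRetry()
    {
    }

    // Illustrative marker for an HTTP 429 response; the real client
    // surfaces this condition through its own exception types.
    static final class TooManyRequestsException extends Exception
    {
    }

    // Retries the request on 429 only, sleeping base, 2*base, 4*base, ...
    // milliseconds between attempts, and rethrows once maxAttempts is spent.
    static <T> T callWithRetry(Callable<T> request, int maxAttempts, long baseDelayMillis)
    throws Exception
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return request.call();
            }
            catch (TooManyRequestsException e)
            {
                if (attempt >= maxAttempts)
                {
                    throw e;
                }
                Thread.sleep(baseDelayMillis << (attempt - 1));
            }
        }
    }
}
```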
