Linstor: fix create volume from snapshot on primary storage #13043
Kukunin wants to merge 1 commit into apache:main
When creating a volume from a snapshot on Linstor primary storage (with lin.backup.snapshots=false), the operation fails with:
"Only the following image types are currently supported: VHD, OVA, QCOW2, RAW (for PowerFlex and FiberChannel)"
Root cause: the Linstor driver does not handle SNAPSHOT -> VOLUME in its canCopy()/copyAsync() methods. This causes DataMotionServiceImpl to fall through to StorageSystemDataMotionStrategy (selected because Linstor advertises STORAGE_SYSTEM_SNAPSHOT=true). That strategy's verifyFormatWithPoolType() rejects RAW format for Linstor pools, since RAW is only allowed for PowerFlex and FiberChannel.
Additionally, VolumeOrchestrator.createVolumeFromSnapshot() attempts to back up the snapshot to secondary storage when the storage plugin does not advertise CAN_CREATE_TEMPLATE_FROM_SNAPSHOT. This backup fails because the snapshot only exists on Linstor primary storage.
Fix:
- Add the CAN_CREATE_TEMPLATE_FROM_SNAPSHOT capability so the orchestrator skips the backup-to-secondary path
- Add canCopySnapshotToVolumeCond() to match SNAPSHOT -> VOLUME when both are on the same Linstor primary store
- Wire it into canCopy() to intercept at DataMotionServiceImpl before strategy selection, bypassing StorageSystemDataMotionStrategy
- Implement copySnapshotToVolume(), which delegates to the existing createResourceFromSnapshot() for native Linstor snapshot restore
This follows the same pattern used by the StorPool plugin, which handles SNAPSHOT -> VOLUME directly in its driver rather than going through StorageSystemDataMotionStrategy.
Tested on CloudStack 4.22 with Linstor LVM_THIN storage, creating a volume from a 1TB CNPG Postgres database snapshot. The volume is created successfully with the correct path and deletes cleanly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
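To make the failure path concrete, here is a simplified, hypothetical model of the two decisions described in the commit message. ImageFormat, PoolType, the string capability key and the route() helper are local stand-ins, not CloudStack classes, and the real logic in DataMotionServiceImpl and VolumeOrchestrator differs in detail. The format rule is taken from the error message (RAW accepted only for PowerFlex and FiberChannel). Without the capability, the orchestrator takes the backup-to-secondary path; without a matching driver canCopy(), a RAW Linstor volume reaches the strategy's format check and is rejected.

```java
import java.util.Map;

// Hypothetical model of the failing path; these are local stand-ins, not CloudStack classes.
final class SnapshotToVolumeRouting {

    enum ImageFormat { VHD, OVA, QCOW2, RAW }
    enum PoolType { Linstor, PowerFlex, FiberChannel }

    // Capability key as named in the PR; modelling capabilities as Map<String, String> is an assumption.
    static final String CAN_CREATE_TEMPLATE_FROM_SNAPSHOT = "CAN_CREATE_TEMPLATE_FROM_SNAPSHOT";

    /** Orchestrator: back the snapshot up to secondary storage unless the driver advertises the capability. */
    static boolean backsUpToSecondary(Map<String, String> driverCapabilities) {
        // parseBoolean(null) is false, so a missing key means "back up"
        return !Boolean.parseBoolean(driverCapabilities.get(CAN_CREATE_TEMPLATE_FROM_SNAPSHOT));
    }

    /** Stand-in for the strategy's format check: RAW only on PowerFlex and FiberChannel pools. */
    static boolean formatAccepted(ImageFormat format, PoolType pool) {
        return format != ImageFormat.RAW || pool == PoolType.PowerFlex || pool == PoolType.FiberChannel;
    }

    /** Data motion: the driver handles the copy if its canCopy() matches; otherwise a strategy is chosen. */
    static String route(boolean driverCanCopy, ImageFormat format, PoolType pool) {
        if (driverCanCopy) {
            return "driver";   // patched path: native Linstor snapshot restore
        }
        // Fall-through path (StorageSystemDataMotionStrategy in the PR description)
        return formatAccepted(format, pool) ? "strategy" : "rejected";
    }
}
```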
Congratulations on your first Pull Request and welcome to the Apache CloudStack community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/cloudstack/blob/main/CONTRIBUTING.md)
rp- left a comment:
I'll still need to run my testsuite against it, but I guess it will work.
        answer = new CopyCmdAnswer(volumeTO);
    } catch (Exception e) {
        errMsg = "Failed to create volume from snapshot: " + e.getMessage();
        logger.error(errMsg, e);
Here, too, a CloudRuntimeException should be thrown; otherwise CloudStack will not notice that something went wrong.
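One way to apply the reviewer's suggestion, sketched under assumptions: the local CloudRuntimeException below stands in for CloudStack's com.cloud.utils.exception.CloudRuntimeException, java.util.logging stands in for the plugin's logger, and RestoreOperation is a hypothetical placeholder for the Linstor restore call. None of these are the plugin's real API. The catch block keeps the log message but rethrows, so the failure propagates to the caller instead of being swallowed.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: CloudRuntimeException and RestoreOperation are local stand-ins,
// not the Linstor plugin's real types.
final class CopyErrorHandling {

    private static final Logger LOG = Logger.getLogger(CopyErrorHandling.class.getName());

    /** Local stand-in for CloudStack's unchecked CloudRuntimeException. */
    static class CloudRuntimeException extends RuntimeException {
        CloudRuntimeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Placeholder for the operation that creates the volume from the snapshot. */
    interface RestoreOperation {
        String run() throws Exception;
    }

    /** Returns the new volume's path, or throws so that the caller notices the failure. */
    static String createVolumeFromSnapshot(RestoreOperation restore) {
        try {
            return restore.run();
        } catch (Exception e) {
            String errMsg = "Failed to create volume from snapshot: " + e.getMessage();
            LOG.log(Level.SEVERE, errMsg, e);            // log as before...
            throw new CloudRuntimeException(errMsg, e);  // ...but also propagate the failure
        }
    }
}
```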
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
## main #13043 +/- ##
=============================================
- Coverage 18.01% 3.52% -14.50%
=============================================
Files 6029 464 -5565
Lines 542160 40137 -502023
Branches 66451 7555 -58896
=============================================
- Hits 97682 1415 -96267
+ Misses 433461 38534 -394927
+ Partials 11017 188 -10829
Flags with carried forward coverage won't be shown.
Context
I run a private cloud on CloudStack 4.22 with Linstor primary storage, Kubernetes, and the CloudStack CSI driver, plus the additional registry.k8s.io/sig-storage/csi-snapshotter:v8.2.1 sidecar and the snapshot-controller.
I wanted to duplicate a PVC from kubectl by creating a snapshot and restoring another PVC from it. The main problem was that the snapshot was going to be copied to secondary storage, which is not what I wanted: secondary storage is slow and outside the network, so transferring a 1TB volume takes a long time and is pointless. I hit a chain of errors, identified their causes, and prepared a patch that solved my issues. I built and pushed only cloud-plugin-storage-volume-linstor-4.22.0.0.jar to my servers, and after restarting both the management and agent services, copying a PVC via snapshots worked fine. I also modified the following CloudStack settings:
Description
When creating a volume from a snapshot on Linstor primary storage (with lin.backup.snapshots=false), the operation fails with:
Only the following image types are currently supported: VHD, OVA, QCOW2, RAW (for PowerFlex and FiberChannel)
Root cause: The Linstor driver does not handle SNAPSHOT → VOLUME in its canCopy()/copyAsync() methods. This causes DataMotionServiceImpl to fall through to StorageSystemDataMotionStrategy (selected because Linstor advertises STORAGE_SYSTEM_SNAPSHOT=true). That strategy's verifyFormatWithPoolType() rejects RAW format for Linstor pools, since RAW is only allowed for PowerFlex and FiberChannel.
Additionally, VolumeOrchestrator.createVolumeFromSnapshot() attempts to back up the snapshot to secondary storage when the storage plugin does not advertise CAN_CREATE_TEMPLATE_FROM_SNAPSHOT. This backup fails because the snapshot only exists on Linstor primary storage.
Fix:
- Add the CAN_CREATE_TEMPLATE_FROM_SNAPSHOT capability so the orchestrator skips the backup-to-secondary path
- Add canCopySnapshotToVolumeCond() to match SNAPSHOT → VOLUME when both are on the same Linstor primary store
- Wire it into canCopy() to intercept at DataMotionServiceImpl before strategy selection, bypassing StorageSystemDataMotionStrategy entirely
- Implement copySnapshotToVolume(), which delegates to the existing createResourceFromSnapshot() for native Linstor snapshot restore
This follows the same pattern used by the StorPool plugin, which handles SNAPSHOT → VOLUME directly in its driver rather than going through StorageSystemDataMotionStrategy.
Fixes: #11451
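A minimal sketch of the new canCopy() check, assuming simplified stand-in types: Obj, ObjectType, StoreRole and the linstorStore flag are hypothetical, not CloudStack's DataObject/DataStore API. The "existing" branches are rough placeholders modelled on the paths the PR lists as unaffected (SNAPSHOT→SNAPSHOT to Image, TEMPLATE→TEMPLATE, VOLUME→VOLUME/TEMPLATE), not the driver's actual conditions. What it illustrates is the ordering: the new condition is tested first and only matches SNAPSHOT → VOLUME on a single Linstor primary store, so every other combination falls through to the existing checks.

```java
// Hypothetical model of the canCopy() wiring; all types are local stand-ins.
final class LinstorCanCopySketch {

    enum ObjectType { VOLUME, SNAPSHOT, TEMPLATE }
    enum StoreRole { Primary, Image }

    /** A data object together with the store it lives on. */
    static final class Obj {
        final ObjectType type;
        final StoreRole role;
        final long storeId;
        final boolean linstorStore;   // true if the store is managed by the Linstor driver

        Obj(ObjectType type, StoreRole role, long storeId, boolean linstorStore) {
            this.type = type;
            this.role = role;
            this.storeId = storeId;
            this.linstorStore = linstorStore;
        }
    }

    /** SNAPSHOT -> VOLUME where both objects sit on the same Linstor primary store. */
    static boolean canCopySnapshotToVolumeCond(Obj src, Obj dst) {
        return src.type == ObjectType.SNAPSHOT
                && dst.type == ObjectType.VOLUME
                && src.role == StoreRole.Primary
                && dst.role == StoreRole.Primary
                && src.linstorStore
                && dst.linstorStore
                && src.storeId == dst.storeId;
    }

    /** Rough placeholder for the pre-existing checks; the real driver conditions are stricter. */
    static boolean existingConditions(Obj src, Obj dst) {
        boolean snapshotToImage = src.type == ObjectType.SNAPSHOT
                && dst.type == ObjectType.SNAPSHOT && dst.role == StoreRole.Image;
        boolean templateToTemplate = src.type == ObjectType.TEMPLATE && dst.type == ObjectType.TEMPLATE;
        boolean volumeToVolumeOrTemplate = src.type == ObjectType.VOLUME
                && (dst.type == ObjectType.VOLUME || dst.type == ObjectType.TEMPLATE);
        return snapshotToImage || templateToTemplate || volumeToVolumeOrTemplate;
    }

    /** The new condition is checked first; everything else falls through unchanged. */
    static boolean canCopy(Obj src, Obj dst) {
        if (canCopySnapshotToVolumeCond(src, dst)) {
            return true;   // the driver then handles it in copyAsync() via copySnapshotToVolume()
        }
        return existingConditions(src, dst);
    }
}
```

With these stand-ins, a snapshot and a volume on the same Linstor primary store are accepted, while the non-Linstor, cross-primary and Image-destination variants are rejected, mirroring the unit-test scenarios listed in the PR.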
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
How Has This Been Tested?
Unit tests: 5 new tests added to LinstorPrimaryDataStoreDriverImplTest:
- testGetCapabilitiesIncludesCreateTemplateFromSnapshot: verifies the capability is advertised
- testCanCopySnapshotToVolumeOnSamePrimary: verifies canCopy() returns true for SNAPSHOT → VOLUME on the same Linstor primary
- testCanCopySnapshotToVolumeRejectsNonLinstor: verifies canCopy() returns false for non-Linstor storage
- testCanCopySnapshotToVolumeRejectsCrossPrimary: verifies canCopy() returns false across different primary stores
- testCanCopySnapshotToVolumeRejectsImageDest: verifies canCopy() returns false when the destination is an Image store
Integration test: Tested on CloudStack 4.22 with Linstor LVM_THIN storage (DRBD-replicated across 3 nodes), creating a volume from a 1TB CNPG Postgres database snapshot via the createVolume API:
… resourceSnapshotRestore API)
How did you try to break this feature and the system with this change?
canCopy() paths (SNAPSHOT→SNAPSHOT to Image, TEMPLATE→TEMPLATE, VOLUME→VOLUME/TEMPLATE) are not affected by the new condition being checked first.