vSphere CSI Driver - v2.0.0 release

New Feature

  • Offline Persistent Volume expansion for Block Volume.
  • ReadWriteMany volumes using vSAN file services.
  • Support for enabling leader election to ensure High Availability for vSphere CSI driver.

Notable Changes

  • Support for enabling leader election to ensure High Availability for vSphere CSI driver.
  • Enhanced driver to ensure volume operations are idempotent.
  • Enhanced driver logging and debuggability with contextual log tracing.

Deployment files

Kubernetes Release

  • Minimum: 1.16
  • Maximum: 1.18

Supported sidecar containers versions

  • csi-provisioner - v1.4.0
  • csi-attacher - v2.0.0
  • csi-resizer - v0.3.0
  • livenessprob - v1.1.0
  • csi-node-driver-registrar - v1.2.0

Known Issues

vSphere CSI Driver issues

  1. Unused volumes not deleted during full sync when CSI driver v2.0.0 is used with vSphere 6.7 Update3.
    • Impact: Full sync in CSI v2.0.0 does not delete unused volumes in vSphere 6.7 Update3.
    • Workaround: If you are using vSphere 6.7 Update3 with v2.0.0 release of the driver, it is recommended to use v2.0.1 release of the driver.
  2. Volume never gets detached from the node if volume is being deleted before it is detached from Node VM.

    • Reasons volume gets into this state are
      1. CSI provisioner is issuing delete call before volume is detached.
      2. vCenter API in vSphere 67u3 is not marking volume back as Container Volume when Delete call fails while Volume is attached to the Node VM. This issue is fixed in the vSphere 7.0.
        • On vSphere 67u3, issue can be fixed by upgrading driver to v2.0.1 release.
    • Impact: When Pod and PVC are deleted together upon deletion of namespace, we have a race to delete and detach volume. Due to this, Pod remains in the terminating state and PV remains in the released state.
    • Upstream issue was tracked at: https://github.com/kubernetes/kubernetes/issues/84226
    • Workaround:
      1. Delete the Pod with force: kubectl delete pods <pod> --grace-period=0 --force
      2. Find VolumeAttachment for the volume that remained undeleted. Get Node from this VolumeAttachment.
      3. Manually detach the disk from the Node VM.
      4. Edit this VolumeAttachment and remove the finalizer. It will get deleted.
      5. Use govc to manually delete the FCD.
      6. Edit PV and remove the finalizer. It will get deleted.
  3. When the static persistent volume is re-created with the same PV name, volume is not getting registered as a container volume with vSphere.

    • Impact: attach/delete can not be performed on the such Persistent Volume.
    • Workaround: wait for 1 hour before re-creating static persistent volume using the same name.
  4. Metadata syncer container deletes the volume physically from the datastore when Persistent Volumes with Bound status and reclaim policy Delete is deleted by the user when StorageObjectInUseProtection is disabled on Kubernetes Cluster.
    • Impact: Persistent Volumes Claim goes in the lost status. Volume can not be recovered.
    • Workaround: Do not disable StorageObjectInUseProtection and attempt to delete Persistent Volume directly without deleting PVC.
  5. deployment yaml uses hostPath volume in the CSI driver deployment for unix domain socket path.
    • Impact: when the controller Pod does not have access to the file system on the node VM, driver fails to create socket file and thus does not come up.
    • Workaround: use emptydir volume instead of hostPath volume.
  6. Volume expansion might fail when it is called with pod creation simultaneously.
    • Impact: Users can resize the PVC and create a pod using that PVC simultaneously. In this case, pod creation might be completed first using the PVC with original size. Volume expansion will fail because online resize is not supported in vSphere 7.0 Update1.
    • Workaround: Wait for the PVC to reach FileVolumeResizePending condition before attaching a pod to it.
  7. Deleting PV before deleting PVC, leaves orphan volume on the datastore
    • Impact: Orphan volumes remain on the datastore, and admin needs to delete those volumes manually using govc command.
    • Upstream issue is tracked at: https://github.com/kubernetes-csi/external-provisioner/issues/546
    • Workaround:
      • No workaround. User should not attempt to delete PV which is bound to PVC. User should only delete a PV if they know that the underlying volume in the storage system is gone.
      • If user has accidentally left orphan volumes on the datastore by not following the guideline, and if user has captured the volume handles or First Class Disk IDs of deleted PVs, storage admin can help delete those volumes using govc disk.rm <volume-handle/FCD ID> command.
  8. When multiple PVCs and Pods with the same name present on the Cluster, and for any reason, Volume gets de-registered or lost from vCenter CNS Database, Syncer does not re-register volume back.
    • Impact: Volume will not re-appear on the CNS UI. If volume needs to be detached and attached to newer node, it will not happen.
    • Workaround:
      • This issue is fixed in v2.1.0 release. Please consider upgrading driver to v2.1.0

Kubernetes issues

  1. Filesystem resize is skipped if the original PVC is deleted when FilesystemResizePending condition is still on the PVC, but PV and its associated volume on the storage system are not deleted due to the Retain policy.
    • Issue: https://github.com/kubernetes/kubernetes/issues/88683
    • Impact: User may create a new PVC to statically bind to the undeleted PV. In this case, the volume on the storage system is resized but the filesystem is not resized accordingly. User may try to write to the volume whose filesystem is out of capacity.
    • Workaround: User can log into the container to manually resize the filesystem.
  2. Volume associated with a Statefulset cannot be resized
  3. Recover from volume expansion failure.

vSphere issues

  1. CNS file volume has a limitation of 8K for metadata.
    • Impact: It is quite possible that we will not be able to push all the metadata to CNS file share as we need support a max of 64 clients per file volume.
    • Workaround: None, This is vSphere limitation.

results matching ""

    No results matching ""