I want to troubleshoot common SnapMirror issues with Amazon FSx for NetApp ONTAP.
Resolution
SnapMirror fails to query the transfer status
SnapMirror returns this error:
"Failed to query transfer status. (Destination is in an invalid transfer state (Replication engine error))"
This error occurs because the SnapMirror checkpoint isn't in sync with the file system. To resolve this error, follow these steps:
- Delete the SnapMirror relationship, and then take the destination volume offline and delete it:
FsxIdxxxxxxx::> snapmirror delete -destination-path destsvm:destvol
FsxIdxxxxxxx::> volume offline -vserver destsvm -volume destvol
FsxIdxxxxxxx::> volume delete -vserver destsvm -volume destvol
- Recreate the destination volume as a data protection (DP) volume. Then, recreate the SnapMirror relationship and initialize it:
FsxIdxxxxxxx::> snapmirror create -source-path srcsvm:srcvol -destination-path destsvm:destvol
FsxIdxxxxxxx::> snapmirror initialize -destination-path destsvm:destvol
- (Optional) If the error persists, then abort the transfer to the destination path, and then update the relationship:
FsxIdxxxxxxx::> snapmirror abort -destination-path destsvm:destvol -hard true
FsxIdxxxxxxx::> snapmirror update -destination-path destsvm:destvol
For more information, see SnapMirror transfer fails with error "Destination is in an invalid transfer state" on the NetApp website.
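The order of the commands above matters, so it can help to generate them from one set of names before you paste them into the ONTAP CLI. The following is a minimal sketch, not an AWS or NetApp tool: the `recovery_commands` helper is hypothetical, and it only builds the command strings shown in the steps above. It doesn't connect to the file system.

```python
def recovery_commands(src_svm, src_vol, dest_svm, dest_vol):
    """Return the ONTAP CLI commands from the steps above, in order.

    Hypothetical helper for illustration only: it formats strings and
    never runs anything against a file system.
    """
    src = f"{src_svm}:{src_vol}"
    dest = f"{dest_svm}:{dest_vol}"
    return [
        # Step 1: remove the relationship, then the destination volume.
        f"snapmirror delete -destination-path {dest}",
        f"volume offline -vserver {dest_svm} -volume {dest_vol}",
        f"volume delete -vserver {dest_svm} -volume {dest_vol}",
        # Assumption: the destination volume is recreated before step 2.
        # The steps above don't show that command, so it's omitted here.
        # Step 2: recreate and initialize the relationship.
        f"snapmirror create -source-path {src} -destination-path {dest}",
        f"snapmirror initialize -destination-path {dest}",
    ]


if __name__ == "__main__":
    for command in recovery_commands("srcsvm", "srcvol", "destsvm", "destvol"):
        print(command)
```

Review the printed commands before you run them, because the volume delete command removes the destination data.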
SnapMirror update or resync failure
The SnapMirror update or resync fails with this error:
"No common Snapshot copy found"
This error occurs when the source and destination volumes have no snapshot in common. Forced or automatic deletion of the common snapshots can cause this error.
To resolve this error, follow these steps:
- Delete and release the older SnapMirror relationship.
- Delete the associated destination volume.
- Create and initialize a new SnapMirror relationship to a new destination volume.
For more information, see Update or resync of a SnapMirror relationship fails with No common Snapshot error on the NetApp website.
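To illustrate the cause of this error, the following sketch compares two lists of snapshot names and reports whether any snapshot exists on both sides. It's a simplified model under stated assumptions: the snapshot names are hypothetical, the `common_snapshots` helper isn't part of any ONTAP tooling, and comparing by name only approximates how SnapMirror identifies a common snapshot. In practice, you might collect the names from the output of volume snapshot show on each volume.

```python
def common_snapshots(source_snapshots, destination_snapshots):
    """Return the snapshot names that exist on both volumes, sorted by name."""
    return sorted(set(source_snapshots) & set(destination_snapshots))


# Hypothetical snapshot names, for illustration only.
source = ["snapmirror.example.0002", "daily.2024-05-02"]
destination = ["snapmirror.example.0001"]

shared = common_snapshots(source, destination)
if shared:
    print("Common snapshots:", ", ".join(shared))
else:
    # Without a shared snapshot, an update or resync has no baseline to
    # transfer changes from, so the relationship must be re-initialized.
    print("No common snapshot found")
```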
SnapMirror transfers have slow replication times
To troubleshoot slow SnapMirror replication, complete these steps:
- Check whether global throttling is turned on. For more information, see Why does SnapMirror replication take a long time on my FSx for NetApp ONTAP volume?
- Use the network ping command to check for a maximum transmission unit (MTU) mismatch in the network path.
Example of a successful ping over a path that supports jumbo frames (MTU 9001):
FsxIdxxxxxxx::> network ping -vserver AD -lif nfs_smb_management_1 -destination 10.0.16.178 -disallow-fragmentation true -packet-size 8972 -show-detail true -verbose -count 10
PING 10.0.16.178 (10.0.16.178) from 10.0.12.253: 8972 data bytes
to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=0 ttl=255 time=1.129 ms
to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=1 ttl=255 time=1.061 ms
to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=2 ttl=255 time=1.120 ms
to 10.0.12.253 8980 bytes from 10.0.16.178: icmp_seq=3 ttl=255 time=1.076 ms
For more information, see How to verify optimal MTU packet size for cluster peering and SnapMirror on the NetApp website.
- Check Amazon CloudWatch metrics to see whether disk IOPS or network utilization is approaching 100%. If either one is, then increase the file system's throughput capacity or provisioned IOPS. You can also run the qos statistics workload performance show command to view the current IOPS, throughput, and latency of the workloads on the file system:
FsxIdxxxxxxx::> qos statistics workload performance show
Workload            ID     IOPS       Throughput    Latency
--------------- ------ -------- ---------------- ----------
-total-              -      446        31.44KB/s        0ms
_USERSPACE_APPS     14      445        31.44KB/s        0ms
_WAFL_SCAN          20        1            0KB/s        0ms
-total-              -      435        36.54KB/s    23.00us
_USERSPACE_APPS     14      435        36.54KB/s    23.00us
-total-              -     4505        42.31KB/s        0ms
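For the MTU check in the steps above, the -packet-size value is the ICMP payload, not the full frame. The following sketch shows the arithmetic, assuming IPv4 with a 20-byte header (no IP options) and an 8-byte ICMP echo header. The function names are illustrative. The same arithmetic explains why the replies in the example report 8980 bytes: 8972 bytes of payload plus the 8-byte ICMP header.

```python
IPV4_HEADER_BYTES = 20  # Assumption: IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request/reply header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def fits_without_fragmentation(packet_size, mtu):
    """True if a ping with this payload size fits within the given MTU."""
    return packet_size <= max_ping_payload(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000, 9001):
        print(f"MTU {mtu}: maximum ping payload is {max_ping_payload(mtu)} bytes")
```

If a ping with -disallow-fragmentation true fails at 8972 bytes but succeeds at 1472 bytes, then a device in the path is probably limited to an MTU of 1500.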
Related information
Delete a volume replication relationship on the NetApp website