{"id":113,"date":"2013-10-02T12:34:16","date_gmt":"2013-10-02T12:34:16","guid":{"rendered":"http:\/\/blogs.kent.ac.uk\/unseenit\/?p=113"},"modified":"2013-10-02T12:34:16","modified_gmt":"2013-10-02T12:34:16","slug":"effects-of-zfs-fragmentation-on-underlying-storage","status":"publish","type":"post","link":"https:\/\/blogs.kent.ac.uk\/unseenit\/effects-of-zfs-fragmentation-on-underlying-storage\/","title":{"rendered":"Effects of ZFS Fragmentation on Underlying Storage"},"content":{"rendered":"<p>For some time we&#8217;ve been aware of the effects of ZFS Fragmentation, which typically becomes an issue as a zpool passes 80% full, although we&#8217;ve seen it start anywhere between 70% and over 90% depending on the workload. Typically, you see degradation of performance and increased IO load as\u00a0disk writes require finding and sticking together small chunks of free space. There&#8217;s a good summary of the gory technical details <a href=\"https:\/\/espix.net\/~wildcat\/txt\/zfs-fragmentation.txt\">here<\/a>.<\/p>\n<p>The proper fix for this is to have a separate ZFS Intent Log (ZIL) device but given the number of physical and virtual servers we have that utilise ZFS root disks, this isn&#8217;t practical. Unfortunately, this also doesn&#8217;t fix the issue for existing pools and there&#8217;s no ZFS defragmentation tool. To resolve the issue you need to delete data or present additional storage.<\/p>\n<p>The following charts neatly identify this issue &#8211; the volume below had a pair of competing jobs &#8211; one doing a lot of reads (green), and one doing a small amount of write I\/O (blue, negative axis). A new volume was presented and attached to the zpool at 12:00.<\/p>\n<figure id=\"attachment_111\" aria-describedby=\"caption-attachment-111\" style=\"width: 481px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/blogs.kent.ac.uk\/unseenit\/files\/2013\/10\/LU_518.rrd-10800.png\"><img loading=\"lazy\" class=\"size-full wp-image-111 \" alt=\"ZFS I\/O before presenting new storage\" src=\"http:\/\/blogs.kent.ac.uk\/unseenit\/files\/2013\/10\/LU_518.rrd-10800.png\" width=\"481\" height=\"173\" srcset=\"https:\/\/blogs.kent.ac.uk\/unseenit\/files\/2013\/10\/LU_518.rrd-10800.png 481w, https:\/\/blogs.kent.ac.uk\/unseenit\/files\/2013\/10\/LU_518.rrd-10800-300x107.png 300w\" sizes=\"(max-width: 481px) 100vw, 481px\" \/><\/a><figcaption id=\"caption-attachment-111\" class=\"wp-caption-text\">ZFS IO before presenting new storage<\/figcaption><\/figure>\n<p>At this point the writes stopped going to the existing fragmented disk, freeing up a couple of hundred IOPS for the read job (to the point that is now CPU-bound). 
The following charts neatly illustrate the issue. The volume below had a pair of competing jobs: one doing a lot of reads (green), and one doing a small amount of write I/O (blue, negative axis). A new volume was presented and attached to the zpool at 12:00.

![ZFS I/O before presenting new storage](http://blogs.kent.ac.uk/unseenit/files/2013/10/LU_518.rrd-10800.png)

At that point the writes stopped going to the existing fragmented disk, freeing up a couple of hundred IOPS for the read job (to the point that it is now CPU-bound). In the chart below, you can see the massively reduced write operations on the new, unfragmented volume:

![ZFS I/O after adding new storage](http://blogs.kent.ac.uk/unseenit/files/2013/10/LU_531.rrd-10800.png)

That's ~200 IOPS down to ~4 IOPS: a 50-fold amplification of write operations on the fragmented volume. Couple that with the extra I/O caused by RAID6 writes, and the underlying Tier-3 SATA disks were being worked extremely hard.

![Drive busy percentage](http://blogs.kent.ac.uk/unseenit/files/2013/10/DriveBusy-Unit_4-HDU_0.rrd-10800.png)

Unsurprisingly, the performance of this filesystem (and of the others that share this RAID group) has increased significantly.
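For reference, the expansion step described above (presenting a new LUN and adding it to the pool as an extra vdev) can be sketched roughly as follows. The pool and device names are hypothetical placeholders, and the LUN presentation itself happens on the array side, so treat this only as an outline of the ZFS part.

```python
#!/usr/bin/env python
# Rough sketch of the expansion step: add a freshly presented LUN to the
# pool as a new top-level vdev, then watch per-vdev I/O to confirm that new
# writes land on the unfragmented device. Pool and device names below are
# hypothetical placeholders, not the ones from our environment.
import subprocess

POOL = "datapool"      # hypothetical pool name
NEW_DEVICE = "c2t3d0"  # hypothetical device name for the new LUN

# Grow the pool with the new vdev; ZFS tends to favour the emptier vdev for
# new allocations, which is the behaviour seen in the charts above.
subprocess.check_call(["zpool", "add", POOL, NEW_DEVICE])

# Print per-vdev I/O statistics every 10 seconds, 6 samples, so the shift in
# write operations away from the fragmented vdev is visible.
subprocess.call(["zpool", "iostat", "-v", POOL, "10", "6"])
```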
href=\"https:\/\/blogs.kent.ac.uk\/unseenit\/effects-of-zfs-fragmentation-on-underlying-storage\/\">Read&nbsp;more<\/a><\/p>\n","protected":false},"author":12,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[28946,28948],"tags":[],"_links":{"self":[{"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/posts\/113"}],"collection":[{"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/comments?post=113"}],"version-history":[{"count":2,"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/posts\/113\/revisions"}],"predecessor-version":[{"id":116,"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/posts\/113\/revisions\/116"}],"wp:attachment":[{"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/media?parent=113"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/categories?post=113"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.kent.ac.uk\/unseenit\/wp-json\/wp\/v2\/tags?post=113"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}