XFS, Reflinks and Deduplication

btrfs Past

This post is about XFS but it’s about features that first hit Linux in btrfs, so we need to talk about btrfs for a bit first.

For a long time now, btrfs has had a useful feature called reflinks. This is exposed as cp --reflink=always and takes advantage of extents and copy-on-write to copy data quickly: instead of reading all the data and writing it out again, as other filesystems must, it merely adds another reference to the extents the data is already using.

Here’s an excerpt from the man page for cp:

When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy.

Without reflinks, a common technique for making a quick copy of a file is the hardlink. Hardlinks have a number of disadvantages though, mainly because there is only one inode, so all hardlinked copies must share the same metadata (owner, group, permissions, etc.). Software that might modify the files also needs to be aware of hardlinks: naive modification of a hardlinked file modifies all copies of the file.
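A quick sketch of that last pitfall (file names invented for illustration):

$ echo hello > original
$ ln original hardlinked      # both names now refer to the same inode
$ chmod 600 original          # this changes the permissions of *both* names
$ ls -l original hardlinked   # each shows -rw------- and a link count of 2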

With reflinks, life becomes much easier:

  • Each copy has its own inode so can have different metadata. Only the data extents are shared.
  • The filesystem ensures that any write causes a copy-on-write, so applications don’t need to do anything special.
  • Space is saved on a per-extent basis so changing one extent still allows all the other extents to remain shared. A change to a hardlinked file requires a new copy of the whole file.
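A minimal sketch of that first point, assuming a reflink-capable filesystem mounted at /mnt/xfs and an existing file called original (names invented):

$ cp --reflink=always /mnt/xfs/original /mnt/xfs/clone
$ chmod 600 /mnt/xfs/clone                 # only the clone's inode is affected
$ ls -l /mnt/xfs/original /mnt/xfs/clone   # different permissions, shared data extents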

Another feature that extents and copy-on-write allow is block-level out-of-band deduplication.

  • Deduplication – the technique of finding and removing duplicate copies of data.
  • Block-level – operating on the blocks of data on storage, not just whole files.
  • Out-of-band – something that happens only when triggered or scheduled, not automatically as part of the normal operation of the filesystem.

btrfs has an ioctl that a userspace program can use—presumably after finding a sequence of blocks that are identical—to tell the kernel to turn one into a reference to the other, thus saving some space.

It's necessary for the kernel to do it so that any IO which may be going on at the same time, and which may modify the data, can be dealt with; modifications after the data is reflinked will just cause a copy-on-write. If you tried to do it all in a userspace app you'd risk something else modifying the files at the same time, but by having the kernel do it, in theory it becomes completely safe to do at any time. The kernel also checks that the sequences of extents really are identical.
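You can drive this ioctl by hand with xfs_io, whose dedupe command wraps it. A sketch, assuming your xfs_io is new enough to have that command and with invented paths, asking the kernel to deduplicate the first 1MiB of two files believed to be identical:

$ # compare bytes 0..1MiB of 'a' against the same range of 'b' and, only if
$ # they really are identical, make 'b' share a's extents for that range
$ xfs_io -c "dedupe /mnt/xfs/a 0 0 1048576" /mnt/xfs/b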

In-band deduplication is a feature that's being worked on in btrfs. It already exists in ZFS, though there it is rarely recommended for use because it requires a huge amount of memory for keeping hashes of the data that has been written. It's likely to be the same story with btrfs, so out-of-band deduplication will remain useful. And it exists as a feature right now, which is always a bonus.

XFS Future

So what has all this got to do with XFS?

Well, in recognition of the fact that there might be more than one Linux filesystem with extents, and that reflinks might be more generally useful, the extent-same ioctl was lifted up into the VFS layer of the kernel instead of living only in btrfs. And the good news is that XFS recently became able to make use of it.

When I say “recently” I do mean really recently. I mean like kernel release 4.9.1 which came out on 2017-01-04. At the moment it comes with massive EXPERIMENTAL warnings, requires a new filesystem to be created with a special format option, and will need an xfsprogs compiled from recent git in order to have a mkfs.xfs that can create such a filesystem.

So before going further, I'm going to assume you've compiled a new enough kernel and booted into it, then compiled a new enough xfsprogs. Both of these are quite simple things to do; for example, the Debian documentation for building kernel packages from upstream code works fine.
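For xfsprogs, a build from the upstream git repository goes roughly like this (assuming the usual build dependencies such as autoconf and the uuid development headers are already installed):

$ git clone git://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git
$ cd xfsprogs-dev
$ make
# make install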

XFS Reflink Demo

Make yourself a new filesystem, with the reflink=1 format option.

# mkfs.xfs -L reflinkdemo -m reflink=1 /dev/xvdc
meta-data=/dev/xvdc              isize=512    agcount=4, agsize=3276800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=1
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Put it in /etc/fstab for convenience, and mount it somewhere.

# echo "LABEL=reflinkdemo /mnt/xfs xfs relatime 0 2" >> /etc/fstab
# mkdir -vp /mnt/xfs
mkdir: created directory ‘/mnt/xfs’
# mount /mnt/xfs
# df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  339M   50G   1% /mnt/xfs

Create a few files with random data.

# mkdir -vp /mnt/xfs/reflink
mkdir: created directory ‘/mnt/xfs/reflink’
# chown -c andy: /mnt/xfs/reflink
changed ownership of ‘/mnt/xfs/reflink’ from root:root to andy:andy
# exit
$ for i in {1..5}; do
> echo "Writing $i…"; dd if=/dev/urandom of=/mnt/xfs/reflink/$i bs=1M count=1024;
> done
Writing 1…
1024+0 records in 
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.34193 s, 247 MB/s
Writing 2…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33207 s, 248 MB/s
Writing 3…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33527 s, 248 MB/s
Writing 4…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33362 s, 248 MB/s
Writing 5…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.32859 s, 248 MB/s
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
$ du -csh /mnt/xfs
5.0G    /mnt/xfs
5.0G    total

Copy a file and, as expected, usage goes up by 1GiB. It also takes a little while, even on my nice fast SSDs.

$ time cp -v /mnt/xfs/reflink/{,copy_}1
‘/mnt/xfs/reflink/1’ -> ‘/mnt/xfs/reflink/copy_1’
 
real    0m3.420s
user    0m0.008s
sys     0m0.676s
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
6.0G    /mnt/xfs/reflink
6.0G    total

So what about a reflink copy?

$ time cp -v --reflink=always /mnt/xfs/reflink/{,reflink_}1
‘/mnt/xfs/reflink/1’ -> ‘/mnt/xfs/reflink/reflink_1’
 
real    0m0.003s
user    0m0.000s
sys     0m0.004s
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total

The apparent usage went up by 1GiB but the amount of free space as shown by df stayed the same. No more actual storage was used because the new copy is a reflink, and the copy got done in 3ms as opposed to 3,420ms.

Can we tell more about how these files are laid out? Yes, with the filefrag -v command.

$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:     917508..   1179651: 262144:             last,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found

What we can see here is that all three files are composed of a single extent which is 262,144 4KiB blocks in size, but it also tells us that /mnt/xfs/reflink/1 and /mnt/xfs/reflink/reflink_1 are using the same range of physical blocks: 1572884..1835027.

XFS Deduplication Demo

We've demonstrated that you can use cp --reflink=always to take a cheap copy of your data, but what about data that is already duplicated without your knowledge? Is there any way to take advantage of the extent-same ioctl for deduplication?

There are a couple of software solutions for out-of-band deduplication in btrfs, but the one I know of that also works on XFS is duperemove. You will need a git checkout of duperemove for this to work.
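Getting and building that checkout goes roughly like this (assuming the glib and sqlite3 development headers are installed):

$ git clone https://github.com/markfasheh/duperemove.git
$ cd duperemove
$ make
# make install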

A quick reminder of the storage use before we start.

$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:     917508..   1179651: 262144:             last,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found

Run duperemove.

# duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Loading only duplicated hashes from hashfile.
Using 2 threads for dedupe phase
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 1.0G
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found

The output of du remained the same, but df says that there's now 1GiB more free space, and filefrag confirms that what's changed is that copy_1 now uses the same extents as 1 and reflink_1. The duplicate data in copy_1, which in theory we did not know was there, has been discovered and safely reference-linked to the extents from 1, saving us 1GiB of storage.

By the way, I told duperemove to use a hash file because otherwise it keeps its hashes in RAM. For the sake of 7 files that won't matter, but it will when there are millions of files, so it's a habit I get into. duperemove also uses the hash file to avoid having to repeatedly re-hash files that haven't changed.

All that has been demonstrated so far though is whole-file deduplication, as copy_1 was just a regular copy of 1. What about when a file is only partially composed of duplicate data? Well, okay:

$ cat /mnt/xfs/reflink/{1,2} > /mnt/xfs/reflink/1_2
$ ls -lah /mnt/xfs/reflink/{1,2,1_2}
-rw-r--r-- 1 andy andy 1.0G Jan 10 15:41 /mnt/xfs/reflink/1
-rw-r--r-- 1 andy andy 2.0G Jan 10 16:55 /mnt/xfs/reflink/1_2
-rw-r--r-- 1 andy andy 1.0G Jan 10 15:41 /mnt/xfs/reflink/2
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  7.4G   43G  15% /mnt/xfs
9.0G    /mnt/xfs/reflink
9.0G    total
$ filefrag -v /mnt/xfs/reflink/{1,2,1_2}
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/2 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:         20..    262147: 262128:            
   1:   262128..  262143:    2129908..   2129923:     16:     262148: last,eof
/mnt/xfs/reflink/2: 2 extents found
File size of /mnt/xfs/reflink/1_2 is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:     262164..    524291: 262128:            
   1:   262128..  524287:     655380..    917539: 262160:     524292: last,eof
/mnt/xfs/reflink/1_2: 2 extents found

I've concatenated 1 and 2 together into a file called 1_2 and, as expected, usage goes up by 2GiB. filefrag confirms that the physical extents in 1_2 are new. We should be able to do better, because this 1_2 file does not contain any new unique data.

$ duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Using 2 threads for file hashing phase
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 3.0G
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
9.0G    /mnt/xfs/reflink
9.0G    total

We can. Apparent usage stays at 9GiB, but real usage went back to 5.4GiB, which is where we were before we created 1_2.

And the physical layout of the files?

$ filefrag -v /mnt/xfs/reflink/{1,2,1_2}
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/2 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:         20..    262147: 262128:             shared
   1:   262128..  262143:    2129908..   2129923:     16:     262148: last,shared,eof
/mnt/xfs/reflink/2: 2 extents found
File size of /mnt/xfs/reflink/1_2 is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             shared
   1:   262144..  524271:         20..    262147: 262128:    1835028: shared
   2:   524272..  524287:    2129908..   2129923:     16:     262148: last,shared,eof
/mnt/xfs/reflink/1_2: 3 extents found

It shows that 1_2 is now made up from the same extents as 1 and 2 combined, as expected.

Less of the urandom

These synthetic demonstrations using a handful of 1GiB blobs of data from /dev/urandom are all very well, but what about something a little more like the real world?

Okay well let’s see what happens when I take ~30GiB of backup data created by rsnapshot on another host.

rsnapshot is a backup program which makes heavy use of hardlinks. It runs periodically and compares the previous backup data with the new; where a file is identical, instead of storing another copy it makes a hardlink. This saves a lot of space but does have a lot of limitations, as discussed previously.
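For those who haven't used it, the relevant parts of an rsnapshot config look roughly like this (hostname invented; rsnapshot's config fields are tab-separated). A retain line like this is what produces the daily.0 … daily.6 iterations mentioned below:

snapshot_root   /mnt/xfs/rsnapshot/
retain          daily   7
backup          root@darbee.example.com:/home/  darbee/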

This won't be the best example, because in some ways more duplication than usual is to be expected: this data is composed of multiple backups of the same file trees. On the other hand, there shouldn't be as much as you might think, because any truly identical files have already been hardlinked together by rsnapshot. But it is a convenient source of real-world data.

So, starting state:

(I deleted all the reflink files)

$ df -h /mnt/xfs; sudo du -csh /mnt/xfs/rsnapshot
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   30G   21G  59% /mnt/xfs
29G     /mnt/xfs/rsnapshot
29G     total

A small diversion about how rsnapshot lays out its backups may be useful here. They are stored like this:

  • rsnapshot_root / [iteration a] / [client foo] / [directory structure from client foo]
  • rsnapshot_root / [iteration a] / [client bar] / [directory structure from client bar]
  • rsnapshot_root / [iteration b] / [client foo] / [directory structure from client foo]
  • rsnapshot_root / [iteration b] / [client bar] / [directory structure from client bar]

The iterations are commonly things like daily.0, daily.1, …, daily.6. As a consequence, the paths:

rsnapshot/daily.*/client_foo

would be backups only from host foo, and:

rsnapshot/daily.0/*

would be backups from all hosts but only the most recent daily sync.

Let’s first see what the savings would be like in looking for duplicates in just one client’s backups.

Here are the backups I have in this blob of data. The names of the clients are completely made up, though they are real backups.

Client   Size (MiB)
darbee       14,504
achorn       11,297
spader        2,612
reilly        2,276
chino         2,203
audun         2,184
So let's try deduplicating the backups of the biggest one, darbee:

$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   30G   21G  59% /mnt/xfs
# time duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/rsnapshot/*/darbee
Using 128K blocks
Using hash: murmur3
Gathering file list...
Kernel processed data (excludes target files): 8.8G
Comparison of extent info shows a net change in shared extents of: 6.8G
9.85user 78.70system 3:27.23elapsed 42%CPU (0avgtext+0avgdata 23384maxresident)k
50703656inputs+790184outputs (15major+20912minor)pagefaults 0swaps
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   25G   26G  50% /mnt/xfs

3m27s of run time, and somewhere between 5GiB (according to df) and 6.8GiB (according to duperemove) saved. That's 35%!

Now to deduplicate the lot.

# time duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/rsnapshot
Using 128K blocks
Using hash: murmur3
Gathering file list...
Kernel processed data (excludes target files): 5.4G
Comparison of extent info shows a net change in shared extents of: 3.4G
29.12user 188.08system 5:02.31elapsed 71%CPU (0avgtext+0avgdata 34040maxresident)k
34978360inputs+572128outputs (18major+45094minor)pagefaults 0swaps
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   23G   28G  45% /mnt/xfs

5m02s this time, and another 2–3.4GiB saved.

Since the actual deduplication does take some time (mainly the kernel having to read the extents), and most of it was already done in the first pass, a full pass from scratch would more likely take the sum of the two times, i.e. more like 8m29s.

Still, a total of about 7GiB was saved, which is 23%.

It would be very interesting to try this on one of my much larger backup stores.

Why Not Just Use btrfs?

Using a filesystem that already has all of these features would certainly seem easier, but I personally don’t think btrfs is stable enough yet. I use it at home in a relatively unexciting setup (8 devices, raid1 for data and metadata, no compression or deduplication) and I wish I didn’t. I wouldn’t dream of using it in a production environment yet.

I'm on the btrfs mailing list and there are way too many posts about filesystems that give ENOSPC and become unavailable for writes, or about systems that were unexpectedly powered off and whose btrfs filesystem was completely lost when they were powered back on.

I expect the reflink feature in XFS to become non-experimental before btrfs is stable enough for production use.

ZFS?

ZFS is great. It doesn't have out-of-band deduplication or reflinks though, and there are no plans to add them any time soon.

14 thoughts on “XFS, Reflinks and Deduplication”

  1. Thanks for your thoughtful article; for example, it explains why one might look to XFS some day for deduplication rather than relying on BTRFS.

  2. I followed your instructions and something does not work.
    uname -r
    4.10.0-041000-generic
    xfsprogs from git

    but when I try to dedupe
    Dedupe for file “/var/lib/lxc/dialer-7/rootfs/var/tmp/uploads/x-88559797f8bca170d46916d7252bf9949ca79ec1.sln” had status (-95) “Operation not supported”.
    [0x274ab20] (00555/21095) Try to dedupe extents with id 646b156b

    and so on

    Any idea?

    1. Philip,

      The only thing I can think of is maybe that you haven’t created the filesystem with the experimental reflink feature, after installing a newer xfsprogs (and making sure that your mkfs.xfs comes from the newer xfsprogs).

      # mkfs.xfs -m reflink=1 /dev/xvdc

  3. It did work later, on other files. I am curious about why some files, while identical, cannot be deduped.
    I am using LXC containers and with this technology in place, I can have hundreds of identical containers with minimal space requirements.
    Great work, thanks.

  4. What other features are on the backburner for XFS?
    In addition, any idea when reflink support will be enabled by default?

  5. Is there a solution that works for the case where there are many, many small files, with a fair number of copies of each file? duperemove says “Skipping small file” for almost all the duplicates. (This is for a shared home filesystem for developers, where almost everyone has checked-out copies of the same git repositories. Most of the files range from 1 byte to 32k.)

    1. Works for me as low as 4K. Do make sure to recreate your hashfile if changing the blocksize away from what it was created with.

      $ sudo rm -v /data/backup/test*; df -k /data/backup; sudo dd if=/dev/urandom bs=1k of=/data/backup/test1 count=16; df -k /data/backup; sudo cp -v /data/backup/test{1,2}; df -k /data/backup
      removed ‘/data/backup/test1’
      removed ‘/data/backup/test2’
      Filesystem     1K-blocks    Used Available Use% Mounted on
      /dev/xvdd       16766976 9979312   6787664  60% /data/backup
      16+0 records in
      16+0 records out
      16384 bytes (16 kB) copied, 0.000574161 s, 28.5 MB/s
      Filesystem     1K-blocks    Used Available Use% Mounted on
      /dev/xvdd       16766976 9979348   6787628  60% /data/backup
      ‘/data/backup/test1’ -> ‘/data/backup/test2’
      Filesystem     1K-blocks    Used Available Use% Mounted on
      /dev/xvdd       16766976 9979384   6787592  60% /data/backup
      $ sudo ./duperemove -b 4096 -hd --hashfile /var/lib/duperemove.sqlite /data/backup/test*
      Using 4K blocks
      Using hash: murmur3
      Gathering file list...
      Adding files from database for hashing.
      Using 2 threads for file hashing phase
      [1/2] (50.00%) csum: /data/backup/test1
      [2/2] (100.00%) csum: /data/backup/test2
      Total files:  2
      Total hashes: 8
      Loading only duplicated hashes from hashfile.
      Using 2 threads for dedupe phase
      [0x1992400] (1/4) Try to dedupe extents with id 80e58f4d
      [0x1992400] Dedupe 1 extents (id: 80e58f4d) with target: (12.0K, 4.0K), "/data/backup/test1"
      [0x1992400] (1/4) Try to dedupe extents with id 80e58f4d
      [0x1992400] Dedupe 1 extents (id: 80e58f4d) with target: (12.0K, 4.0K), "/data/backup/test2"
      [0x19924a0] (2/4) Try to dedupe extents with id 7695ba68
      [0x19924a0] Dedupe 1 extents (id: 7695ba68) with target: (8.0K, 4.0K), "/data/backup/test1"
      [0x19924a0] (2/4) Try to dedupe extents with id 7695ba68
      [0x19924a0] Dedupe 1 extents (id: 7695ba68) with target: (8.0K, 4.0K), "/data/backup/test2"
      [0x1992400] (3/4) Try to dedupe extents with id 534b0a45
      [0x19924a0] (4/4) Try to dedupe extents with id 10c31f54
      [0x19924a0] Dedupe 1 extents (id: 10c31f54) with target: (4.0K, 4.0K), "/data/backup/test1"
      [0x19924a0] (4/4) Try to dedupe extents with id 10c31f54
      [0x19924a0] Dedupe 1 extents (id: 10c31f54) with target: (4.0K, 4.0K), "/data/backup/test2"
      [0x1992400] Dedupe 1 extents (id: 534b0a45) with target: (0.0, 4.0K), "/data/backup/test1"
      [0x1992400] (3/4) Try to dedupe extents with id 534b0a45
      [0x1992400] Dedupe 1 extents (id: 534b0a45) with target: (0.0, 4.0K), "/data/backup/test2"
      Kernel processed data (excludes target files): 32.0K
      Comparison of extent info shows a net change in shared extents of: 32.0K
      $ df -k /data/backup
      Filesystem     1K-blocks    Used Available Use% Mounted on
      /dev/xvdd       16766976 9979328   6787648  60% /data/backup
      

      Although it’s unclear to me why 56K was saved there. 🙂

  6. Thank you for this interesting writeup. I could relate to the “btrfs sounds nice, but doesn’t seem ready yet” comments, as I felt the same. Luckily reflinks for XFS are considered stable now (well, the EXPERIMENTAL flag has been removed), so I plan on using them in my ever-growing rsnapshot setup.

  7. First of all, great write-up.

    Just tried that in a fresh Debian 10 VM, thinking it would be a good space-saving addition for a new rsync snapshots storage.
    Kernel, xfsprogs and duperemove are all up to date and have all the needed features straight from the repos, no need to compile anything, which is great. The duperemove man page still has XFS support as experimental though; maybe they just didn’t update the docs.

    Interesting thing to note: cp by default seems to have the reflink option set to auto. The man page doesn’t say so explicitly, and all my googling didn’t provide any info on when the default changed. Only various forum posts turn up, where everyone says “auto” will never be the default. This is weird. I’ve even searched through the coreutils changelog. Nothing.

    Another thing that didn’t work for me was deduping a mixed set of copies. I didn’t follow your demo exactly: I created a bunch of copies, some actual copies, some reflinked. duperemove reference-linked all the “real” copies to one extent, but the ones made with reflinks stayed independent. The duperemove man page specifically says it should optimize already-deduped extents with the fiemap option, which is on by default. But it doesn’t happen. A bit disappointing.

    One other weirdness was copies made with MC (Midnight Commander). I expected it to always make real copies. Instead, the first copy was real and the second was reflinked (I ran df after each copy to check). I can’t even begin to imagine why. Now all the copies it makes are reflinked. I did run cp after the first one; could the first cp on XFS trigger some kind of feature detection that set reflink to auto? I’m not aware of anything like that, but it’s the only explanation I can come up with.

  8. I just want to thank you for this great write-up.

    It would be very interesting to try this on larger files. I know that these aren’t very large, but it’s what I have:

    Comparison of extent info shows a net change in shared extents of: 58.4G
    real 344m50.565s
    user 1455m54.848s
    sys 8m6.075s

    bkp_libvirt-images]# ls -alhs
    total 131G
    0 drwxr-xr-x 2 root root 235 Aug 21 21:26 .
    4.0K drwxr-xr-x 6 ville1ero ville1ero 4.0K Aug 21 19:58 ..
    34G -rw------- 1 root root 41G Jul 23 00:43 000_000.qcow2
    34G -rw------- 1 root root 41G Jul 23 01:25 000.qcow2
    32G -rw------- 1 root root 41G Aug 21 17:47 1909_lt-000.qcow2
    32G -rw------- 1 root root 41G Aug 21 17:55 1909_pc-000.qcow2
    8.0K -rw-r--r-- 1 root root 5.3K Aug 21 20:26 200821tek-gdl-apps01.xml
    12K -rw-r--r-- 1 root root 8.2K Aug 21 20:26 200821tek-gdl-tekops-dev.xml
    8.0K -rw-r--r-- 1 root root 5.1K Jul 30 13:16 dellfta.xml
    8.0K -rw-r--r-- 1 root root 5.1K Jul 23 00:48 lt-000.xml
    8.0K -rw-r--r-- 1 root root 5.1K Jul 23 02:26 pc-000.xml
    8.0K -rw-r--r-- 1 root root 5.1K Jul 23 03:35 sc-000.xml

    bkp_libvirt-images]# df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/vgd1-data_smb_libvirt–images 150G 112G 39G 75% /data/smb/libvirt-images

  9. images]# ls -alhs
    total 209G
    0 drwxr-xr-x 2 root root 175 Aug 22 08:04 .
    4.0K drwxr-xr-x 11 root root 4.0K Jul 13 23:35 ..
    37G -rw------- 1 root root 41G Aug 21 22:06 lt-000.qcow2
    36G -rw------- 1 root root 41G Aug 21 22:08 pc-000.qcow2
    24G -rw------- 1 root root 41G Aug 7 12:06 sc-000_000.qcow2
    24G -rw------- 1 root root 41G Aug 21 18:25 sc-000.qcow2
    44G -rw------- 1 root root 47G Aug 21 20:24 tek-gdl-apps01.qcow2
    25G -rw------- 1 root root 21G Aug 22 08:02 tek-gdl-tekops-dev.qcow2
    24G -rw------- 1 root root 41G Aug 6 00:08 zsc-000_000.qcow2

    images]# df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/vg1-var_lib_libvirt_images 160G 149G 12G 94% /var/lib/libvirt/images

    date Sat 22 Aug 2020 08:05:35 AM CDT

    images]# time duperemove -hdr --hashfile=/var/tmp/dr.hash .
    Gathering file list…
    Using 8 threads for file hashing phase
    [1/7] (14.29%) csum: /var/lib/libvirt/images/pc-000.qcow2
    [2/7] (28.57%) csum: /var/lib/libvirt/images/sc-000_000.qcow2
    [3/7] (42.86%) csum: /var/lib/libvirt/images/sc-000.qcow2
    [4/7] (57.14%) csum: /var/lib/libvirt/images/lt-000.qcow2
    [5/7] (71.43%) csum: /var/lib/libvirt/images/zsc-000_000.qcow2
    [6/7] (85.71%) csum: /var/lib/libvirt/images/tek-gdl-apps01.qcow2
    [7/7] (100.00%) csum: /var/lib/libvirt/images/tek-gdl-tekops-dev.qcow2
    Total files: 7
    Total hashes: 2184955
    Loading only duplicated hashes from hashfile.
    Hashing completed. Using 4 threads to calculate duplicate extents. This may take some time.
    [########################################]
    Search completed with no errors.
    Simple read and compare of file data found 641 instances of extents that might benefit from deduplication.

    Comparison of extent info shows a net change in shared extents of: 3.4G
    real 238m31.420s
    user 1053m36.499s
    sys 6m8.311s
    date Sat 22 Aug 2020 12:05:24 PM CDT

    images]# ls -alhs
    total 210G
    0 drwxr-xr-x 2 root root 175 Aug 22 08:04 .
    4.0K drwxr-xr-x 11 root root 4.0K Jul 13 23:35 ..
    37G -rw------- 1 root root 41G Aug 21 22:06 lt-000.qcow2
    36G -rw------- 1 root root 41G Aug 21 22:08 pc-000.qcow2
    24G -rw------- 1 root root 41G Aug 7 12:06 sc-000_000.qcow2
    24G -rw------- 1 root root 41G Aug 21 18:25 sc-000.qcow2
    44G -rw------- 1 root root 47G Aug 21 20:24 tek-gdl-apps01.qcow2
    24G -rw------- 1 root root 21G Aug 22 08:02 tek-gdl-tekops-dev.qcow2
    24G -rw------- 1 root root 41G Aug 6 00:08 zsc-000_000.qcow2

    images]# df -h .
    Filesystem Size Used Avail Use% Mounted on
    /dev/mapper/vg1-var_lib_libvirt_images 160G 148G 13G 93% /var/lib/libvirt/images
