btrfs Past ^
This post is about XFS, but the features in question first hit Linux in btrfs, so we need to talk about btrfs for a bit first.
For a long time now, btrfs has had a useful feature called reflinks. Basically this is exposed as cp --reflink=always and takes advantage of extents and copy-on-write in order to do a quick copy of data by merely adding another reference to the extents that the data is currently using, rather than having to read all the data and write it out again, as would be the case in other filesystems.
Here’s an excerpt from the man page for cp:
When --reflink[=always] is specified, perform a lightweight copy, where the data blocks are copied only when modified. If this is not possible the copy fails, or if --reflink=auto is specified, fall back to a standard copy.
Without reflinks a common technique for making a quick copy of a file is the hardlink. Hardlinks have a number of disadvantages though, mainly due to the fact that since there is only one inode all hardlinked copies must have the same metadata (owner, group, permissions, etc.). Software that might modify the files also needs to be aware of hardlinks: naive modification of a hardlinked file modifies all copies of the file.
With reflinks, life becomes much easier:
- Each copy has its own inode so can have different metadata. Only the data extents are shared.
- The filesystem ensures that any write causes a copy-on-write, so applications don’t need to do anything special.
- Space is saved on a per-extent basis so changing one extent still allows all the other extents to remain shared. A change to a hardlinked file requires a new copy of the whole file.
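On Linux the reflink copy itself is requested with the FICLONE ioctl, which is what cp --reflink=always uses under the hood. A minimal sketch in Python, assuming a reflink-capable filesystem (btrfs, or XFS formatted with reflink=1):

```python
import fcntl

# FICLONE is _IOW(0x94, 9, int): direction, size, type and number
# packed into one request code, which works out to 0x40049409.
FICLONE = (1 << 30) | (4 << 16) | (0x94 << 8) | 9

def reflink_copy(src_path, dst_path):
    """Clone src to dst by sharing extents. Fails with EOPNOTSUPP
    where reflinks are unsupported, or EXDEV across filesystems."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
```

On a filesystem without reflink support this raises OSError rather than silently falling back, which matches the --reflink=always behaviour quoted from the man page above.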
Another feature that extents and copy-on-write allow is block-level out-of-band deduplication.
- Deduplication – the technique of finding and removing duplicate copies of data.
- Block-level – operating on the blocks of data on storage, not just whole files.
- Out-of-band – something that happens only when triggered or scheduled, not automatically as part of the normal operation of the filesystem.
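The userspace half of out-of-band deduplication boils down to hashing fixed-size blocks and collecting locations whose hashes match. A toy sketch of that scanning phase (the names here are mine, not duperemove's; the actual extent merging is then handed to the kernel):

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # duperemove's default block size

def find_duplicate_blocks(paths):
    """Map each block hash to the (file, offset) pairs holding it;
    dedupe candidates are hashes seen at more than one location."""
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                digest = hashlib.sha256(block).hexdigest()
                seen[digest].append((path, offset))
                offset += len(block)
    return {h: locs for h, locs in seen.items() if len(locs) > 1}
```

A real tool would follow the hash matches with a byte-for-byte comparison before asking the kernel to do anything, since hashes alone can collide.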
btrfs has an ioctl that a userspace program can use—presumably after finding a sequence of blocks that are identical—to tell the kernel to turn one into a reference to the other, thus saving some space.
It’s necessary for the kernel to do this so that any IO going on at the same time that may modify the data can be dealt with; modifications after the data is reflinked will just cause a copy-on-write. If you tried to do it all in a userspace app you’d risk something else modifying the files at the same time, whereas by having the kernel do it, in theory it becomes completely safe to do at any time. The kernel also checks that the sequences of extents really are identical.
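At the VFS layer this ioctl is called FIDEDUPERANGE (originally BTRFS_IOC_FILE_EXTENT_SAME in btrfs). Its argument is a struct file_dedupe_range header followed by one struct file_dedupe_range_info per destination. A sketch of packing that request; the offsets and fds are purely illustrative:

```python
import struct

# struct file_dedupe_range:      u64 src_offset, u64 src_length,
#                                u16 dest_count, u16 reserved1, u32 reserved2
# struct file_dedupe_range_info: s64 dest_fd, u64 dest_offset,
#                                u64 bytes_deduped, s32 status, u32 reserved
HEADER_FMT = "=QQHHI"   # 24 bytes
INFO_FMT = "=qQQiI"     # 32 bytes per destination

def pack_dedupe_request(src_offset, length, dests):
    """Build the FIDEDUPERANGE argument for (dest_fd, dest_offset)
    pairs. bytes_deduped and status are outputs, so start zeroed."""
    buf = struct.pack(HEADER_FMT, src_offset, length, len(dests), 0, 0)
    for dest_fd, dest_offset in dests:
        buf += struct.pack(INFO_FMT, dest_fd, dest_offset, 0, 0, 0)
    return buf
```

After the ioctl returns, the per-destination status field reports success, FILE_DEDUPE_RANGE_DIFFERS, or an error; it is this in-kernel comparison that makes the operation safe against concurrent writers.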
In-band deduplication is a feature that’s being worked on in btrfs. It already exists in ZFS, though there it is rarely recommended for use, as it requires a huge amount of memory for keeping hashes of the data that has been written. It’s going to be the same story with btrfs, so out-of-band deduplication will remain useful. And it exists as a feature right now, which is always a bonus.
XFS Future ^
So what has all this got to do with XFS?
Well, in recognition that there might be more than one Linux filesystem with extents and so that reflinks might be more generally useful, the extent-same ioctl got lifted up to be in the VFS layer of the kernel instead of just in btrfs. And the good news is that XFS recently became able to make use of it.
When I say “recently” I do mean really recently. I mean like kernel release 4.9.1 which came out on 2017-01-04. At the moment it comes with massive EXPERIMENTAL warnings, requires a new filesystem to be created with a special format option, and will need an xfsprogs compiled from recent git in order to have a mkfs.xfs that can create such a filesystem.
So before going further, I’m going to assume you’ve compiled a new enough kernel and booted into it, then compiled up a new enough xfsprogs. Both of these are quite simple things to do, for example the Debian documentation for building kernel packages from upstream code works fine.
XFS Reflink Demo ^
Make yourself a new filesystem, with the reflink=1 format option.
# mkfs.xfs -L reflinkdemo -m reflink=1 /dev/xvdc
meta-data=/dev/xvdc              isize=512    agcount=4, agsize=3276800 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=1
data     =                       bsize=4096   blocks=13107200, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=6400, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Put it in /etc/fstab for convenience, and mount it somewhere.
# echo "LABEL=reflinkdemo /mnt/xfs xfs relatime 0 2" >> /etc/fstab
# mkdir -vp /mnt/xfs
mkdir: created directory ‘/mnt/xfs’
# mount /mnt/xfs
# df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  339M   50G   1% /mnt/xfs
Create a few files with random data.
# mkdir -vp /mnt/xfs/reflink
mkdir: created directory ‘/mnt/xfs/reflink’
# chown -c andy: /mnt/xfs/reflink
changed ownership of ‘/mnt/xfs/reflink’ from root:root to andy:andy
# exit
$ for i in {1..5}; do
>   echo "Writing $i…"; dd if=/dev/urandom of=/mnt/xfs/reflink/$i bs=1M count=1024;
> done
Writing 1…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.34193 s, 247 MB/s
Writing 2…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33207 s, 248 MB/s
Writing 3…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33527 s, 248 MB/s
Writing 4…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.33362 s, 248 MB/s
Writing 5…
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.32859 s, 248 MB/s
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
$ du -csh /mnt/xfs
5.0G    /mnt/xfs
5.0G    total
Copy a file and as expected usage will go up by 1GiB. And it will take a little while, even on my nice fast SSDs.
$ time cp -v /mnt/xfs/reflink/{,copy_}1
‘/mnt/xfs/reflink/1’ -> ‘/mnt/xfs/reflink/copy_1’

real    0m3.420s
user    0m0.008s
sys     0m0.676s
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
6.0G    /mnt/xfs/reflink
6.0G    total
So what about a reflink copy?
$ time cp -v --reflink=always /mnt/xfs/reflink/{,reflink_}1
‘/mnt/xfs/reflink/1’ -> ‘/mnt/xfs/reflink/reflink_1’

real    0m0.003s
user    0m0.000s
sys     0m0.004s
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
The apparent usage went up by 1GiB but the amount of free space as shown by df stayed the same. No more actual storage was used because the new copy is a reflink. And the copy got done in 3ms as opposed to 3,420ms.
Can we tell more about how these files are laid out? Yes, we can use the filefrag -v command to tell us more.
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:     917508..   1179651: 262144:             last,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found
What we can see here is that all three files are composed of a single extent which is 262,144 4KiB blocks in size, but it also tells us that /mnt/xfs/reflink/1 and /mnt/xfs/reflink/reflink_1 are using the same range of physical blocks: 1572884..1835027.
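Because filefrag reports physical block ranges, you can work out real usage yourself: apparent usage is the sum of extent lengths, while actual usage is the size of the union of the physical ranges, so shared extents count only once. A small sketch, fed with the three extents above:

```python
def actual_blocks(extents):
    """Size, in blocks, of the union of (physical_start, length)
    ranges; extents shared between files are only counted once."""
    total, prev_end = 0, None
    for start, length in sorted(extents):
        end = start + length
        if prev_end is None or start >= prev_end:
            total += length          # disjoint extent
        elif end > prev_end:
            total += end - prev_end  # partially overlapping extent
        prev_end = end if prev_end is None else max(prev_end, end)
    return total

# The three files above: 1 and reflink_1 share an extent, copy_1 has its own.
extents = [(1572884, 262144), (917508, 262144), (1572884, 262144)]
apparent = sum(length for _, length in extents)  # 786432 blocks = 3 GiB
actual = actual_blocks(extents)                  # 524288 blocks = 2 GiB
```

Which matches what du and df showed: 3GiB of apparent usage backed by only 2GiB of real storage.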
XFS Deduplication Demo ^
We’ve demonstrated that you can use cp --reflink=always to take a cheap copy of your data, but what about data that may already be duplicates without your knowledge? Is there any way to take advantage of the extent-same ioctl for deduplication?
There are a couple of software solutions for out-of-band deduplication in btrfs, but one that I know also works with XFS is duperemove. You will need to use a git checkout of duperemove for this to work.
A quick reminder of the storage use before we start.
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  6.4G   44G  13% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:     917508..   1179651: 262144:             last,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found
Run duperemove.
# duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Loading only duplicated hashes from hashfile.
Using 2 threads for dedupe phase
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 1.0G
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
7.0G    /mnt/xfs/reflink
7.0G    total
$ filefrag -v /mnt/xfs/reflink/{,copy_,reflink_}1
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/copy_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/copy_1: 1 extent found
File size of /mnt/xfs/reflink/reflink_1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/reflink_1: 1 extent found
The output of du remained the same, but df says that there’s now 1GiB more free space, and filefrag confirms that what’s changed is that copy_1 now uses the same extents as 1 and reflink_1. The duplicate data in copy_1 that in theory we did not know was there, has been discovered and safely reference-linked to the extent from 1, saving us 1GiB of storage.
By the way, I told duperemove to use a hash file because otherwise it will keep that in RAM. For the sake of 7 files that won’t matter but it will if I have millions of files so it’s a habit I get into. It uses that hash file to avoid having to repeatedly re-hash files that haven’t changed.
All that has been demonstrated so far though is whole-file deduplication, as copy_1 was just a regular copy of 1. What about when a file is only partially composed of duplicate data? Well okay.
$ cat /mnt/xfs/reflink/{1,2} > /mnt/xfs/reflink/1_2
$ ls -lah /mnt/xfs/reflink/{1,2,1_2}
-rw-r--r-- 1 andy andy 1.0G Jan 10 15:41 /mnt/xfs/reflink/1
-rw-r--r-- 1 andy andy 2.0G Jan 10 16:55 /mnt/xfs/reflink/1_2
-rw-r--r-- 1 andy andy 1.0G Jan 10 15:41 /mnt/xfs/reflink/2
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  7.4G   43G  15% /mnt/xfs
9.0G    /mnt/xfs/reflink
9.0G    total
$ filefrag -v /mnt/xfs/reflink/{1,2,1_2}
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/2 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:         20..    262147: 262128:
   1:   262128..  262143:    2129908..   2129923:     16:     262148: last,eof
/mnt/xfs/reflink/2: 2 extents found
File size of /mnt/xfs/reflink/1_2 is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:     262164..    524291: 262128:
   1:   262128..  524287:     655380..    917539: 262160:     524292: last,eof
/mnt/xfs/reflink/1_2: 2 extents found
I’ve concatenated 1 and 2 together into a file called 1_2 and as expected, usage goes up by 2GiB. filefrag confirms that the physical extents in 1_2 are new. We should be able to do better because this 1_2 file does not contain any new unique data.
$ duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/reflink
Using 128K blocks
Using hash: murmur3
Gathering file list...
Adding files from database for hashing.
Using 2 threads for file hashing phase
Kernel processed data (excludes target files): 4.0G
Comparison of extent info shows a net change in shared extents of: 3.0G
$ df -h /mnt/xfs; du -csh /mnt/xfs/reflink
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G  5.4G   45G  11% /mnt/xfs
9.0G    /mnt/xfs/reflink
9.0G    total
We can. Apparent usage stays at 9GiB but real usage went back to 5.4GiB which is where we were before we created 1_2.
And the physical layout of the files?
$ filefrag -v /mnt/xfs/reflink/{1,2,1_2}
Filesystem type is: 58465342
File size of /mnt/xfs/reflink/1 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             last,shared,eof
/mnt/xfs/reflink/1: 1 extent found
File size of /mnt/xfs/reflink/2 is 1073741824 (262144 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262127:         20..    262147: 262128:             shared
   1:   262128..  262143:    2129908..   2129923:     16:     262148: last,shared,eof
/mnt/xfs/reflink/2: 2 extents found
File size of /mnt/xfs/reflink/1_2 is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..  262143:    1572884..   1835027: 262144:             shared
   1:   262144..  524271:         20..    262147: 262128:    1835028: shared
   2:   524272..  524287:    2129908..   2129923:     16:     262148: last,shared,eof
/mnt/xfs/reflink/1_2: 3 extents found
It shows that 1_2 is now made up from the same extents as 1 and 2 combined, as expected.
Less of the urandom ^
These synthetic demonstrations using a handful of 1GiB blobs of data from /dev/urandom are all very well, but what about something a little more like the real world?
Okay well let’s see what happens when I take ~30GiB of backup data created by rsnapshot on another host.
rsnapshot is a backup program which makes heavy use of hardlinks. It runs periodically and compares the previous backup data with the new. If they are identical then instead of storing an identical copy it makes a hardlink. This saves a lot of space but does have a lot of limitations as discussed previously.
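The hardlink trick rsnapshot relies on can be sketched in a few lines: compare each file against the previous iteration's copy, and link rather than copy when it is unchanged. This is a toy version of the idea (function name mine, and the real tool uses rsync's metadata checks rather than a full content comparison):

```python
import filecmp
import os
import shutil

def snapshot_file(src, prev_snap, new_snap):
    """Hardlink into the new snapshot if the file is unchanged since
    the previous snapshot, otherwise store a fresh copy."""
    prev = os.path.join(prev_snap, os.path.basename(src))
    dest = os.path.join(new_snap, os.path.basename(src))
    if os.path.exists(prev) and filecmp.cmp(src, prev, shallow=False):
        os.link(prev, dest)   # unchanged: share the inode
    else:
        shutil.copy2(src, dest)
```

The saving here is all-or-nothing per file, which is exactly the limitation that block-level deduplication gets past.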
This won’t be the best example because in some ways there is expected to be more duplication; this data is composed of multiple backups of the same file trees. But on the other hand there shouldn’t be as much because any truly identical files have already been hardlinked together by rsnapshot. But it is a convenient source of real-world data.
So, starting state:
(I deleted all the reflink files)
$ df -h /mnt/xfs; sudo du -csh /mnt/xfs/rsnapshot
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   30G   21G  59% /mnt/xfs
29G     /mnt/xfs/rsnapshot
29G     total
A small diversion about how rsnapshot lays out its backups may be useful here. They are stored like this:
- rsnapshot_root / [iteration a] / [client foo] / [directory structure from client foo]
- rsnapshot_root / [iteration a] / [client bar] / [directory structure from client bar]
- …
- …
- rsnapshot_root / [iteration b] / [client foo] / [directory structure from client foo]
- rsnapshot_root / [iteration b] / [client bar] / [directory structure from client bar]
The iterations are commonly things like daily.0, daily.1 … daily.6. As a consequence, the paths:
rsnapshot/daily.*/client_foo
would be backups only from host foo, and:
rsnapshot/daily.0/*
would be backups from all hosts but only the most recent daily sync.
Let’s first see what the savings would be like in looking for duplicates in just one client’s backups.
Here are the backups I have in this blob of data. The names of the clients are completely made up, though they are real backups.
Client | Size (MiB) |
---|---|
darbee | 14,504 |
achorn | 11,297 |
spader | 2,612 |
reilly | 2,276 |
chino | 2,203 |
audun | 2,184 |
So let’s try deduplicating all of the biggest one’s—darbee‘s—backups:
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   30G   21G  59% /mnt/xfs
# time duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/rsnapshot/*/darbee
Using 128K blocks
Using hash: murmur3
Gathering file list...
Kernel processed data (excludes target files): 8.8G
Comparison of extent info shows a net change in shared extents of: 6.8G
9.85user 78.70system 3:27.23elapsed 42%CPU (0avgtext+0avgdata 23384maxresident)k
50703656inputs+790184outputs (15major+20912minor)pagefaults 0swaps
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   25G   26G  50% /mnt/xfs
3m27s of run time, somewhere between 5 and 6.8GiB saved. That’s 35%!
Now to deduplicate the lot.
# time duperemove -hdr --hashfile=/var/tmp/dr.hash /mnt/xfs/rsnapshot
Using 128K blocks
Using hash: murmur3
Gathering file list...
Kernel processed data (excludes target files): 5.4G
Comparison of extent info shows a net change in shared extents of: 3.4G
29.12user 188.08system 5:02.31elapsed 71%CPU (0avgtext+0avgdata 34040maxresident)k
34978360inputs+572128outputs (18major+45094minor)pagefaults 0swaps
$ df -h /mnt/xfs
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvdc        50G   23G   28G  45% /mnt/xfs
5m02s of run time on this pass, and another 2–3.4GiB saved.
Since the actual deduplication does take some time (the kernel having to read the extents, mainly), and most of it was already done in the first pass, a full pass would more likely take the sum of the times, i.e. more like 8m29s.
Still, a total of about 7GiB was saved which is 23%.
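The quoted percentages are easy to check against the df figures; a quick sanity check of the arithmetic:

```python
# Sanity-check the savings percentages from the df output above.
DARBEE_MIB = 14504                   # darbee's backups, from the table

darbee_saved_gib = 30 - 25           # df before/after the first pass
darbee_pct = round(100 * darbee_saved_gib / (DARBEE_MIB / 1024))

total_saved_gib = 30 - 23            # df before/after both passes
total_pct = round(100 * total_saved_gib / 30)

print(darbee_pct, total_pct)         # 35 23
```

5GiB freed out of darbee's ~14.2GiB of data is the quoted 35%, and 7GiB out of the 29–30GiB total is the quoted 23%.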
It would be very interesting to try this on one of my much larger backup stores.
Why Not Just Use btrfs? ^
Using a filesystem that already has all of these features would certainly seem easier, but I personally don’t think btrfs is stable enough yet. I use it at home in a relatively unexciting setup (8 devices, raid1 for data and metadata, no compression or deduplication) and I wish I didn’t. I wouldn’t dream of using it in a production environment yet.
I’m on the btrfs mailing list and there are way too many posts regarding filesystems that give ENOSPC and become unavailable for writes, or systems that were unexpectedly powered off and when powered back on the btrfs filesystem is completely lost.
I expect the reflink feature in XFS to become non-experimental before btrfs is stable enough for production use.
ZFS? ^
ZFS is great. It doesn’t have out-of-band deduplication or reflinks though, and there are no plans to add them any time soon.
Thanks for your thoughtful article; it explains, for example, why one might look to XFS some day for deduplication rather than relying on btrfs.
Thanks for your post, really good explanation of the new XFS features.
I followed your instructions and something does not work.
uname -r
4.10.0-041000-generic
xfsprogs from git
but when I try to dedupe
Dedupe for file “/var/lib/lxc/dialer-7/rootfs/var/tmp/uploads/x-88559797f8bca170d46916d7252bf9949ca79ec1.sln” had status (-95) “Operation not supported”.
[0x274ab20] (00555/21095) Try to dedupe extents with id 646b156b
and so on
Any idea?
Philip,
The only thing I can think of is maybe that you haven’t created the filesystem with the experimental reflink feature, after installing a newer xfsprogs (and making sure that your mkfs.xfs comes from the newer xfsprogs).
# mkfs.xfs -m reflink=1 /dev/xvdc
It did work later, on other files. I am curious about why some files, while identical, cannot be deduped.
I am using LXC containers and with this technology in place, I can have hundreds of identical containers with minimal space requirements.
Great work, thanks.
What other features are on the backburner for XFS?
In addition, any idea when reflink support will be enabled by default?
Is there a solution that works for the case where there are many, many small files, with a fair number of copies of each file? duperemove says “Skipping small file” for almost all the duplicates. (This is for a shared home filesystem for developers, where almost everyone has checked-out copies of the same git repositories. Most of the files range from 1 byte to 32k.)
Works for me as low as 4K. Do make sure to recreate your hashfile if changing the blocksize away from what it was created with.
Although it’s unclear to me why 56K was saved there. 🙂
Thank you for this interesting writeup. I could relate to the “btrfs sounds nice, but doesn’t seem ready yet” comments, as I felt the same. Luckily reflinks for XFS are considered stable now (well, the EXPERIMENTAL flag has been removed), so I plan on using those in my ever-growing rsnapshot setup.
First of all, great write-up.
Just tried that in a fresh Debian 10 VM, thinking it would be a good space-saving addition for a new rsync snapshots storage.
Kernel, xfsprogs and duperemove are all up to date and have all the needed features straight from the repos, no need to compile anything, which is great. The duperemove man page still lists XFS support as experimental though; maybe they just didn’t update the docs.
Interesting thing to note: cp by default seems to have the reflink option set to auto. The man page doesn’t say so explicitly, and all my googling didn’t turn up any info on when the default changed. Only various forum posts are found where everyone says “auto” will never be the default. This is weird. I’ve even searched through the coreutils changelog. Nothing.
Another thing that didn’t work for me is deduping a mixed set of copies. I didn’t follow your demo exactly: I created a bunch of copies, some actual copies, some reflinked. duperemove referenced all the “real” copies on to one extent, but the ones made with reflinks stayed independent. duperemove’s man page specifically mentions that it should optimize already-deduped extents with the fiemap option, which is on by default. But it doesn’t happen. A bit disappointing.
One other weirdness was copies made with MC (Midnight Commander). I expected it to make real copies always. Instead first copy was real and second was reflinked (I ran df after each copy to check). I can’t even begin to imagine why. Now all copies it makes are reflinked. I did run cp after the first one. Could first cp on xfs trigger some kind of feature detection that set reflink to auto? I’m not aware of anything like that, but that is the only explanation I can come up with.
I just want to thank you for this great write-up.
It would be very interesting to try this on larger files. I know that these aren’t very large, but this is what I have:
Comparison of extent info shows a net change in shared extents of: 58.4G
real 344m50.565s
user 1455m54.848s
sys 8m6.075s
bkp_libvirt-images]# ls -alhs
total 131G
0 drwxr-xr-x 2 root root 235 Aug 21 21:26 .
4.0K drwxr-xr-x 6 ville1ero ville1ero 4.0K Aug 21 19:58 ..
34G -rw------- 1 root root 41G Jul 23 00:43 000_000.qcow2
34G -rw------- 1 root root 41G Jul 23 01:25 000.qcow2
32G -rw------- 1 root root 41G Aug 21 17:47 1909_lt-000.qcow2
32G -rw------- 1 root root 41G Aug 21 17:55 1909_pc-000.qcow2
8.0K -rw-r--r-- 1 root root 5.3K Aug 21 20:26 200821tek-gdl-apps01.xml
12K -rw-r--r-- 1 root root 8.2K Aug 21 20:26 200821tek-gdl-tekops-dev.xml
8.0K -rw-r--r-- 1 root root 5.1K Jul 30 13:16 dellfta.xml
8.0K -rw-r--r-- 1 root root 5.1K Jul 23 00:48 lt-000.xml
8.0K -rw-r--r-- 1 root root 5.1K Jul 23 02:26 pc-000.xml
8.0K -rw-r--r-- 1 root root 5.1K Jul 23 03:35 sc-000.xml
bkp_libvirt-images]# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vgd1-data_smb_libvirt--images 150G 112G 39G 75% /data/smb/libvirt-images
images]# ls -alhs
total 209G
0 drwxr-xr-x 2 root root 175 Aug 22 08:04 .
4.0K drwxr-xr-x 11 root root 4.0K Jul 13 23:35 ..
37G -rw------- 1 root root 41G Aug 21 22:06 lt-000.qcow2
36G -rw------- 1 root root 41G Aug 21 22:08 pc-000.qcow2
24G -rw------- 1 root root 41G Aug 7 12:06 sc-000_000.qcow2
24G -rw------- 1 root root 41G Aug 21 18:25 sc-000.qcow2
44G -rw------- 1 root root 47G Aug 21 20:24 tek-gdl-apps01.qcow2
25G -rw------- 1 root root 21G Aug 22 08:02 tek-gdl-tekops-dev.qcow2
24G -rw------- 1 root root 41G Aug 6 00:08 zsc-000_000.qcow2
images]# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg1-var_lib_libvirt_images 160G 149G 12G 94% /var/lib/libvirt/images
date Sat 22 Aug 2020 08:05:35 AM CDT
images]# time duperemove -hdr --hashfile=/var/tmp/dr.hash .
Gathering file list…
Using 8 threads for file hashing phase
[1/7] (14.29%) csum: /var/lib/libvirt/images/pc-000.qcow2
[2/7] (28.57%) csum: /var/lib/libvirt/images/sc-000_000.qcow2
[3/7] (42.86%) csum: /var/lib/libvirt/images/sc-000.qcow2
[4/7] (57.14%) csum: /var/lib/libvirt/images/lt-000.qcow2
[5/7] (71.43%) csum: /var/lib/libvirt/images/zsc-000_000.qcow2
[6/7] (85.71%) csum: /var/lib/libvirt/images/tek-gdl-apps01.qcow2
[7/7] (100.00%) csum: /var/lib/libvirt/images/tek-gdl-tekops-dev.qcow2
Total files: 7
Total hashes: 2184955
Loading only duplicated hashes from hashfile.
Hashing completed. Using 4 threads to calculate duplicate extents. This may take some time.
[########################################]
Search completed with no errors.
Simple read and compare of file data found 641 instances of extents that might benefit from deduplication.
Comparison of extent info shows a net change in shared extents of: 3.4G
real 238m31.420s
user 1053m36.499s
sys 6m8.311s
date Sat 22 Aug 2020 12:05:24 PM CDT
images]# ls -alhs
total 210G
0 drwxr-xr-x 2 root root 175 Aug 22 08:04 .
4.0K drwxr-xr-x 11 root root 4.0K Jul 13 23:35 ..
37G -rw------- 1 root root 41G Aug 21 22:06 lt-000.qcow2
36G -rw------- 1 root root 41G Aug 21 22:08 pc-000.qcow2
24G -rw------- 1 root root 41G Aug 7 12:06 sc-000_000.qcow2
24G -rw------- 1 root root 41G Aug 21 18:25 sc-000.qcow2
44G -rw------- 1 root root 47G Aug 21 20:24 tek-gdl-apps01.qcow2
24G -rw------- 1 root root 21G Aug 22 08:02 tek-gdl-tekops-dev.qcow2
24G -rw------- 1 root root 41G Aug 6 00:08 zsc-000_000.qcow2
images]# df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg1-var_lib_libvirt_images 160G 148G 13G 93% /var/lib/libvirt/images
The updated link to the Kernel doc is:
https://www.debian.org/doc/manuals/debian-kernel-handbook/ch-common-tasks.html#s-kernel-org-package
Instead of http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-kernel-org-package
Thanks, I’ve updated that now.