summaryrefslogtreecommitdiffstats
path: root/system/duperemove/README
blob: b535b70f0cb56181c117affa397ab3b9db8faa91 (plain)
Duperemove is a simple tool for finding duplicated extents and
submitting them for deduplication. When given a list of files it will
hash their contents on an extent by extent basis and compare those
hashes to each other, finding and categorizing extents that match each
other. Optionally, a per-block hash can be applied for further
duplication lookup. When given the -d option, duperemove will submit
those extents for deduplication using the Linux kernel FIDEDUPRANGE
ioctl.

Duperemove can store the hashes it computes in a 'hashfile'. If given an
existing hashfile, duperemove will only compute hashes for those files
which have changed since the last run. Thus you can run duperemove
repeatedly on your data as it changes, without having to re-checksum
unchanged data.

Duperemove can also take input from the fdupes program.

Deduplication is currently only supported by the btrfs and xfs
filesystems.

fdupes is an optional runtime dependency (allows the use of the --fdupes
command line option).