
Vincent's Blog

Pleasure in the job puts perfection in the work (Aristotle)

Let's compare rsync with cpdup

Posted on 2024-09-14 11:12:00 from Vincent in OpenBSD Nas

I have more and more big files to transfer between my end-user devices (laptop, mobiles, ...). The goal of this blog post is to compare cpdup and rsync for syncing files between my OpenBSD laptop and my NAS.

Today, the files I'm managing are bigger and bigger. For example, with a modern camera I often have 500+ photos to transfer, which represent more than 6GB of data.

Moreover, I'm a heavy user of hardlinks, so I want to make sure all my hardlinks are correctly reproduced on the NAS.


Introduction

I have written a small shell script, Simple Time Machine, allowing me to take regular backups.
This script heavily uses hard links as created by rsync.

Such a tool gives me regular local backups, which allow me to "go back in the past" in case of big trouble with my working documents.

In this blog post, I'll compare rsync with cpdup for both local and remote copies.

cpdup is a tool created in 1997 by Matthew Dillon, the creator of DragonflyBSD.

rsync is a tool developed in 1996 by Andrew Tridgell and Paul Mackerras.

The test case

I first create a reference file which I rely on to perform this test. In this case the file is 10MB:

obsd:~/temp $ dd if=/dev/random of=fileref bs=1M count=10

Via the next script, I created 99 files and 32766 hardlinks for each of them. Thus, in total, I will have about 3.2 million hardlinks.

obsd:~/temp $ cat test.sh
set -e

[ ! -d test ] && mkdir test
[ ! -f fileref ] && dd if=/dev/random of=fileref bs=1M count=10
i=1
while [ "$i" -lt 100 ]
do
    mkdir "test/test${i}"
    cp "fileref" "test/test${i}/fileref"
    echo "test/test${i}/fileref"
    count=0
    while [ "$count" -lt 32766 ]
    do
        count=$((count + 1))
        ln "test/test${i}/fileref" "test/test${i}/link_$count"
    done
    i=$((i+1))
done

The first test I did was to put all those files and hardlinks in the same folder. But after more than 1 million files, I saw the process slowing down drastically. No errors reported, but slow. I think that such an amount of files in one folder is a bit too much for my OpenBSD laptop. So, in this final script I create 32766 hardlinks in each directory. That way the creation process takes about 10 minutes on my OpenBSD laptop (I've not measured it precisely).

obsd:~/temp $ find test -type f  | wc -l
3243933

We are now ready to perform our tests ;)


Local usage of cpdup

I first create a destination folder located on another filesystem than the source:

obsd:~/temp $ mkdir /tmp/testhl

Now we are ready to launch cpdup. I'm just using standard parameters, avoiding confirmation requests.

obsd:~/temp $ cpdup -i0 -s0 -I test/ /tmp/testhl/
cpdup completed successfully
1038090240 bytes source, 1038090240 src bytes read, 0 tgt bytes read
1038090240 bytes written (1.0X speedup)
3243933 source items, 3244032 items copied, 3243834 items linked, 0 things deleted
409.4 seconds  4952 Kbytes/sec synced  2476 Kbytes/sec scanned

obsd:~/temp $ find test -type f  | wc -l
3243933
obsd:~/temp $ find /tmp/testhl/ -type f  | wc -l
3243933
obsd:~/temp $ du -h -d1 /tmp/testhl
1.0G /tmp/testhl/test
1.0G /tmp/testhl
obsd:~/temp $ du -h -d1 .
1.0G ./test
1.0G .

It took 6 minutes and 49 seconds, and we indeed have an exact copy of the hardlinks in the destination.

Via the "top -C" command, I saw that the memory allocated to this process was about 2200K.
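Counting files alone does not prove that the links were preserved: a tool could have produced 32767 independent copies per folder. A stronger check is the link count reported by the filesystem. Here is a minimal, self-contained sketch of the idea (with throwaway paths, and using tar as the copier, since tar also preserves hardlinks like cpdup and rsync -H do):

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d)
echo "data" > "$src/fileref"
ln "$src/fileref" "$src/link_1"
ln "$src/fileref" "$src/link_2"
# copy the tree while preserving hardlinks
tar -C "$src" -cf - . | tar -C "$dst" -xf -
# every name should report a link count of 3 in both trees
links_src=$(find "$src" -type f -links 3 | wc -l)
links_dst=$(find "$dst" -type f -links 3 | wc -l)
echo "src: $links_src names, dst: $links_dst names"
rm -rf "$src" "$dst"
```

On the real test tree, the same `find ... -links 32767` check can be run on source and destination.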

Local usage of rsync

I remove the whole "testhl" folder from /tmp and recreate a target directory:

obsd:~/temp $ rm -fr /tmp/testhl
obsd:~/temp $ mkdir /tmp/testhl

Let's launch our test:

obsd:~/temp $ time rsync -aH test /tmp/testhl/
6m41.02s real     0m19.79s user     1m58.47s system

obsd:~/temp $ du -h -d1 /tmp/testhl
1.0G /tmp/testhl/test
1.0G /tmp/testhl
obsd:~/temp $ du -h -d1 .
1.0G ./test
1.0G .
obsd:~/temp $ find test -type f  | wc -l
3243933
obsd:~/temp $ find /tmp/testhl/ -type f  | wc -l
3243933

The source files remain exactly what we had for the previous test.
Via "top", I saw that we had 2 rsync processes, and both of them reported a memory usage of about 11M.

In this case, I've used time to measure the duration.
It took 6 minutes and 41 seconds to perform the same task.
And I have all of my files and hardlinks.

Conclusion on local usage

Both cpdup and rsync perform quite similarly. rsync uses a bit more memory, but on today's machines this should not be a problem.

Remote setup

For these tests, I take another OpenBSD machine located on my local network: 192.168.3.4.
On this machine I create a destination folder:

obsd-test:~ $ mkdir temp

To avoid password requests, I make sure that the public key of this account is known on my laptop.
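Something like this hypothetical one-liner does the trick (run on the remote account; the key file name and laptop address depend on your setup):

```shell
# append the remote account's public key to the laptop's authorized_keys
cat ~/.ssh/id_ed25519.pub | ssh 192.168.3.26 'cat >> ~/.ssh/authorized_keys'
```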

I will use the same set of files to transfer to the remote machine.

To make sure we have decent network connectivity, I do an iperf test.

On the remote machine, I start the server part:

obsd-test:~ $ iperf -s

On my laptop, I trigger the client part:

obsd:~ $ iperf -c 192.168.3.4
------------------------------------------------------------
Client connecting to 192.168.3.4, TCP port 5001
TCP window size: 17.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.3.26 port 5525 connected with 192.168.3.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   932 Mbits/sec

So, we are quite close to a 1Gb network link between those 2 machines (932 Mbit/s ≈ 116 MB/s).
With cpdup and rsync, we will thus not see a transfer rate above 116MB/s.

As stated in the documentation of cpdup, there will be a different behaviour if we "push" files to a remote machine or if we "pull" files from a remote machine. This is an interesting remark, so I will test both cpdup and rsync in both situations.
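For reference, the invocations look roughly like this (a sketch, assuming the paths and addresses of this setup; both tools use ssh as transport for the remote side):

```shell
# push: run on the laptop, writing to the remote machine
cpdup -i0 -s0 -I test/ 192.168.3.4:temp/test/
rsync -aH test/ 192.168.3.4:temp/test/

# pull: run on the remote machine, reading from the laptop
cpdup -i0 -s0 -I 192.168.3.26:temp/test/ temp/test/
rsync -aH 192.168.3.26:temp/test/ temp/test/
```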

I'll not write down all measures and manipulations, but know that before each run I clean up the "destination" folder.

Because on OpenBSD we basically only have FFS as filesystem, I'll install DragonflyBSD and FreeBSD on the same destination machine to compare with Hammer2 and with ZFS.
To be sure that we are in the same conditions, I do a full install of each OS on the same disk. So there is no dual boot, virtual environment, jails, ... in this comparison. Just a common hardware machine with a common disk and a 1Gb ethernet card. The goal is to see how cpdup and rsync behave on this common machine.

Remote usage

Measures are in seconds.
As a reminder, we have 99 folders, each containing 32766 hardlinks to a 10MB file.
So, in total, we have 99 x 10MB of files and more than 3 million hardlinks.

|                                   |       | push | pull |
|-----------------------------------|-------|------|------|
| OpenBSD src and OpenBSD dest      | cpdup | 4402 | 1827 |
|                                   | rsync | 2214 | 2137 |
| OpenBSD src and DragonflyBSD dest | cpdup | 2274 | 220  |
|                                   | rsync | 187  | 51   |
| OpenBSD src and FreeBSD dest      | cpdup | 1831 | 84   |
|                                   | rsync | 76   | 73   |

When doing a "push", we can see that cpdup takes more and more memory on the "source" side. I suppose this is because it memorizes all hardlinks and their associated inodes. I do not know why, but rsync does not show such behaviour.
Concerning rsync, we always see 2 rsync processes on the destination side. Globally, rsync takes 11MB of RAM per process while cpdup stays around 2MB.

Besides comparing rsync and cpdup, we can also see the differences between the filesystems. Indeed, we are creating more than 3 million files on the destination machine. We can clearly see a delta between FFS and modern filesystems like Hammer2 and ZFS.

Short investigation on filesystems

Since we see that the filesystem has a huge impact on the overall performance, let's perform a simple test on those filesystems: the duration to create the 99 folders full of hardlinks.

On my laptop, with an SSD, it took 49 minutes and 06 seconds. This is not really relevant, but it gives a reference.
On my destination machine with a spinning disk:
- with FreeBSD on ZFS it took: 1324 sec
- with FreeBSD on UFS it took: 1208 sec
- with DragonflyBSD on Hammer2 it took: 1864 sec
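A scaled-down version of this benchmark can be run on any machine; this sketch times the creation of 1000 hardlinks in a throwaway directory (1000 is an arbitrary small number, and second-level precision is crude, so treat the result as a rough indication only):

```shell
set -e
d=$(mktemp -d)
echo "x" > "$d/fileref"
start=$(date +%s)
i=0
while [ "$i" -lt 1000 ]
do
    i=$((i + 1))
    ln "$d/fileref" "$d/link_$i"
done
end=$(date +%s)
# 1 original file + 1000 hardlinks = 1001 names
n=$(find "$d" -type f | wc -l)
echo "created $((n - 1)) hardlinks in $((end - start)) seconds"
rm -rf "$d"
```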

Conclusions

Both cpdup and rsync are able to sync the 3 million hardlinks of this test without any problem.

I had not initially expected such an effect, but doing a "pull" is always faster than doing a "push". Moreover, in the case of cpdup, a pull avoids keeping lots of hardlinks in its memory.

Thanks to these tests, we can see the massive effect of modern filesystems on the performance of a large copy. When we copy small files, such deltas are not really noticeable. But with photos, videos and even VMs, a copy of several GB is no longer exceptional.

My initial goal with this comparison was to see which tool is better to copy/synchronize massive amounts of data to my OpenBSD NAS. In the end, what I see is that there is a real benefit to using DragonflyBSD or FreeBSD instead of OpenBSD on this NAS.
OpenBSD will always remain my preferred system, but for the sake of a NAS, the alternatives are much better.


