How to filter files for a backup?
Posted on 2019-05-05 16:12:00 from Vincent in OpenBSD Nas
The context
Like many people in the Unix world, I do my backups with the tar command.
I make one backup per day and store it on my NAS. To save disk space, I use the hard-link feature of rsync.
For web servers, I have to back up /var/www, which is where OpenBSD stores HTML files.
Unfortunately, we also have log files there.
So I was looking for a way to exclude those log files from the tar command.
Looking at the man page, I did not find any solution, so I searched the internet and found exactly this feature in GNU tar.
My objective is to achieve the same result with the standard tools we have in OpenBSD.
Possible solutions
Find files and pipe them in tar
The first idea was to combine the find command with tar, with something like the following:
find /var -type f -not -name "*.log" | tar -czf /tmp/backup.tgz
This command works well, but it cannot handle file names containing blanks. To handle that situation, we must use "-print0" and "xargs -0":
:/var# find . -type f -not -name "*.log" -print0 | xargs -0 tar -czf /tmp/backup.tgz
:/var# find . -type f -not -name "*.log" | wc -l
10203
:/var# tar -tzf /tmp/backup.tgz | wc -l
526
Unfortunately, the number of files in backup.tgz is not the same as the number of files returned by the find command.
Neither command reports an error, but the backup is clearly incomplete.
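A likely explanation for the missing files: xargs splits a long list of names over several tar invocations, and every "tar -c" creates the archive from scratch, so each batch overwrites the previous one and only the last batch remains. The sketch below reproduces this on a small scale. The temporary directory, the six sample files and the "-n 2" option (which forces xargs to start tar once per two files) are assumptions made for the demonstration, not part of the original test.

```shell
#!/bin/sh
# Reproduce the overwrite effect with six sample files (hypothetical data).
workdir=$(mktemp -d)
mkdir "$workdir/src"
for i in 1 2 3 4 5 6; do
    echo "data $i" > "$workdir/src/file$i.html"
done

cd "$workdir/src" || exit 1

# -n 2 forces xargs to start tar once per two files: three runs in total.
# Each run uses -c, so it recreates the archive from scratch.
find . -type f -print0 | xargs -0 -n 2 tar -cf "$workdir/backup.tar"

found=$(find . -type f | wc -l | tr -d ' ')
archived=$(tar -tf "$workdir/backup.tar" | wc -l | tr -d ' ')
echo "found=$found archived=$archived"

# Clean up the sample data.
cd / && rm -rf "$workdir"
```

With these sample files, find sees six files while the archive lists only the two from the last tar run, and no error is printed.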
Find and exec
Here, the idea is to use the "-exec" parameter of the find command. In other words, find will execute a tar append for each file found. It works, but you can easily see that this solution will take a lot of time, since tar is started once per file.
:/var# tar -czf /tmp/backup.tgz test
:/var# find . -type f -not -name "*.log" -exec tar -uzf /tmp/backup.tgz {} \;
On my target machine, I killed the process after 15 minutes. I even tried without compression (the "z" parameter), but after 15 minutes I killed it again.
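One possible way to reduce the number of tar processes is to end "-exec" with "{} +" instead of "{} \;", so that find passes many names to each tar call, and to append with "-r" to an uncompressed archive that is compressed once at the end. This is only a sketch and has not been timed on the machine above. The sample files and paths are hypothetical, and it assumes that tar creates the archive when "-r" is given a file that does not exist yet (GNU tar does; otherwise create the archive first, as in the commands above).

```shell
#!/bin/sh
# Sample tree with three HTML files and three log files (hypothetical data).
workdir=$(mktemp -d)
mkdir "$workdir/src"
for i in 1 2 3; do
    echo "page $i" > "$workdir/src/page$i.html"
    echo "log $i" > "$workdir/src/access$i.log"
done

cd "$workdir/src" || exit 1

# "{} +" hands as many names as fit to each tar call instead of one
# call per file. -r appends, so successive batches do not overwrite
# each other.
find . -type f ! -name "*.log" -exec tar -rf "$workdir/backup.tar" {} +

# Compress once, after all files have been appended.
gzip "$workdir/backup.tar"

appended=$(tar -tzf "$workdir/backup.tar.gz" | wc -l | tr -d ' ')
echo "appended=$appended"

# Clean up the sample data.
cd / && rm -rf "$workdir"
```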
Find piped to cpio
Every OpenBSD system ships cpio alongside tar. In fact, cpio and tar share the same binary (same inode).
:/var# find . -type f -not -name "*.log" | cpio -o -H ustar | gzip -c > /tmp/backup.tgz
:/var# find . -type f -not -name "*.log" | wc -l
10203
:/var# tar -tzf /tmp/backup.tgz | wc -l
10203
Following the comment made by Jay Williams, we can simplify the command like this:
:/var# find . -type f -not -name "*.log" | cpio -o -H ustar -z > /tmp/backup.tgz
:/var# tar -tzvf /tmp/backup.tgz | wc -l
tar: Removing leading / from absolute path names in the archive
10203
By comparison, this command takes about 20 seconds, and in this case all our files are in the backup.
Timing the last two commands, I did not observe any major difference between them.
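The count comparison used throughout this post can be wrapped in a small helper, so that an incomplete archive is noticed automatically. This is a minimal sketch: the function name check_backup is hypothetical, and counting lines assumes that no file name contains a newline.

```shell
#!/bin/sh
# check_backup DIR ARCHIVE (hypothetical helper)
# Succeeds when ARCHIVE lists as many entries as DIR has non-log files.
# Counting lines assumes that no file name contains a newline.
check_backup() {
    expected=$(cd "$1" && find . -type f ! -name "*.log" | wc -l | tr -d ' ')
    actual=$(tar -tzf "$2" | wc -l | tr -d ' ')
    echo "expected=$expected actual=$actual"
    [ "$expected" -eq "$actual" ]
}
```

It could be called as: check_backup /var /tmp/backup.tgz || echo "backup incomplete" >&2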
Conclusions
A combination of find and cpio allows us to back up only the files we want.
With standard POSIX tools and without any additional package, this combination allows us to do what GNU tar does.
1. From Jay Williams on Sun May 12 02:28:47 2019
Would it be possible to shorten your command by using the cpio flag "-z" and skipping the gzip pipe?
2. From Vincent on Sun May 12 19:42:37 2019
Very good remark Jay. I've just adapted the blog accordingly. Many thanks