
Vincent's Blog

Pleasure in the job puts perfection in the work (Aristotle)

How to filter files for a backup?

Posted on 2019-05-05 16:12:00 from Vincent in OpenBSD Nas

With the POSIX tar command, we cannot exclude files by pattern as GNU tar does. This article proposes some alternatives.


The context

Like many people in the Unix world, I make my backups with the tar command.
I make one backup per day and store it on my NAS. To save disk space, I use the hard-link feature of rsync.

For web servers, I have to back up /var/www. This is the place where OpenBSD stores HTML files.

Unfortunately, we also have log files there.

So I was looking for a way to exclude those log files from the tar command.

The man page did not offer any solution, so I searched the internet and found exactly this feature in GNU tar.

My objective is to achieve the same result, but with the standard tools we have in OpenBSD.

Possible solutions

Find files and pipe them into tar

The first idea was to combine the find command with tar. So, something like the following:

find /var -type f -not -name "*.log" | tar -czf /tmp/backup.tgz

This command works well, but it cannot handle file names containing blanks. To handle that situation, we must use "-print0" and "xargs -0":

:/var# find . -type f -not -name "*.log" -print0 | xargs -0 tar -czf /tmp/backup.tgz
:/var# find . -type f -not -name "*.log"  | wc -l        
10203
:/var# tar -tzf /tmp/backup.tgz | wc -l 
526

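Unfortunately, the number of files in backup.tgz is not the same as the number of files returned by the find command.
Those commands report no errors, but the backup is clearly incomplete. The most likely cause is that xargs splits a long file list across several tar invocations, and each "tar -c" overwrites the archive written by the previous one, so only the last batch survives.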
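
The overwrite behaviour can be reproduced on a small scale. The following is a minimal sketch, not the commands from the real backup: the temporary directory, the file names and the "-n 2" limit are assumptions made for illustration. "-n 2" forces xargs to start a new tar for every two files, which imitates what happens when the real list exceeds the system's argument-size limit.

```shell
#!/bin/sh
# Sketch: show that "xargs ... tar -c" keeps only the last batch of files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Five small sample files (hypothetical names).
mkdir data
for i in 1 2 3 4 5; do
    echo "file $i" > "data/f$i.txt"
done

# -n 2 is an artificial limit: tar is started three times (2 + 2 + 1 files),
# and every "tar -c" recreates demo.tar from scratch.
find data -type f -print0 | xargs -0 -n 2 tar -cf demo.tar

found=$(find data -type f | wc -l | tr -d ' ')
archived=$(tar -tf demo.tar | wc -l | tr -d ' ')
echo "found=$found archived=$archived"
# prints: found=5 archived=1
```

Only the single file of the last batch is left in the archive, and no command reports an error, which matches the symptom seen above.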

Find and exec

Here, the idea is to use the "-exec" action of the find command. In other words, find runs a tar append for each file found. It works, but it is easy to see that this solution takes a lot of time.

:/var# tar -czf /tmp/backup.tgz test
:/var# find . -type f -not -name "*.log"  -exec tar -uzf /tmp/backup.tgz {} \;

On my target machine, I killed the process after 15 minutes. I even tried without compression (the "z" parameter), but after 15 minutes I killed it again.

Find piped to cpio

On every OpenBSD system, cpio is available alongside tar. In fact, cpio and tar share the same binary (same inode).

:/var# find . -type f -not -name "*.log"  | cpio -o -H ustar | gzip -c > /tmp/backup.tgz
:/var# find . -type f -not -name "*.log"  | wc -l 
10203
:/var# tar -tzf /tmp/backup.tgz | wc -l   
10203
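
Equal counts are a good sign, but they would not reveal an archive that holds the right number of wrong files. As a hedged sketch, the check can be made stricter by comparing sorted file names instead. The temporary directory and sample files are assumptions for illustration, and "!" is used as the POSIX spelling of "-not".

```shell
#!/bin/sh
# Sketch: compare the archive contents with the find selection by name.
command -v cpio >/dev/null 2>&1 || { echo "cpio not installed, skipping"; exit 0; }

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Small sample tree (hypothetical names): two files to keep, one log to drop.
mkdir data
echo "page"  > data/index.html
echo "style" > data/site.css
echo "noise" > data/access.log

# Same pipeline shape as above; cpio's block-count message goes to stderr.
find data -type f ! -name '*.log' | cpio -o -H ustar 2>/dev/null | gzip -c > demo.tgz

# Sorted list of what should be archived, and of what actually is.
find data -type f ! -name '*.log' | sort > expected.txt
tar -tzf demo.tgz | sort > archived.txt

if cmp -s expected.txt archived.txt; then
    result="match"
else
    result="mismatch"
fi
echo "$result"
```

On a real backup, the two lists can be passed to diff instead of cmp to see exactly which names differ.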

In response to the comment made by Jay Williams, we can simplify the command by doing this:

:/var# find . -type f -not -name "*.log" | cpio -o -H ustar -z > /tmp/backup.tgz
:/var# tar -tzvf /tmp/backup.tgz | wc -l
tar: Removing leading / from absolute path names in the archive
10203

By comparison, this command takes about 20 seconds, and this time all our files are in the backup.

Timing the last two commands, I did not observe any major difference.
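
Because the selection is done by find, the filter can grow beyond a single pattern. The sketch below is only an illustration under assumed names: the "www/cache" directory and the "*.gz" pattern for rotated logs are hypothetical, not part of my setup. It skips a whole directory with "-prune" and drops two file patterns; the resulting list could then be piped to cpio as before.

```shell
#!/bin/sh
# Sketch: exclude a directory and several file patterns with find alone.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Sample tree (hypothetical layout).
mkdir -p www/htdocs www/logs www/cache
echo "page" > www/htdocs/index.html
echo "log"  > www/htdocs/debug.log
echo "log"  > www/logs/access.log
echo "old"  > www/logs/access.log.0.gz
echo "tmp"  > www/cache/page.tmp

# -prune stops find from descending into www/cache;
# the other branch keeps regular files that are neither *.log nor *.gz.
selected=$(find www -path www/cache -prune -o \
    -type f ! -name '*.log' ! -name '*.gz' -print | sort)
echo "$selected"
# prints: www/htdocs/index.html
```

The explicit "-print" matters here: without it, find would also print the pruned directory itself.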

Conclusions

A combination of find and cpio allows us to back up only the files we want.

With standard POSIX tools and without any additional package, this combination allows us to do what GNU tar does.



Comments:

1. From Jay Williams on Sun May 12 02:28:47 2019

Would it be possible to shorten your command by using the cpio flag "-z" and skipping the gzip pipe?

2. From Vincent on Sun May 12 19:42:37 2019

Very good remark Jay. I've just adapted the blog accordingly. Many thanks



