Vincent's Blog

Pleasure in the job puts perfection in the work (Aristotle)

Avoid duplicate emails sent by crontab

Posted on 2025-01-03 13:36:00 from Vincent in OpenBSD

This is a very simple idea to keep a cron job from sending too many emails.

The script stores a hash of each email and uses it to decide whether the message is worth sending again.


Avoiding Duplicate Emails with a Simple Shell Script

When setting up automated email notifications, one common issue is the repeated sending of identical emails within a short timeframe. This can lead to email flooding and unnecessary clutter in recipients' inboxes. To address this problem, I wrote a simple shell script, avoid_duplicate.sh, which filters out duplicate messages based on their hash values before they are sent.

How It Works

The script captures input, computes its SHA-256 hash, and checks whether the hash has been recorded in a history file. If the message has been sent before, the script increments a counter instead of allowing the message to be resent. If the number of occurrences exceeds a configurable threshold, the email is sent again, and the counter resets.
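The lookup described above can be sketched in isolation. This is only an illustration, not the script itself: digest_of and seen_before are hypothetical helper names, and the sha256sum fallback is an assumption for systems that lack OpenBSD's sha256 command.

```shell
#!/bin/sh
# Sketch of the duplicate check: hash a message file, then look the hash up
# in a history file. digest_of and seen_before are hypothetical helpers.

digest_of() {
    # Print the SHA-256 hex digest of the file given as $1.
    if command -v sha256 >/dev/null 2>&1; then
        sha256 "$1" | awk '{print $NF}'     # OpenBSD prints "SHA256 (file) = <hash>"
    else
        sha256sum "$1" | awk '{print $1}'   # assumption: GNU coreutils fallback
    fi
}

seen_before() {
    # $1 = message file, $2 = history file; succeeds if the hash is recorded.
    grep -qF "$(digest_of "$1")" "$2"
}

# Demo with throwaway files
msg=$(mktemp)
hist=$(mktemp)
printf 'disk full on /var\n' > "$msg"

if seen_before "$msg" "$hist"; then first=duplicate; else first=new; fi
# Record the message the same way the script does: date, counter, hash
echo "$(date +%y%m%d_%H%M) 1 $(digest_of "$msg")" >> "$hist"
if seen_before "$msg" "$hist"; then second=duplicate; else second=new; fi

echo "first run: $first, second run: $second"
rm -f "$msg" "$hist"
```

The first lookup finds nothing because the history is empty; once the hash is recorded, the same content is recognised as a duplicate.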

Usage

Typically, this script is used in conjunction with cron jobs. For example:

5 * * * * /usr/local/bin/myspecific_script | avoid_duplicate.sh

This passes the output of myspecific_script through avoid_duplicate.sh before cron mails it. Cron only sends an email when a job produces output, so when the filter prints nothing, no email goes out. Note that emails will only be sent if cron is correctly configured and running.

The Script

#!/bin/sh

set -e

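# Create a private temporary file and remove it on every exit path
TEMP=$(mktemp /tmp/testXXXXXX)
trap 'rm -f "$TEMP"' EXIT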
# Parameters that can be changed via environment variables
MAX_OCCURENCE=${MAX_OCCURENCE:-20}
HIST_FILE=${HIST_FILE:-"/var/db/mail_hist_file"}

cat > "$TEMP"

if [ ! -s "$TEMP" ]; then
    # Input is empty, nothing to process
    exit 0
fi

if [ ! -s "$HIST_FILE" ]; then
    echo "# date   counter   hash" >> "$HIST_FILE"
fi

content_hash=$(sha256 "$TEMP" | awk '{print $4}')

if grep "$content_hash" "$HIST_FILE" > /dev/null
then
    # Increment the counter
    prev_count=$(grep "$content_hash" "$HIST_FILE" | awk '{ print $2 }')
    count=$(( prev_count + 1 ))

    # If the recorded count already exceeds MAX_OCCURENCE, send the email and reset
    if [ "$prev_count" -gt "$MAX_OCCURENCE" ]; then
        cat "$TEMP"
        count=1
    fi

    # Update the history file safely
    tmp=$(mktemp /tmp/mail_hist_XXXXXX)
    cp "$HIST_FILE" "$tmp"
    sed -i "/$content_hash/s/ $prev_count / $count /" "$tmp"
    cat "$tmp" > "$HIST_FILE"
    rm "$tmp"
    exit 0
else
    echo "$(date +%y%m%d_%H%M) 1 $content_hash" >> "$HIST_FILE"
    cat "$TEMP"
fi

rm "$TEMP"

Explanation

  1. The script reads email content from standard input and writes it to a temporary file.
  2. If the file is empty, the script exits immediately.
  3. If the history file is missing or empty, the script creates it with a header line.
  4. The SHA-256 hash of the content is computed.
  5. If the hash is found in the history file:
     • The occurrence counter is incremented.
     • If the recorded count exceeds MAX_OCCURENCE, the email is sent again and the count resets to 1.
     • The history file is updated through a temporary copy.
  6. If the hash is not found, the email is sent and its hash is recorded with a count of 1.
  7. The temporary file is cleaned up to prevent clutter.
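The counter decision for an already-known hash can be isolated in a few lines. This is a minimal sketch under the assumption that the comparison matches the script's test (recorded count strictly greater than MAX_OCCURENCE); next_state is a hypothetical name, not part of the script.

```shell
#!/bin/sh
# next_state is a hypothetical helper mirroring the script's counter logic.
# $1 = count currently recorded in the history file, $2 = MAX_OCCURENCE.
# Prints the action and the new counter value.
next_state() {
    prev=$1
    max=$2
    if [ "$prev" -gt "$max" ]; then
        echo "send 1"                     # resend and restart the counter
    else
        echo "suppress $((prev + 1))"     # stay silent, count one more duplicate
    fi
}

next_state 3 20     # a duplicate seen a few times
next_state 20 20    # exactly at the threshold: still suppressed
next_state 21 20    # above the threshold: sent again
```

Because the comparison is strict, a duplicate is resent only once the recorded count is already above the threshold, not when it reaches it.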

Configuration

  • MAX_OCCURENCE: Defines how many times an identical email is suppressed before being sent again (default: 20).
  • HIST_FILE: Specifies the location of the history file (default: /var/db/mail_hist_file).
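Both parameters use the shell's default-value expansion, so they can be overridden from the environment of the command. The sketch below only demonstrates that mechanism; the override values (5 and /tmp/my_hist) are arbitrary examples.

```shell
#!/bin/sh
# The same ${VAR:-default} pattern the script uses, run in a child shell
# so that the values really come from the environment.
config_line='echo "${MAX_OCCURENCE:-20} ${HIST_FILE:-/var/db/mail_hist_file}"'

# No variables set: the built-in defaults apply
defaults=$( unset MAX_OCCURENCE HIST_FILE; sh -c "$config_line" )

# Variables set for a single command, as one could do on a crontab line
override=$( MAX_OCCURENCE=5 HIST_FILE=/tmp/my_hist sh -c "$config_line" )

echo "defaults: $defaults"
echo "override: $override"

# A crontab entry using the same idea might look like this (illustrative):
# 5 * * * * /usr/local/bin/myspecific_script | MAX_OCCURENCE=5 avoid_duplicate.sh
```

Using a separate HIST_FILE per job keeps the counters of unrelated cron jobs apart.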

A next step would be to clean up HIST_FILE based on dates or counters.
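A date-based cleanup could look like the following sketch. It is not part of the script: prune_hist is a hypothetical helper, and it relies on the yymmdd_HHMM timestamps in the first column sorting correctly as plain strings. The cutoff is passed in by the caller, because computing "N days ago" differs between date implementations.

```shell
#!/bin/sh
# prune_hist is a hypothetical helper: keep comment lines and every entry
# whose timestamp (first column, yymmdd_HHMM) is not older than the cutoff.
# $1 = history file, $2 = cutoff in the same yymmdd_HHMM format.
prune_hist() {
    file=$1
    cutoff=$2
    tmp=$(mktemp) || return 1
    # Appending "" forces awk to compare the two values as strings
    awk -v cutoff="$cutoff" '/^#/ || ($1 "") >= (cutoff "")' "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a throwaway history file with made-up entries
hist=$(mktemp)
cat > "$hist" <<'EOF'
# date   counter   hash
241201_0905 3 aaaa
250103_1336 1 bbbb
EOF

prune_hist "$hist" 250101_0000
kept=$(grep -c -v '^#' "$hist")
cat "$hist"
rm -f "$hist"
```

In the demo only the January entry survives; the header line is always kept.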

Conclusion

This lightweight solution prevents excessive duplicate emails while ensuring that periodic notifications are still delivered. By implementing this script in automated email pipelines, you can reduce redundant messages and improve the efficiency of your email notifications.


