Thursday, March 14, 2013

Monitoring a web page for changes using bash

There's a conference that I'd like to attend, and I've heard that it's hard to get into.  When I go to their site, it doesn't have any new info yet.

Rather than checking the site every day, I'd like to have it monitored and be alerted when something new DOES appear on it.

Now I know there are services like ChangeDetection.com that can monitor it for me, but I wanted to cobble something together with the tools I already have.  I'd also like to be able to customize what it considers "a change" when/if I need to.

To that end, I threw together the following bash script.  It monitors a URL and, if it detects a change, sends an email to my Gmail account letting me know.

Hope you find it useful.  BTW, I'm using a program called sendEmail to send the email notification.  It's in apt if you're using a Debian/Ubuntu-like distribution.

#!/bin/bash

# monitor.sh - Monitors a web page for changes and
# sends an email notification if the page changes

USERNAME="me@gmail.com"
PASSWORD="itzasecret"
URL="http://thepage.com/that/I/want/to/monitor"

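for (( ; ; )); do
    # Fetch to a temporary file; skip this round if the download fails
    # so that an outage is not reported as a change.
    if curl "$URL" -L --compressed -s -f -o fetched.html; then
        # Rotate snapshots: the previous download becomes the baseline.
        mv new.html old.html 2> /dev/null
        mv fetched.html new.html
        # Compare only once a baseline exists (not on the first pass).
        if [ -f old.html ] && ! diff -q new.html old.html > /dev/null; then
            sendEmail -f "$USERNAME" -s smtp.gmail.com:587 \
                -xu "$USERNAME" -xp "$PASSWORD" -t "$USERNAME" \
                -o tls=yes -u "Web page changed" \
                -m "Visit it at $URL"
        fi
    fi
    # Pause between checks, whether or not anything changed.
    sleep 10
done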
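
As for customizing what counts as "a change": if the page embeds something that differs on every request, such as a clock, a plain diff will fire constantly.  Here is a minimal sketch of one way to filter that out before comparing.  The function names normalize_page and page_changed are hypothetical helpers, not part of the script above, and the patterns are only assumptions about what kind of noise a page might contain.

```shell
#!/bin/bash

# Hypothetical helper: print a snapshot with "noise" removed.
# Assumes the only noise is clock times (H:MM, HH:MM or HH:MM:SS),
# trailing whitespace, and blank lines; adjust the patterns to the page.
normalize_page() {
    sed -e 's/[0-9]\{1,2\}:[0-9]\{2\}\(:[0-9]\{2\}\)\{0,1\}//g' \
        -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

# Hypothetical helper: succeeds (exit status 0) only when the two
# snapshots still differ after normalization.
page_changed() {
    [ "$(normalize_page "$1")" != "$(normalize_page "$2")" ]
}
```

To try it, the diff comparison in the loop could be swapped for a call such as page_changed new.html old.html.  Anything else that counts as noise on a particular page would need its own sed expression.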

I run the script from a bash prompt with the following command:

nohup ./monitor.sh &

Using nohup and throwing it in the background allows me to log out and have the script continue to run.
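
Since the script keeps running after logout, it helps to have a way to find and stop it later.  Below is a rough sketch of one approach: record the background job's process ID when starting it.  The helper names start_monitor and stop_monitor and the file names monitor.log and monitor.pid are my own assumptions, not anything nohup requires.

```shell
#!/bin/bash

# Hypothetical helper: start the given script under nohup, send its
# output to monitor.log (an assumed name) instead of nohup.out, and
# save the background job's PID in monitor.pid (also an assumed name).
start_monitor() {
    nohup "$1" > monitor.log 2>&1 &
    echo $! > monitor.pid
}

# Hypothetical helper: stop the script started by start_monitor.
stop_monitor() {
    kill "$(cat monitor.pid)" && rm -f monitor.pid
}
```

With these defined, start_monitor ./monitor.sh launches the script and stop_monitor ends it, as long as both are run from the same directory.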