Giacomo1968

Asked: 2013-03-03 11:40:09 +0800 CST

How to TAIL & EGREP for a specified time range in a BASH script

The subject mostly says it all. I am in charge of a few web servers running Ubuntu 12.04 with Apache2, and I would like to set up APC.

Now I understand APC can hit segmentation fault issues when acting on PHP code that has errors or quirks. So cleanup is advisable, but not really practical from a manpower standpoint.

So I have cooked up a script to monitor the main Apache2 error.log & count how many segmentation faults it sees. I run it as a cron job that runs every 2 minutes. If it hits a specified number of segmentation faults, it should automatically restart Apache2 to clear out APC & get services running smoothly again. The seed idea of this script comes from a comment on this page but I have really heavily built upon the concept to make it more production ready for my needs & tastes.

I am generally happy with this script, but feel the core logic in this snippet could use one major improvement:

if [[ `tail -n ${TAIL_NUMLINES} "${APACHE_ERROR_LOG}" | egrep -c "${TEXT_TO_WATCH}"` -ge ${FAIL_COUNT} ]]; then

That is basically the core logic that tails the log file for a specified number of lines, and if it sees a specified number of log entries with exit signal Segmentation fault in them, then it decides it's time to do something. In this case, log the incident, e-mail someone about the incident & restart Apache.
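To make that behaviour concrete, here is a minimal sketch of the same check wrapped in a function. The function name `tail_has_failures` and its parameters are illustrative and not part of the script below; it mirrors the original `tail` plus `egrep -c` logic using `grep -E`.

```shell
#!/bin/bash
# Succeed when at least FAIL_COUNT of the last NUM_LINES lines of LOG_FILE
# match PATTERN. Function and parameter names are illustrative only.
tail_has_failures() {
    local log_file=$1 pattern=$2 num_lines=$3 fail_count=$4
    local matches

    # grep -c prints the number of matching lines (not the lines themselves).
    matches=$(tail -n "${num_lines}" "${log_file}" | grep -E -c -- "${pattern}")
    [ "${matches}" -ge "${fail_count}" ]
}
```

Because the check looks only at the current tail of the file, it gives the same verdict on every run until new lines are appended.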

I would like that logic to factor in time, because errors can be few & far between. There are cases where the last TAIL_NUMLINES lines still contain FAIL_COUNT matches even after a restart, because no new errors have hit and so no new entries have pushed the old ones out of the tail window. That would end up in a situation where the server basically restarts every time the cron job runs. Which is horrible.

So my stop-gap solution for now is to set FAIL_COUNT & TAIL_NUMLINES to a low enough number to match the standard counts of entries that are created when Apache2 reloads. But I still don't like that.

So what, if anything, can be done to add a time frame to my tail/egrep logic? Also, I would like to avoid saving a timestamp or log line position hint to a file if possible. I want this script to be self-contained other than depending on the cron job.

Full script as I have it now.

#!/bin/bash

LOCK_NAME="APACHE_LOGWATCHER"
LOCK_DIR=/tmp/${LOCK_NAME}.lock
PID_FILE=${LOCK_DIR}/${LOCK_NAME}.pid

DATE=`date +%Y%m%d`
TIME=`date +%H%M`
# SUFFIX="-"${DATE}"-"${TIME};
SUFFIX="-"${DATE};

APACHE_ERROR_LOG="/var/log/apache2/error.log"
APACHE_RESTART="/etc/init.d/apache2 restart"
TEXT_TO_WATCH="exit signal Segmentation fault"

HOSTNAME=$(hostname)
MAIL_ADDRESS="myname@domain.name.here.com"
MAIL_SUBJECT=${HOSTNAME}": Apache Segfault Notification"

SCRIPT_NAME=$(basename "$0")
SCRIPT_BASE_NAME=${SCRIPT_NAME%.*}

LOG_DIR="/opt/segfault_logs/"
LOG_FILENAME=${SCRIPT_BASE_NAME}${SUFFIX}".log"
LOG_FULLPATH=${LOG_DIR}${LOG_FILENAME}

TAIL_NUMLINES=5
FAIL_COUNT=4

# If the Apache log file doesn't exist, then exit.
if [ ! -f ${APACHE_ERROR_LOG} ]; then
    exit
fi

# Main process.
if mkdir ${LOCK_DIR} 2>/dev/null; then
    # If the ${LOCK_DIR} doesn't exist, then start working & store the ${PID_FILE}
    echo $$ > ${PID_FILE}

    STARTUP_MESSAGE="`date` Log watcher starting."
    if [ -d ${LOG_DIR} ]; then
        echo ${STARTUP_MESSAGE} >> ${LOG_FULLPATH}
    fi

    # Tail (but do not follow) the last TAIL_NUMLINES lines of the APACHE_ERROR_LOG. If the
    # number of matches is greater than or equal to the FAIL_COUNT, act.
    if [[ `tail -n ${TAIL_NUMLINES} "${APACHE_ERROR_LOG}" | egrep -c "${TEXT_TO_WATCH}"` -ge ${FAIL_COUNT} ]]; then

        # Create the log message.
        LOG_MESSAGE="`date` Segfault detected on "$HOSTNAME

        # Log the error to the file.
        if [ -d ${LOG_DIR} ]; then
            echo ${LOG_MESSAGE} >> ${LOG_FULLPATH}
        fi

        # Send e-mail notification.
        echo ${LOG_MESSAGE}$'\n\r'${FAIL_COUNT} | mail -s "${MAIL_SUBJECT}" ${MAIL_ADDRESS}

        # Restart Apache
        ${APACHE_RESTART}
    fi

    rm -rf ${LOCK_DIR}
    exit
else
    if [ -f ${PID_FILE} ] && kill -0 $(cat ${PID_FILE}) 2>/dev/null; then
        # Confirm that the process file exists & a process
        # with that PID is truly running.
        # echo "Running [PID "$(cat ${PID_FILE})"]" >&2
        exit
    else
        # If the process is not running, yet there is a PID file--like in the case
        # of a crash or sudden reboot--then get rid of the ${LOCK_DIR}
        rm -rf ${LOCK_DIR}
        exit
    fi
fi

EDIT: Here is an example of the Apache2 logfile output the above script is monitoring.

[Sat Mar 02 14:32:26 2013] [notice] child pid 14696 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:27 2013] [notice] child pid 13914 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:27 2013] [notice] child pid 15735 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:28 2013] [notice] child pid 14865 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:28 2013] [notice] child pid 15545 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:30 2013] [notice] child pid 13821 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:31 2013] [notice] child pid 15683 exit signal Segmentation fault (11)
[Sat Mar 02 14:32:47 2013] [notice] child pid 15684 exit signal Segmentation fault (11)
[Sat Mar 02 14:33:54 2013] [notice] child pid 15482 exit signal Segmentation fault (11)
[Sat Mar 02 14:34:04 2013] [notice] caught SIGTERM, shutting down
[Sat Mar 02 14:34:06 2013] [notice] ModSecurity for Apache/2.6.3 (http://www.modsecurity.org/) configured.
[Sat Mar 02 14:34:06 2013] [notice] ModSecurity: APR compiled version="1.4.6"; loaded version="1.4.6"
[Sat Mar 02 14:34:06 2013] [notice] ModSecurity: PCRE compiled version="8.12"; loaded version="8.12 2011-01-15"
[Sat Mar 02 14:34:06 2013] [notice] ModSecurity: LUA compiled version="Lua 5.1"
[Sat Mar 02 14:34:06 2013] [notice] ModSecurity: LIBXML compiled version="2.7.8"
[Sat Mar 02 14:34:07 2013] [notice] Apache/2.2.22 (Ubuntu) mod_ssl/2.2.22 OpenSSL/1.0.1 configured -- resuming normal operations
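
Given entries in this format, one stateless way to add a time frame is to parse the bracketed timestamp of each matching line and count only those inside a recent window. The following is a rough sketch, not a tested production change: it assumes GNU `date` (for `-d`) and Apache 2.2 style timestamps as shown above, and the function name `count_recent_matches`, its parameters, and the 200-line scan limit are all illustrative.

```shell
#!/bin/bash
# Count lines in LOG_FILE that contain PATTERN and whose Apache timestamp
# falls within the last WINDOW seconds. Assumes GNU date and timestamps like
# "[Sat Mar 02 14:32:26 2013]". All names here are illustrative.
count_recent_matches() {
    local log_file=$1 pattern=$2 window=$3
    local now=${4:-$(date +%s)}   # optional 4th argument overrides "now"
    local cutoff=$(( now - window ))
    local count=0 line stamp epoch

    while IFS= read -r line; do
        # Keep only the text between the leading "[" and the first "]".
        stamp=${line#\[}
        stamp=${stamp%%\]*}
        # Skip lines whose timestamp GNU date cannot parse.
        epoch=$(date -d "${stamp}" +%s 2>/dev/null) || continue
        if (( epoch >= cutoff && epoch <= now )); then
            count=$(( count + 1 ))
        fi
    done < <(tail -n 200 "${log_file}" | grep -F -- "${pattern}")

    echo "${count}"
}
```

With a helper like this, the condition could compare `$(count_recent_matches "${APACHE_ERROR_LOG}" "${TEXT_TO_WATCH}" 120)` against `${FAIL_COUNT}`, so old segfaults stop counting once they fall outside the two-minute window, and no offset or timestamp file is needed.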

2 Answers


Ladadadada · Answer 1 · 2013-03-03T12:25:49+08:00


This sounds like a job for logtail. The purpose of logtail is to remember where you got up to when reading a file last time so you can start again at that point next time.

Use it like this:

logtail -o /tmp/apache.offset ${APACHE_ERROR_LOG} | egrep -c "${TEXT_TO_WATCH}"
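
For readers without logtail installed, the underlying idea (remember a position, read only what was appended since) can be sketched in plain bash. This is a simplified stand-in and not logtail's actual implementation; the function name `read_new_lines` and the offset-file handling are assumptions made for illustration.

```shell
#!/bin/bash
# Print only the bytes appended to LOG_FILE since the previous call, keeping
# the last-read byte offset in OFFSET_FILE. A simplified illustration of the
# "remember where you got up to" idea; names are illustrative.
read_new_lines() {
    local log_file=$1 offset_file=$2
    local size offset=0

    size=$(wc -c < "${log_file}")
    if [ -f "${offset_file}" ]; then
        offset=$(cat "${offset_file}")
    fi
    # A file smaller than the stored offset suggests rotation: start over.
    if (( offset > size )); then
        offset=0
    fi

    # tail -c +N starts at byte N (1-based); head -c caps the output at the
    # size measured above so bytes written later are picked up next time.
    tail -c "+$(( offset + 1 ))" "${log_file}" | head -c "$(( size - offset ))"
    echo "${size}" > "${offset_file}"
}
```

Piping its output through `egrep -c "${TEXT_TO_WATCH}"` would then count only the segfaults logged since the previous cron run, at the cost of keeping one small state file.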


Jay · Answer 2 · 2013-03-03T12:17:18+08:00


This demonstrates my idea: count only the segfaults logged since the most recent restart.

should_restart.sh

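# Line number of the most recent "restart" entry (empty if there is none).
last_restart_line_number=$( grep -n restart sample_log_file.txt | tail -n 1 | cut -f1 -d: )
# Count segfaults from that line onward, falling back to line 1 when no restart was found.
segfaults_since_restart=$( tail -n +"${last_restart_line_number:-1}" sample_log_file.txt | grep -c segfault )

if [ "$segfaults_since_restart" -gt 5 ]; then
    echo "Yes, restart apache"
else
    echo "No, don't restart apache"
fi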

sample_log_file.txt

segfault
segfault
segfault
restart
segfault
segfault
segfault
restart
segfault
segfault
segfault
restart
segfault
segfault
segfault
segfault
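
Applied to the real Apache log format from the question, the same idea might look like the sketch below. It assumes that a line containing "resuming normal operations" marks each Apache (re)start, which matches the sample log in the question but may differ on other setups; the function name `segfaults_since_last_start` is illustrative.

```shell
#!/bin/bash
# Count segfault entries logged after the most recent Apache (re)start.
# Assumes "resuming normal operations" marks a start, as in the sample log.
segfaults_since_last_start() {
    awk '
        /resuming normal operations/     { count = 0 }  # reset at each start
        /exit signal Segmentation fault/ { count++ }    # tally since reset
        END { print count + 0 }                         # print 0 if none seen
    ' "$1"
}
```

A single pass with awk avoids the separate line-number lookup, and it needs no state file because the restart marker in the log itself acts as the reference point.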
