Valter Silva's questions

I would like to disable notifications for some services in Centreon through the Nagios web interface. Does anybody know how to do that in Nagios?
I would like to install Tomcat with a Chef recipe into some folder, like /company/tomcat6.0.45-port8081. But sometimes I have many instances of Tomcat, like /company/tomcat7.0.41-port8082, /company/tomcat7.0.39-port8083, and so on. How can I do that? Any ideas?
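For context, here is a rough shell sketch of the per-instance steps such a recipe would need to wrap; the version, port, and download URL below are assumptions for illustration, not part of my setup:

#!/bin/bash
# Hypothetical per-instance install; a Chef recipe would loop over an
# attribute list of version/port pairs and wrap these steps in resources.
VERSION="7.0.41"   # assumed version
PORT="8082"        # assumed port
DEST="/company/tomcat${VERSION}-port${PORT}"

mkdir -p "$DEST"
# Fetch and unpack the matching Tomcat tarball (URL pattern is illustrative).
curl -fsSL "https://archive.apache.org/dist/tomcat/tomcat-7/v${VERSION}/bin/apache-tomcat-${VERSION}.tar.gz" \
    | tar -xz --strip-components=1 -C "$DEST"
# Give this instance its own HTTP connector port.
sed -i "s/port=\"8080\"/port=\"${PORT}\"/" "$DEST/conf/server.xml"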
We use XenServer with RAID 0 as one disk only (I wonder if this is a good approach; if not, what would be?), then we install Xen. After that we put the CentOS 6.4 x64 DVD in the server and try to install it. But when the option to create a custom partition layout should appear, it doesn't. So when we choose 'use entire disk', the OS starts to install and creates the default layout, with /home bigger than /, which is not what we want.
I wonder whether this is a problem with the RAID, XenServer, XenClient, or CentOS?
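For reference, the kind of custom layout I want could be forced with a CentOS 6 kickstart fragment along these lines (the sizes are assumptions):

# Illustrative kickstart partitioning only; adjust sizes as needed.
clearpart --all --initlabel
part /boot --fstype=ext4 --size=500
part swap --size=4096
part / --fstype=ext4 --size=1 --grow   # give the rest of the disk to /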
I'm using s3cmd to back up some logs into Amazon S3 buckets; it is a great tool. But I notice that, very often, my syncs break: they cancel and retry the upload until the file finally goes through, but they don't continue from where the upload stopped; the upload restarts from the beginning.
Is there something I can do about this, to continue an upload even if it breaks? Change my .s3cfg file or add some parameter to my command?
And what does throttle mean in s3cmd? I'm asking because when an upload fails, this throttle increases.
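The closest workaround I have found so far is a retry wrapper, relying on the fact that s3cmd sync skips files that already match what is in the bucket; the paths and bucket name below are assumptions:

#!/bin/bash
# Hypothetical retry loop: re-run the sync until it exits cleanly.
# Files that completed are skipped on each retry; a file that failed
# mid-transfer is re-sent from the beginning, not resumed.
SRC="/var/log/myapp/"           # assumed source directory
DST="s3://my-log-bucket/logs/"  # assumed destination bucket
until s3cmd sync "$SRC" "$DST"; do
    echo "sync failed, retrying in 30s..." >&2
    sleep 30
done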
I have to create a backup plan from my machine .155 to .156. On .155 my files are in C:/MySQLServer/backup and they should be copied daily to I:/MySQLServer/backup on .156, so I was looking on the internet for some tool like rsync on Linux. But I was also thinking about the 'Map network drive' feature that both Windows Server v6 machines have.
So I'm wondering about the differences, advantages, and disadvantages of each. Any suggestions or ideas?
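For what it's worth, robocopy ships with Windows Server and can mirror a tree over a mapped drive or a UNC path, so it could run as a daily scheduled task; the share path for .156 below is a placeholder, not my real one:

:: Hypothetical daily task run on .155; <host-156> stands for the .156 machine.
:: /MIR mirrors the tree, /Z uses restartable copies, /R and /W bound the retries.
robocopy C:\MySQLServer\backup \\<host-156>\I$\MySQLServer\backup /MIR /Z /R:3 /W:10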
I'm working with Ganglia, a great tool by the way!
I'm trying to build this topology, with some nodes running CentOS 6.4 and others running CentOS 5.9.
For this I'm trying the following configuration for GMetad (the rest is default):
##########################################################################
Head Monitor Cluster (GMetad + Gmond > mute no > deaf no + GWeb) | CentOS 6.4 (desktop)
##########################################################################
data_source "head monitor clusters" 10 192.168.1.100 # 192.168.1.100 == localhost
data_source "monitor cluster" 10 192.168.1.51:8649
gridname "Company"
authority "http://192.168.1.100/ganglia/"
##########################################################################
Monitor Cluster (GMond > mute no > deaf no) | CentOS 6.4 (minimal)
##########################################################################
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 86400 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
  send_metadata_interval = 5 /* secs */
}

cluster {
  name = "Monitor"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

udp_send_channel {
  host = 192.168.1.51 # send the collected data to itself
  port = 8649
  ttl = 1
}

udp_send_channel {
  host = 192.168.1.100 # send the data to the head monitor
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8649
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}
##########################################################################
Node (Gmond > mute no > deaf yes) | CentOS 5.9 (minimal)
##########################################################################
/* This configuration is as close to 2.5.x default behavior as possible.
   The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = yes
  host_dmax = 86400 /* secs */
  cleanup_threshold = 300 /* secs */
  gexec = no
}

cluster {
  name = "Monitor"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like. Gmond
   used to only support having a single channel */
udp_send_channel {
  host = 192.168.1.51 # send to the monitor cluster
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  # bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}
And that's it.
But this configuration is not working. I have already set up 9 virtual machines trying to make it work, but nothing so far.
I disabled iptables, ip6tables, and SELinux. I have been working on this for 3 days and nights and nothing seems to work.
I would really appreciate some help with this, please. I don't understand why this configuration isn't working after reading so many tutorials and the O'Reilly book about Ganglia. Any ideas?
And yes, if I run telnet 192.168.1.100 8649 from 192.168.1.100, all the collected data is displayed. The same goes for telnet 192.168.1.51 8649 from 192.168.1.100, but this host does not show up in the Ganglia web interface; its charts are always -nan.
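To be explicit about what I am checking (assuming the IPs above, and that gmetad serves its own XML on the default port 8651):

# Does the aggregator gmond answer with the cluster XML?
telnet 192.168.1.51 8649
# Does gmetad itself know about the hosts? Its XML port defaults to 8651.
telnet 192.168.1.100 8651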
Any ideas? Thank you!
I would like to create a new standard for two things:
1) how long the logs generated by my applications should be kept, and how they should be rotated;
2) how to transfer the logs to Amazon S3, as a backup server.
I was thinking of using logrotate to rotate and compress my daily files in this format:
{filename}-{year}-{month}-{day}-{r-months}.gz
The r-months variable means remain-months: how many months the file should remain in S3. Files older than that should be removed.
A friend of mine gave me the idea that I should compress the logs daily (in the new format proposed above), and after that these files should be sent to our bucket in Amazon S3. Then files older than 7 days should be removed by logrotate (because they are already in S3).
Nowadays our applications use log4j and other frameworks to generate logs.
1) Should we disable the rolling logs generated by our applications and handle rotation only with logrotate?
2) Do you think this could crash some application?
3) Is this new log format a good one?
4) And how should we send the files to S3? Right now I'm using s3cmd; would you recommend another tool?
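To make the idea concrete, here is a rough logrotate sketch of what I have in mind; the paths and bucket name are assumptions, and the {r-months} suffix has no direct logrotate equivalent, so that expiry would have to be handled elsewhere (for example by an S3 lifecycle rule):

# /etc/logrotate.d/myapp -- illustrative sketch only
/var/log/myapp/*.log {
    daily
    rotate 7
    compress
    dateext
    dateformat -%Y-%m-%d
    missingok
    notifempty
    lastaction
        s3cmd sync /var/log/myapp/*.gz s3://my-log-bucket/myapp/
    endscript
}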
In my company we have lots of applications running on lots of different servers. These applications generate lots of logs, and sometimes the developers forget to compress them, so Nagios alerts about disk space many times. Besides checking whether these logs are compressed and older than x days, I have to send these files to our backup server, which is going to be in Amazon; we chose this approach because if some disk fills up we just add a new one.
So I have to do some good planning for the logs. I wonder whether you have already been through this problem and what you would recommend. Is my approach good or bad? Any advice would be very important to me.
I'm starting to learn how to develop recipes for Chef.
I need to install Ganglia Monitor on some servers (or nodes, in the Ganglia literature).
That's why I'm checking whether the platform is Ubuntu, CentOS, or one of many others, in order to install the correct package.
The issue is that I have two different config files; actually, there are only one or two parameters in the config file that differ from one to the other.
I need help detecting which datacenter a server belongs to, so that I can copy the proper config file.
So far I've developed the script below, but I have some doubts, which are in the comments in the code.
#
# Cookbook Name:: ganglia
# Recipe:: default
#
# Copyright 2013, Valter Henrique.com
#
# All rights reserved - Do Not Redistribute
#
# Installing Ganglia Monitor
case node[:platform]
when "ubuntu", "debian"
  package "ganglia-monitor"
when "redhat", "centos", "fedora"
  package "ganglia-gmond"
end

user "ganglia"
# Setting different config files
# Declare the gmond service so the notifies below have a target.
service "gmond" do
  action :nothing
end

case node[:ipaddress]
# DataCenter #1
# How do I put more options in the when condition? A when for /^200\.222\./ or /^200\.223\./ ?
when /^200\.222\./
  # putting config file
  cookbook_file "/etc/ganglia/gmond.conf" do
    owner "root"
    group "root"
    mode "0644"
    source "dc1/gmond.conf"
    notifies(:restart, "service[gmond]")
  end
# DataCenter #2
when /^216\.235\./
  cookbook_file "/etc/ganglia/gmond.conf" do
    owner "root"
    group "root"
    mode "0644"
    source "dc2/gmond.conf"
    notifies(:restart, "service[gmond]")
  end
end
Any suggestions on how I could develop this code in a better way?
Yesterday I was trying to install Ganglia on my virtual machine. After I defined Ganglia as a service, I rebooted the machine, but after that CentOS would not present the login screen anymore.
I think the Ganglia service is blocking the system from starting properly, because I was able to see the startup messages the OS prints, as shown below.
How do I fix that?
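What I plan to try (a sketch, assuming the service I registered is gmond and that single-user mode is reachable from the GRUB menu):

# Boot into single-user mode by appending "single" to the kernel line
# in GRUB, then take the suspect service out of the boot sequence:
chkconfig gmond off     # keep it installed but not started at boot
# or remove it from chkconfig management entirely:
chkconfig --del gmond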
I would like to monitor the resources of my servers such as CPU, memory, disk space and many other things. I'm using Nagios + Centreon to do this, but I would like to have a historical view of the use of the resources and show them in charts to have more data to manage them through the years in a better way.
I was thinking about creating a script which would be stored on every machine and would execute every 1 minute, sending data about the resources to my application which would handle this data and store it in my database.
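Concretely, the script I have in mind would be something like this sketch, run from cron every minute; the endpoint URL and field names are assumptions:

#!/bin/bash
# Hypothetical collector: gather basic CPU/memory/disk figures and
# post them to the application that stores them in the database.
HOST=$(hostname)
LOAD=$(awk '{print $1}' /proc/loadavg)                  # 1-minute load average
MEM_FREE_KB=$(awk '/MemFree/ {print $2}' /proc/meminfo) # free memory in kB
DISK_USE=$(df -P / | awk 'NR==2 {print $5}')            # root filesystem usage
curl -s --data "host=$HOST&load=$LOAD&mem_free_kb=$MEM_FREE_KB&disk_use=$DISK_USE" \
    http://monitor.example.com/collect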
But then I thought: am I re-inventing the wheel?
There must be some system in the market that does what I'm looking for.
I've looked into Nagios, but it doesn't handle all the information that I seek, nor does Centreon.
Does anyone know of such a system? Am I being too radical in my way of thinking? I'm new to the infrastructure area, so sorry if this question is too naive =]
I was wondering what a good practice is for creating a script to start/stop/restart some service. I will try to make myself clear, OK?
Nowadays I do something like this: let's say I would like to create a script to start/stop/restart a service. I create a folder /company/<service name>/ and put start.sh and stop.sh there, which look something like this:
start.sh
#!/bin/bash
# VARIABLES
SERVICE_NAME="<service name>"
USERDEPLOYER="<service name>_deployer"
FOLDER=/company/<service name>/
KEYWORD="<keyword>"

# CHECKING SYSTEM STATUS
PROC=`ps -ef | grep $SERVICE_NAME | grep $KEYWORD | grep -v grep | awk '{ print $2 }'`
if [ -n "$PROC" ]; then
    echo "$SERVICE_NAME is running!"
    echo "Stop it first!"
    exit
fi

# STARTING
if [[ `/usr/bin/whoami` == $USERDEPLOYER ]]
then
    pushd .
    echo " "
    echo "Starting $SERVICE_NAME..."
    echo "cd $FOLDER"
    cd $FOLDER
    # COMMAND
    <command to start the service> &
    sleep 20
    PROC=`ps -ef | grep $SERVICE_NAME | grep $KEYWORD | grep -v grep | awk '{ print $2 }'`
    if [ -n "$PROC" ]
    then
        echo "OK: system started."
    else
        echo "ERROR: system process not found!"
    fi
    echo "script execution finished!"
    popd
else
    echo "User must be $USERDEPLOYER !"
fi
stop.sh
#!/bin/bash
# VARIABLES
SERVICE_NAME="<service name>"
USERDEPLOYER="<service name>_deployer"
KEYWORD="<keyword>"

if [[ `/usr/bin/whoami` == $USERDEPLOYER ]]
then
    pushd .
    echo "Stopping $SERVICE_NAME..."
    # KILLING PROCESS
    processPID=`ps -ef | grep $SERVICE_NAME | grep $KEYWORD | grep -v grep | awk '{ print $2 }'`
    echo "Trying to kill process with key $SERVICE_NAME - ignore error messages below."
    kill $processPID
    sleep 10
    while [ -n "$processPID" ]
    do
        echo "Waiting for process ($processPID) to shut down... 20s"
        sleep 20
        processPID=`ps -ef | grep $SERVICE_NAME | grep $KEYWORD | grep -v grep | awk '{ print $2 }'`
    done
    echo "Ensured process with key $SERVICE_NAME is no longer running."
    popd
else
    echo "User must be $USERDEPLOYER !"
fi
After that I create a user <service name>_deployer, then give it ownership of this folder and these scripts, start.sh and stop.sh, with permission to read, write, and execute as well.
Then I create the following script in /etc/init.d/ as <service name>-service:
#!/bin/bash
#
# Linux chkconfig stuff:
#
# chkconfig: 2345 56 10
# description: <description>

SERVICE_NAME="<service name>-service"
SERVICE_USER="<service name>_deployer"
FOLDER="/company/<service name>/"

start() {
    if [[ `/usr/bin/whoami` == $SERVICE_USER ]]
    then
        cd $FOLDER
        ./start.sh
    # NOT USER root
    else
        cd $FOLDER
        su $SERVICE_USER -c ./start.sh
    fi
}

stop() {
    cd $FOLDER
    su $SERVICE_USER -c ./stop.sh
}

# Body main
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        echo "Restarting $SERVICE_NAME..."
        echo " "
        stop
        sleep 10
        start
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart}"
        exit 1
esac
exit 0
I give ownership to <service name>_deployer with permission to read, write, and execute.
Then I add the service to the list of services like this:
/sbin/chkconfig --add <service name>-service (SUSE and others)
or
update-rc.d <service name>-service defaults (Ubuntu)
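For comparison while I wait for feedback, here is a pidfile-based variant of the same check (a sketch, keeping the placeholders used above):

# Sketch: track the PID explicitly instead of grepping ps output.
PIDFILE=/var/run/<service name>.pid

start() {
    <command to start the service> &
    echo $! > "$PIDFILE"    # remember the PID we just started
}

stop() {
    if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
    fi
}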
And that's all! Do you think this is a good approach? I'm just asking because I would like to create a good standard for this kind of script and procedure. Sorry if you think this is a lame question, but this kind of procedure is very important to me.
Thank you guys!