I'm working with a VM that starts to boot and then promptly shuts down. I'm hoping to locate the VM boot logs on the Xen host, but I'm not sure where to look. Examining /var/log/* doesn't show anything obvious.
Suggestions?
I want to use rsyslog to capture events from SANs, routers and such. (These will be forwarded to Kafka and ultimately Elasticsearch.) So far this is working fine; I have it configured in a config file in /etc/rsyslog.d.
What's not working is that all the local log traffic (from the host running rsyslog) is being forwarded as well. I need a way to send local logs to the "standard" local endpoints and remote logs on to Kafka.
Is this possible using rsyslog?
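For reference, a minimal sketch of the split I'm after, assuming rsyslog 7+ with a UDP listener (the relay hostname and ports are placeholders):

# /etc/rsyslog.d/10-remote-devices.conf (sketch)
module(load="imudp")

# Remote traffic gets its own ruleset so it never touches the default
# (local) rules; local logs keep flowing to the standard endpoints.
ruleset(name="fromRemote") {
    # forward device logs on toward the kafka/elasticsearch pipeline
    action(type="omfwd" target="kafka-relay.example.com" port="10514" protocol="tcp")
    stop
}

input(type="imudp" port="514" ruleset="fromRemote")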
I've been looking into the JMX metrics exported from Puppet Server and am not clear on how to interpret them. This particular one is supposed to be measuring the compiler, but it's not clear (to me) what it's saying.
Example data:
{
  "request": {
    "mbean": "puppetserver:name=puppetlabs.localhost.compiler.compile.develop",
    "type": "read"
  },
  "value": {
    "Mean": 515.8850223496175,
    "StdDev": 15.410435420213828,
    "75thPercentile": 533,
    "98thPercentile": 533,
    "RateUnit": "events/second",
    "95thPercentile": 533,
    "99thPercentile": 533,
    "Max": 853,
    "Count": 188,
    "FiveMinuteRate": 0.004556108829698143,
    "50thPercentile": 502,
    "MeanRate": 0.0026130935976092762,
    "Min": 386,
    "OneMinuteRate": 0.002335841296852807,
    "DurationUnit": "milliseconds",
    "999thPercentile": 533,
    "FifteenMinuteRate": 0.003374163757709876
  },
  "timestamp": 1543151404,
  "status": 200
}
There appear to be several types of stats mixed together. What do RateUnit and DurationUnit correspond to? Mean / StdDev seem straightforward, but what are they measuring? Does this say "515 events / sec" or "515 ms / event"? And the "MeanRate" - is that 0.002 events / sec or 0.002 ms / event?
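My working guess, assuming these come from a Dropwizard-style Timer (which appears to be what Puppet Server exposes), is that the fields split roughly like this:

Mean, StdDev, Min, Max, NNthPercentile              -> durations, in DurationUnit (milliseconds)
MeanRate, OneMinuteRate, FiveMinuteRate, FifteenMinuteRate -> rates, in RateUnit (events/second)
Count                                               -> total number of events (compiles) observed

Read that way, the example would be roughly "a compile takes ~515 ms on average" at "~0.0026 compiles per second" - but I'd like confirmation.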
I'm monitoring ~100 remote hosts over a VPN using check_snmp_process.pl. For many months this has worked just fine. Over the weekend I started seeing "ERROR: Alarm signal (Nagios time-out)" errors from just about every host/process. I can run the check manually on the command line and get a successful response, so I'm not clear why it would time out under normal usage.
This morning I tried upping the 'timeout' param on the plugin to 20 seconds. For about an hour this appeared to work; then, in a matter of minutes, the failure rate returned to its previous level.
The VPN server doesn't appear to be under any abnormal load. Nor does the nagios machine.
Suggestions on where else to look for the source of this?
Nagios machine: CentOS 6.5
Nagios version: 3.5.1
Plugin version: 1.10
EDIT: When the 'mass timeout' happens, it's all within a few seconds: every host shows the same time (± 5 seconds) on the report. This may be due to Nagios forcing rechecks on 'orphaned' checks after a restart of the service. Not sure yet. It just seems ominous when 40-50 timeouts hit the log all at once.
It appears that XenServer 6.5 has been released, and it's quite a bit faster than 6.2 in many ways.
When I installed 6.2 I accepted the various defaults and ended up with a system that is awkwardly partitioned: root is rather small (and perpetually running out of space) while a 250 GB partition sits empty. To that end I'm thinking about doing a rebuild with 6.5.
I've been reading about the process, and apparently you can't mix 6.2 and 6.5 in the same pool. If I take the pool members (3) out one at a time, rebuild them and put them in a new pool, can I move the various VMs over?
EDIT:
To upgrade XenServer from 6.2 to 6.5, start with the pool master and work your way through each server. If you've done a repartition you don't have to do the editing steps again - as long as you select 'upgrade' during the 6.5 install it won't repartition the disks.
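For moving VMs between the old and new pools, the mechanism I'm assuming is a per-VM export/import (the UUIDs and filename are placeholders):

# on a 6.2 pool member: shut the VM down and export it
xe vm-shutdown uuid=<vm-uuid>
xe vm-export vm=<vm-uuid> filename=/mnt/backup/myvm.xva

# on a rebuilt 6.5 host in the new pool: import and start it
xe vm-import filename=/mnt/backup/myvm.xva
xe vm-start uuid=<new-vm-uuid>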
Hypothetical situation: I set replication_factor to 1 and use SimpleStrategy, and then one of my N nodes dies. Does this mean that 1/N of the data is now missing?
I have a situation where I need to replace the nameservers for both a.b.c and b.c. I'd rather not have to dedicate two machines to this.
I've been reading about multi-homing but the examples all seem to be for *.b.c rather than a domain and a subdomain of the same.
Is this scenario possible with a single machine?
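What I'm picturing is simply two zone statements on the one server, along these lines (file paths are placeholders):

// named.conf on the single machine
zone "b.c" {
    type master;
    file "/etc/bind/db.b.c";
};

zone "a.b.c" {
    type master;
    file "/etc/bind/db.a.b.c";
};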
We've been deploying fanless PCs for a research study. Some of these are having disk issues with their OS, which runs from an SD card. I'm seeing cases where I need to fsck a folder or two.
I've tried using shutdown -rF now, but it doesn't seem to be doing the trick. There are notes in the syslog that say stuff was fixed, but it doesn't appear to be. Also, the order of events in the syslog makes it sound like the OS was fully up when the fsck was done (i.e. the filesystem was mounted). Certainly not a good thing.
Any suggestions on other ways to fix this without having to do service calls and replace the units?
OS: debian 6.x
Note: I did see this. It gave me the notion for the shutdown command, but it doesn't seem to be working properly (or I'm not using it properly).
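The other approach I'm aware of (an assumption on my part that it behaves better than shutdown -rF) is to drop the flag file that Debian's sysvinit scripts look for, so the check runs before the filesystems come up read-write:

# force a full fsck of the filesystems at the next boot (Debian 6 / sysvinit)
touch /forcefsck
reboot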
I have been updating some servers in a small group this week - a mixture of CentOS 6.x and RHEL 5.x. In every case they are getting an updated srvadmin-* package and subsequently popping up in Nagios with the error "(SNMP) OpenManage is not installed or is not working correctly". Has anyone else seen this?
It appears that the plugin is using this function to test if SNMP is working:
#
# Checking if SNMP works by probing for "chassisModelName", which all
# servers should have
#
sub snmp_check {
    my $chassisModelName = '1.3.6.1.4.1.674.10892.1.300.10.1.9.1';
    # ... (rest of the sub omitted)
SNMP is working on my systems (I used snmpwalk to test) but this OID isn't present anymore.
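For what it's worth, this is roughly how I'd expect to confirm what that OID returns on a given host (community string is a placeholder):

# probe the chassisModelName OID the plugin relies on
snmpget -v2c -c public target-host 1.3.6.1.4.1.674.10892.1.300.10.1.9.1

# or walk the surrounding subtree to see what the updated srvadmin packages expose
snmpwalk -v2c -c public target-host 1.3.6.1.4.1.674.10892.1.300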
I'm trying to use check_ssmtp to check port 587 on my Postfix server.
Using nmap -P0 mail.server.com I see this:
I see this:
Starting Nmap 5.51 ( http://nmap.org ) at 2013-11-04 05:01 PST
Nmap scan report for mail.server.com (xx.xx.xx.xx)
Host is up (0.0016s latency).
rDNS record for xx.xx.xx.xx: another.server.com
Not shown: 990 closed ports
PORT STATE SERVICE
22/tcp open ssh
25/tcp open smtp
110/tcp open pop3
111/tcp open rpcbind
143/tcp open imap
465/tcp open smtps
587/tcp open submission
993/tcp open imaps
995/tcp open pop3s
5666/tcp open nrpe
So I know the relevant ports (465 for smtps and 587 for submission) are open.
When I use openssl s_client -connect mail.server.com:587 -starttls smtp
I get a connection with all the various SSL info. (Same for port 465).
But when I try libexec/check_ssmtp -H mail.server.com -p587
I get:
CRITICAL - Cannot make SSL connection.
140200102082408:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:699:
What am I doing wrong?
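For comparison, the variant I'm wondering about is check_smtp with its STARTTLS flag (assuming this plugin build supports -S), since the submission port negotiates TLS via STARTTLS rather than starting out encrypted:

# STARTTLS-style check on the submission port
libexec/check_smtp -H mail.server.com -p 587 -S

# an immediate-TLS check seems more likely to match the smtps port
libexec/check_ssmtp -H mail.server.com -p 465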
I needed more space in the /var tree on a VM, so I allocated some, booted into runlevel 1 and copied the folder over to the new space. After changing the fstab entry for /var to reflect the new location I rebooted.
(you can see what's coming)
The boot process was pretty well mangled. I had to disable selinux to get anything working properly.
Given the nature of this system I would like to re-enable SELinux, but I'm not clear how to get it all set up properly. Looking at the permissions using ls -Z, everything appears the same (as the original /var folder) but clearly something is amiss.
What step(s) did I miss?
EDIT: This is the (relevant) output of ls -alZ /:
drwxr-xr-x root root ? var
drwxr-xr-x. root root system_u:object_r:var_t:s0 var.old
Looks like a promising avenue - though I note that /sys, /dev and /proc all have '?' there.
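The step I suspect I skipped is relabelling the new /var; my understanding is that would be something like:

# relabel the new /var in place
restorecon -R -v /var

# or force a full filesystem relabel on the next boot
touch /.autorelabel
reboot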
I'm using Puppet to maintain a growing pile of Debian machines. These will be maintained from their initialization onwards, which means that one step will be setting the apt sources.list file and then updating it.
There are other modules that rely on this list being up to date and will fail if apt-get update hasn't been called. What I'm wondering is: should I introduce some sort of ordering (using stages or similar) to ensure that the package list is updated before attempting to install apps, or just assume that it will likely fail on the first run but work on subsequent runs once the list is updated?
The ethos of Puppet seems to be declaring a 'final state' rather than a process, which leads me to believe that letting the errors occur is probably the way to go.
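If I do end up making the ordering explicit, the sketch I have in mind is an apt-get update exec that every package resource is made to depend on (the module and file names here are mine, not anything standard):

file { '/etc/apt/sources.list':
  ensure => file,
  source => 'puppet:///modules/apt_setup/sources.list',
}

exec { 'apt-update':
  command     => '/usr/bin/apt-get update',
  refreshonly => true,
  subscribe   => File['/etc/apt/sources.list'],
}

# make every Package resource wait for the refreshed package list
Exec['apt-update'] -> Package <| |>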
I'm getting ECC warnings from some server RAM. It's a pretty old machine, so there isn't any warranty on these parts.
If this were Windows I would expect to see a BSOD.
What can I expect from RHEL 5.x?
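My current understanding (an assumption - I haven't verified it on this box) is that corrected errors just show up as EDAC counters and log lines rather than anything dramatic, so something like this should show whether they're accumulating, provided the edac modules are loaded:

# corrected vs. uncorrected error counts per memory controller
grep . /sys/devices/system/edac/mc/mc*/ce_count
grep . /sys/devices/system/edac/mc/mc*/ue_count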
I'm looking into the cost of putting a Cassandra cluster into a colo facility. There would be 6-8 servers at the outset, with expected growth over time. One option is just a series of Dell R320s (or similar). Another option would be blades or similarly built machines that share power.
Looking at the details of an 8-node system, I see it has 4 x 1620 watt power supplies. This gives a total of 6480 watts. If I have a rack with 208 V, this means I'm pulling more than 30 A at peak - so I've maxed out my 42U rack in 6U of space. I realize this is 'peak load' but it seems a bit extreme.
Am I misunderstanding how this calculation works? I get that VA = W, and I get that it won't actually pull this kind of load, but 30 A is a lot of current. I don't have the luxury of buying one and using a kill-a-watt to accurately measure it. The specs for the system don't make it sound like these supplies are redundant, but that's a tremendous amount of current.
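The arithmetic I'm working from, for the record:

4 x 1620 W     = 6480 W    (nameplate, all four supplies at full output)
6480 W / 208 V ≈ 31.2 A    (treating W ≈ VA, i.e. a power factor of ~1)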
Has anyone deployed blades or multi-node servers and measured the required current? I'd love to get a Dell M1000 but the prospect of trying to budget for 40A just makes me need to lie down.
EDIT: If I use a kill-a-watt to measure the input current for a system with n power supplies, do I sum them? Or are they each pulling 1/n?
I need to cost out some switching and figure out how to network a PC to this type of network.
Clarifying: This is to connect units in a high rise facility. Each unit has the top picture in a closet. In every other floor in the hallway the cables from two floors are brought together in a panel. Then each panel sends a cable to the basement.
So I'm trying to figure out how to
Briefly: if I have 5 TB of data and want to deploy it on 5 Cassandra servers, does each machine need to have 5 TB of disk space for data (not counting log space)? From the docs it sounds like at times Cassandra will need 2x the data size - so 10 TB per server, or 10 TB total across the array?
How much RAM should each machine have? Assume that the 5 TB is all in the same column family. I had been planning to max out the RAM on each machine, but I'm not sure that's enough. Do I need an array of servers with a total of 5 TB of RAM?
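A rough worked example of how I've been thinking about the disk side (the replication factor of 3 is my assumption, not a given):

raw data:                      5 TB
replication factor (assumed):  3
total on disk:                 5 TB x 3  = 15 TB across the cluster
per node (5 nodes):            15 TB / 5 = 3 TB of data per node
with ~2x compaction headroom:  3 TB x 2  = ~6 TB of disk per node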
I have a group of 20-30 Win7 machines that I want to install, update, do some small configuration on, and then deploy into 'the field'. At my job they use SCCM to maintain the systems, so I started looking into using it. It's quite a complex app!
What I'm wondering is: is it possible to install and use this program in a (reasonably) simplified fashion for small deployments? If we can get the Win7 bits to work we'd use it for other installations as well, though none larger than a few hundred units.
Up till now we've always used Ghost to create our system images - and then recreate them when something is missing or updated. I'm hoping that SCCM is a 'better way' without being so huge that it takes longer to set up than just imaging by hand.
FWIW: we have an msdnaa license so the cost(s) of the various apps are legal and covered.
How do I deal with the case where a domain has addresses in more than one subnet?
EG: (bob.com)
joe.bob.com A 14400 10.20.0.10
jim.bob.com A 14400 10.20.0.11
mary.bob.com A 14400 10.20.1.10
susan.bob.com A 14400 10.20.1.11
(0.20.10.in-addr.arpa)
0.20.10.in-addr.arpa 14400 NS bob.com
0.20.10.in-addr.arpa 14400 PTR blahblahblah
10 14400 PTR joe.bob.com.
11 14400 PTR jim.bob.com.
(1.20.10.in-addr.arpa)
1.20.10.in-addr.arpa 14400 NS bob.com
1.20.10.in-addr.arpa 14400 PTR blahblahblah
10 14400 PTR mary.bob.com.
11 14400 PTR susan.bob.com.
I have my 'forward' zone file set up - it seems like I need multiple 'reverse' files, though.
zone "bob.com" {
type master;
etcetc
};
zone "0.20.10.in-addr.arpa" {
type master;
etcetc
};
zone "1.20.10.in-addr.arpa" {
type master;
etcetc
};
Can I put all of these zone entries in named.conf on the same machine, or do I have to break things up somehow between multiple files / machines?
I started an OpenVPN server about a year ago. As I didn't know what I was doing, I left it with the default net30 topology, where each client gets its own /30. Now I have 40-50 (and growing) clients, each of which is using 4 addresses.
How do I configure it so that it will move on to the next set of addresses when this block runs out? (i.e. I'm using 10.20.0.xx now. I want it to move to 10.20.1.xx, etc)
Is this even possible?
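The two approaches I've come across (not sure which applies to my setup) are widening the pool in the server directive, or switching to the subnet topology so each client only consumes one address; either way it's just a server.conf change, roughly:

# option 1: keep net30 but hand out a larger pool (a /23 spans 10.20.0.x and 10.20.1.x)
server 10.20.0.0 255.255.254.0

# option 2: one address per client instead of a /30 each
topology subnet
server 10.20.0.0 255.255.255.0

My understanding is that topology subnet needs reasonably recent clients, so the wider pool may be the less disruptive change.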