I have an HA pair of F5 BIG-IP devices running version 11.5.3 Build 1.0.167 Hotfix HF1. I currently have an iRule attached to about 200 virtual servers which enables high-speed logging for certain types of events. I need to update this iRule, but I'm worried about what will happen to existing connections. Will they be broken, or will they continue running using the old version of the iRule? Many of these virtual servers are for our ERP systems, so connection interruptions are essentially unacceptable.
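For reference, a sketch of the checks I would run before editing, using a placeholder rule name (hsl_logging_rule is not the real name): confirm the rule's current contents and count how many virtual servers reference it.
tmsh list ltm rule hsl_logging_rule                  # show the current contents of the iRule
tmsh list ltm virtual one-line | grep -c hsl_logging_rule   # how many virtual servers reference it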
I have a SPARC T3-4 server experiencing a known bug (CR 7003014 as described here: http://docs.oracle.com/cd/E19417-01/html/E20814/z40004961296327.html). The solution is to update the firmware to at least version 8.0.4.b. My system is currently on a 3.X version. Can I upgrade straight to the 8.0.4.b release, or do I need to upgrade through each major version?
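For reference, a way to confirm the installed firmware level before and after the update, assuming access to both the ILOM service processor and Solaris on the host (property names and output formats can vary slightly between releases):
show /HOST sysfw_version      # from the ILOM CLI; reports the system firmware bundle version
prtconf -V                    # from Solaris; reports the OpenBoot (OBP) component version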
I'm asking because I recently restored a system from a backup using NetBackup, and uptime now reports that the system has only been up since the restore finished, despite the fact that it has been running for several days. last reboot also reports the wrong information, but in the opposite direction: it says the last reboot was several months ago, when the system has been rebooted many times since then.
Essentially, I want to know where the uptime and reboot information is stored so that I can preserve it across a restore in the future (see the sketch after the output below).
> uptime
9:54am up 1 day(s), 15:52, 3 users, load average: 0.93, 0.95, 0.86
> last reboot
wtmp begins Mon Sep 21 03:10
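A minimal sketch of where this information lives, assuming the host is Solaris (which the uptime output format suggests): boot and reboot history is kept in regular accounting files under /var/adm, which is why a file-level restore can overwrite it.
who -b                                  # boot time, read from /var/adm/utmpx
last reboot | head                      # reboot history, read from /var/adm/wtmpx
ls -l /var/adm/utmpx /var/adm/wtmpx     # the accounting files a restore can clobber
kstat -p unix:0:system_misc:boot_time   # boot time as the running kernel sees it
The kstat value comes from the kernel rather than from files, so it stays correct even when utmpx/wtmpx have been restored from backup.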
We have a BIG-IP 6400 LTM device that is killing processes with alarming frequency. The CPU is consistently around 23% utilization, so that is not an issue.
Here is a sample from /var/log/ltm:
Oct 7 08:21:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25338 exited with signal = 9
Oct 7 08:22:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25587 exited with signal = 9
Oct 7 08:22:34 local/pri-4600 info bigd[3471]: reap_child: child process PID = 25793 exited with signal = 9
Oct 7 08:23:10 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26260 exited with signal = 9
Oct 7 08:23:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26584 exited with signal = 9
Oct 7 08:23:40 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26647 exited with signal = 9
Oct 7 08:23:45 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26699 exited with signal = 9
Oct 7 08:23:55 local/pri-4600 info bigd[3471]: reap_child: child process PID = 26805 exited with signal = 9
Oct 7 08:25:36 local/pri-4600 info bigd[3471]: reap_child: child process PID = 28079 exited with signal = 9
Oct 7 08:27:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29286 exited with signal = 9
Oct 7 08:27:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29307 exited with signal = 9
Oct 7 08:27:56 local/pri-4600 info bigd[3471]: reap_child: child process PID = 29793 exited with signal = 9
Oct 7 08:29:20 local/pri-4600 info bigd[3471]: reap_child: child process PID = 30851 exited with signal = 9
Oct 7 08:33:00 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1122 exited with signal = 9
Oct 7 08:33:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 1299 exited with signal = 9
Oct 7 08:34:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2054 exited with signal = 9
Oct 7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2784 exited with signal = 9
Oct 7 08:35:16 local/pri-4600 info bigd[3471]: reap_child: child process PID = 2807 exited with signal = 9
Oct 7 08:35:35 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3015 exited with signal = 9
Oct 7 08:36:15 local/pri-4600 info bigd[3471]: reap_child: child process PID = 3601 exited with signal = 9
Is this normal? If not, what could be causing this to happen?
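For troubleshooting context, a hedged diagnostic sketch, assuming the unit runs a TMOS version that includes tmsh (older 6400s may not): bigd is the health-monitoring daemon, so the idea is to quantify the kills and review the configured monitors and their interval/timeout settings.
grep -c 'reap_child.*signal = 9' /var/log/ltm                     # how often children are being killed
tmsh list ltm monitor | grep -iE 'ltm monitor|interval|timeout'   # monitor definitions and timing values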
It is taking several attempts for users to SSH into several 64-bit Ubuntu machines that use sssd to authenticate.
The following will happen from 0-6 times in a row before the user is allowed to log in:
ssh [email protected]
[email protected]'s password:
Connection closed by 10.13.63.101
It takes several seconds for the Connection closed by 10.13.63.101 error to occur.
Here is a sample from /var/log/auth.log which shows a failure followed by a success:
sshd[7098]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=x.x.x.x user=User
sshd[7098]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=x.x.x.x user=User
sshd[7098]: pam_sss(sshd:account): Access denied for user User: 4 (System error)
sshd[7098]: Failed password for bob from x.x.x.x port 54817 ssh2
sshd[7098]: fatal: Access denied for user bob by PAM account configuration [preauth]
sshd[7106]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=x.x.x.x user=bob
sshd[7106]: pam_sss(sshd:auth): authentication success; logname= uid=0 euid=0 tty=ssh ruser= rhost=x.x.x.x user=bob
sshd[7106]: Accepted password for bob from x.x.x.x port 54826 ssh2
sshd[7106]: pam_unix(sshd:session): session opened for user User by (uid=0)
systemd-logind[529]: Removed session 69.
systemd-logind[529]: New session 70 of user bob.
It seems the important line is the third one from the above output:
sshd[7098]: pam_sss(sshd:account): Access denied for user User: 4 (System error)
According to sssd's official wiki (https://fedorahosted.org/sssd/wiki/Troubleshooting):
System Error is an "Unhandled Exception" during authentication. It can either be an SSSD bug or a fatal error during authentication.
The sssd package is up to date:
$ sudo apt-get install sssd
sssd is already the newest version.
So far this is only affecting a small number of our systems (roughly 4 out of dozens).
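To check whether the failure is confined to the PAM account phase (where the "4 (System error)" above is returned) rather than authentication itself, a small test sketch; pamtester is an assumption here (it is in the Ubuntu repositories but not installed by default), and bob is the placeholder user from the logs:
getent passwd bob                  # confirm sssd can resolve the user via NSS at all
id bob
sudo apt-get install pamtester
sudo pamtester sshd bob acct_mgmt  # exercise only the account phase of the sshd PAM stack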
Update:
It appears the sssd logs are either empty or months old.
bob@system:/var/log/sssd$ ls -l
total 16
-rw------- 1 root root 0 Jan 26 2015 ldap_child.log
-rw------- 1 root root 0 Jan 26 2015 sssd_LDAP.log
-rw------- 1 root root 0 Aug 5 06:49 sssd.log
-rw------- 1 root root 268 Aug 4 16:24 sssd.log.1
-rw------- 1 root root 158 Jun 15 11:47 sssd.log.2.gz
-rw------- 1 root root 139 May 25 13:17 sssd.log.3.gz
-rw------- 1 root root 139 May 11 11:08 sssd.log.4.gz
-rw------- 1 root root 0 Jan 26 2015 sssd_nss.log
-rw------- 1 root root 0 Jan 26 2015 sssd_pam.log
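Since the logs are empty, a reasonable next step is to raise sssd's verbosity and reproduce a failed login. This sketch assumes the stock Ubuntu packaging (service name sssd, config at /etc/sssd/sssd.conf) and that the domain section name matches the sssd_LDAP.log file above:
# Add "debug_level = 7" to the [sssd], [pam], and [domain/LDAP] sections of
# /etc/sssd/sssd.conf, then restart the service and watch the logs during a failed login:
sudo service sssd restart
sudo tail -f /var/log/sssd/sssd_pam.log /var/log/sssd/sssd_LDAP.log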
We currently make use of several Nagios workers to distribute the workload using DNX, as described here: https://assets.nagios.com/downloads/general/docs/Distributed_Monitoring_Solutions.pdf. I have not been able to find any information on this in the official documentation, and most searches just link me back to the Nagios website. Ignoring the compute power required (CPU, RAM, etc.), is there any hard limit on how many hosts or services a single Nagios instance can monitor? What about an individual worker?
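For context on current load, the nagiostats utility that ships with Nagios Core reports how many hosts and services an instance is monitoring and what its check latency looks like; the config path below is an assumption, so adjust it for your install:
nagiostats -c /usr/local/nagios/etc/nagios.cfg | egrep -i 'total hosts|total services|latency'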
I manage several F5 LTM devices which have a rapidly growing number of virtual servers configured (about 500 right now). I've noticed that when the device lists virtual servers it does so alphabetically, with no option to reorder them. This got me wondering: does the device check the entire list every time a connection is made? If so, how does the size of the list impact performance?
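For reference, the list I'm describing can also be pulled from the shell; a quick sketch to count the virtual servers and show just the listener (destination) of each one, which is what incoming connections are matched against:
tmsh list ltm virtual one-line | wc -l     # number of configured virtual servers
tmsh list ltm virtual destination          # each virtual server's listener address and port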
I am deploying Ubuntu 12.04 VMs from a template using vCenter 5.5. When I add nameservers using a customization specification, they are added to the resolv.conf file, but the VM will not use them and is unable to resolve names. I can, however, manually specify the same nameservers with the dig command and successfully resolve names. If I manually add the nameservers to /etc/network/interfaces, the VM uses them and becomes able to resolve names. What is the cause of this behavior?
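To illustrate the two working cases from the question, a sketch with placeholder values (the nameserver IPs and hostname below are examples, not taken from our environment):
dig @192.0.2.53 host.example.com           # works: the server is named explicitly, bypassing resolv.conf
# works: declaring the servers in the interface stanza of /etc/network/interfaces, e.g.:
#     dns-nameservers 192.0.2.53 192.0.2.54
#     dns-search example.com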
We are running several 64-bit RHEL 6 VMs in a vBlock infrastructure using vCenter 5.5. If I open a console session through the vSphere web client, the Ctrl+C keystroke does not work. This is very inconvenient, as a simple ping becomes unstoppable. If I access the same VM through the vSphere desktop client or an SSH session, Ctrl+C works as expected. The VMs have VMware Tools installed.
Right now I have two separate vCenter servers (one older, one brand new), each managing 3 clusters of hosts. I will eventually need to move 300-500 VMs from the old clusters to the new ones. If I put the two vCenter servers in linked mode, can I migrate VMs across clusters on separate instances of vCenter?
Edit - Can't believe I didn't think to mention the versions. The old stuff is all 5.1 and the new systems are 5.5.