Jon Topper's questions -server

Jon Topper

Asked: 2011-08-05 04:31:09 +0800 CST

Using R for crunching system data?

2

I frequently hear good things about the R language for statistical analysis of data, but it looks as though the learning curve is steep. I'm interested to know if anyone's using R to crunch data about system performance and scalability to give greater insight into behaviour than a basic time series from a monitoring system gives. What value does R give you as a sysadmin?

Jon Topper

Asked: 2010-03-31 02:27:59 +0800 CST

SATA drive problems with two SIL RAID cards

4

I've just put a second SiI 3114 SATARaid card in my home server so that I could add another pair of SATA drives and increase my storage space. Annoyingly, it doesn't seem to work:

[   32.816030] ata5: lost interrupt (Status 0x0)
[   32.816072] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   32.816091] ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096
in
[   32.816094]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4
(timeout)
[   32.816101] ata5.00: status: { DRDY }
[   32.816117] ata5: hard resetting link
[   33.136082] ata5: SATA link down (SStatus 0 SControl 0)
[   36.060940] irq 18: nobody cared (try booting with the "irqpoll" option)
[   36.060949] Pid: 0, comm: swapper Not tainted 2.6.31-20-generic #58-Ubuntu
[   36.060954] Call Trace:
[   36.060977]  [] ? printk+0x18/0x1c
[   36.060997]  [] __report_bad_irq+0x27/0x90
[   36.061005]  [] note_interrupt+0x150/0x190
[   36.061011]  [] handle_fasteoi_irq+0xac/0xd0
[   36.061023]  [] handle_irq+0x18/0x30
[   36.061029]  [] do_IRQ+0x47/0xc0
[   36.061042]  [] ? irq_exit+0x50/0x70
[   36.061058]  [] ? smp_apic_timer_interrupt+0x57/0x90
[   36.061065]  [] common_interrupt+0x30/0x40
[   36.061075]  [] ? native_safe_halt+0x5/0x10
[   36.061082]  [] default_idle+0x46/0xd0
[   36.061088]  [] cpu_idle+0x8c/0xd0
[   36.061103]  [] rest_init+0x55/0x60
[   36.061111]  [] start_kernel+0x2e6/0x2ec
[   36.061117]  [] ? unknown_bootoption+0x0/0x19e
[   36.061133]  [] i386_start_kernel+0x7c/0x83
[   36.061137] handlers:
[   36.061139] [] (sil_interrupt+0x0/0xb0)
[   36.061151] Disabling IRQ #18
[   38.136014] ata5: hard resetting link
[   38.456022] ata5: SATA link down (SStatus 0 SControl 0)
[   43.456013] ata5: hard resetting link
[   43.776022] ata5: SATA link down (SStatus 0 SControl 0)
[   43.776035] ata5.00: disabled
[   43.776055] ata5.00: device reported invalid CHS sector 0
[   43.776074] sd 4:0:0:0: [sde] Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[   43.776082] sd 4:0:0:0: [sde] Sense Key : Aborted Command [current]
[descriptor]
[   43.776092] Descriptor sense data with sense descriptors (in hex):
[   43.776097]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[   43.776112]         00 00 00 00 
[   43.776118] sd 4:0:0:0: [sde] Add. Sense: No additional sense information
[   43.776127] end_request: I/O error, dev sde, sector 0
[   43.776136] Buffer I/O error on device sde, logical block 0
[   43.776170] ata5: EH complete
[   43.776187] ata5.00: detaching (SCSI 4:0:0:0)

root@core:~# cat /proc/interrupts 
           CPU0       
  0:         47   IO-APIC-edge      timer
  1:          8   IO-APIC-edge      i8042
  6:          3   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 14:      53069   IO-APIC-edge      pata_sis
 15:      53004   IO-APIC-edge      pata_sis
 17:     112265   IO-APIC-fasteoi   sata_sil
 18:     200002   IO-APIC-fasteoi   sata_sil, SiS SI7012
 19:     111140   IO-APIC-fasteoi   eth0
 20:          0   IO-APIC-fasteoi   ohci_hcd:usb2
 21:          0   IO-APIC-fasteoi   ohci_hcd:usb3
 23:          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0   Non-maskable interrupts
LOC:    6650492   Local timer interrupts
SPU:          0   Spurious interrupts
CNT:          0   Performance counter interrupts
PND:          0   Performance pending work
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:        160   Machine check polls
ERR:          0
MIS:          0
root@core:~# lspci | grep Raid
00:09.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0a.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
root@core:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 9.10
Release:        9.10
Codename:       karmic
root@core:~# uname -a
Linux core.topper.me.uk 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux

I've tried a combination of different kernel options (irqpoll, noapic, noacpi, pci=noapic) all to no avail. Does anyone have any bright ideas about how I can go about making this work?

Swapping PCI cards around isn't an option as there are only two slots in this motherboard (an ASRock K7S41GX). The BIOS doesn't look to have too much in the way of configuration options regarding IRQ usage.

Plan B is to ditch this server completely and buy a new QNAP for these drives to go in, but I was hoping to avoid doing this right now.

Jon Topper

Asked: 2010-02-04 03:59:20 +0800 CST

Disable fencing in RedHat Cluster?

3

Whilst I'm developing with RedHat Cluster, I'd like to be able to disable fencing completely. The documentation for this stuff is fairly lacking - can I do this at all, or should I be looking to fake up a null fencing method of some sort?

Jon Topper

Asked: 2010-01-15 07:33:12 +0800 CST

VMWare perl toolkit download

1

I seem to be finding it far harder than is reasonable to find a recent copy of the VIPerl Toolkit to download. There's a copy on sourceforge, but that's marked as beta and dated 2007. There are hints at a newer version on the VMWare website, but pretty much every path I take through that website results in exceptions or other errors. Help?

Jon Topper

Asked: 2009-11-28 06:03:23 +0800 CST

Configuration management: Cross-machine dependencies

8

I've used tools like puppet to manage individual systems, with generally a high level of success. Where puppet falls down is that it isn't good at managing dependencies outside of an individual server.

For example, on a MySQL server I configure puppet to do the following:

Configure authentication on the machine to hit my LDAP server
Configure apt to use my local repository mirror
Install MySQL packages
Write my.cnf
Start MySQL
Create users in the database

In this set of steps there are a number of dependencies resolved - for example, I can't start the database service unless I've installed the packages, which I can't do unless the apt repo is configured correctly.

This MySQL server is one box in a master->master replication setup. In an ideal world, puppet (or another similar tool) would let me represent the fact that server B needs to wait until server A is available and then attempt to establish a replication relationship with it.

A lot of text here - basically what I'm asking is: are there any tools like puppet which can manage inter-machine dependencies like this?

Jon Topper

Asked: 2009-11-13 04:18:34 +0800 CST

NFS export of other mounts in the tree

0

I have a filesystem under /var/hudson/jobs which is exported thus:

   /var/hudson/jobs *(ro,no_root_squash,nohide)

I regularly mount new LVM volumes under that directory structure (say, /var/hudson/jobs/A/2222) and want to be able to mount these from my client nodes.

In the configuration above, I get the following error if I try

request to export directory /var/hudson/jobs/A/2222 below nearest filesystem /var/hudson/jobs

I could live with mounting all of /var/hudson/jobs on the client but I can't see any content under /var/hudson/jobs/A/2222 when I do. This suggests to met that nohide isn't working as expected, though this isn't a surprise given the caveats in the man page.

How can I see this other filesystems from the NFS client without adding a line to /etc/exports for each one?

This is on RHEL5.

Jon Topper

Asked: 2009-08-11 06:51:50 +0800 CST

LVS TCP connection timeouts - lingering connections

2

I'm using keepalived to load-balance connections between a number of TCP servers. I don't expect it matters, but the service in this case is rabbitmq. I'm using NAT type balancing with weighted round-robin.

A client connects to the server thus:

[client]-----------[lvs]------------[real server]
            a                b

If a client connects to the LVS and remains idle, sending nothing on the socket, this eventually times out, according to timeouts set using ipvsadm --set. At this point, the connection marked 'a' above correctly disappears from the output of netstat -anp on the client, and from the output of ipvsadm -L -n -c on the lvs box. Connection 'b', however, remains ESTABLISHED according to netstat -anp on the real server box.

Why is this? Can I force lvs to properly reset the connection to the real server?

Jon Topper

Asked: 2009-07-22 05:02:47 +0800 CST

ESXi networking occasionally fails on VM start

1

On my ESXi 3.5 servers, I occasionally see a problem where - on newly started VMs - I see no traffic at all on their virtual network interfaces. I have to go into the VM settings and remove, then re-add the network interfaces before I see any traffic.

Has anyone seen this before? Is it possible I need to configure my physical switches (Cisco) differently to support this?

I'm using portgroups to present VLAN traffic to VMs - I've seen this happen on both trunked and untrunked ports, if that matters.

Jon Topper

Asked: 2009-07-22 01:38:26 +0800 CST

rsync --files-from equivalent

2

I'm attempting to back some stuff up from a server that runs an old version of rsync - for various reasons I can't just upgrade the software.

Usually, I'd use --files-from and give a list of files and directories to back up, but this version of rsync doesn't have that switch. Is there a way, with other switches, to make an older rsync behave the same way?

I've tried a combination of --include-from= --exclude=* but that looks to be insufficiently recursive (eg. /etc/* only backs up things directly below /etc).

Jon Topper

Asked: 2009-07-09 05:00:03 +0800 CST

Web front-end to administering users/groups with smbldap

0

I'm using smbldap tools on an Ubuntu Hardy infrastructure, and I'd like to be able to provide a web front-end for non-technical users to manipulate users and group membership. Does such a beast exist?

Jon Topper

Asked: 2009-07-07 07:06:30 +0800 CST

Test-driven development for infrastructure deployments?

11

I've been using puppet for deployment of infrastructure, and most of the work I do is with Web 2.0 companies who are heavily into test-driven development for their web application. Does anyone here use a test-driven approach to developing their server configurations? What tools do you use to do this? How deep does your testing go?

Jon Topper

Asked: 2009-07-07 07:00:15 +0800 CST

Sun LOM alerts

1

I'm monitoring Sun hardware using SNMP to gather information from the LOM card. One of the datapoints I'm monitoring is a voltage status MIB which alerts if any of the internal voltages becomes too high or low - with the thresholds being set, presumably, by Sun at the time of manufacture. These trigger with surprising frequency - is this anything to worry about?

Jon Topper

Asked: 2009-06-16 14:59:42 +0800 CST

Sun LOM KVM not working on Linux

0

I have a set of SunFire X2100 and X2200 servers. If I attempt to use the remote KVM feature from my Ubuntu Linux box, the Java app loads and attempts to connect but receives an "Authentication Error" message. The same thing works fine from a Windows desktop using IE. Is there something I can do to fix this?

Jon Topper

Asked: 2009-05-29 06:29:33 +0800 CST

SunFire LOM - MIBS?

2

I've got a collection of SunFire hardware here, X2100, X2200 and X4250. Sun seem to have made it spectacularly difficult to find MIBs that I can use with net-snmp to monitor the state of the hardware using the LOM hardware. Can anyone point me in the right direction? I need MIBs for

  SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.42.2.208.3

Using R for crunching system data?

SATA drive problems with two SIL RAID cards

Disable fencing in RedHat Cluster?

VMWare perl toolkit download

Configuration management: Cross-machine dependencies

NFS export of other mounts in the tree

LVS TCP connection timeouts - lingering connections

ESXi networking occasionally fails on VM start

rsync --files-from equivalent

Web front-end to administering users/groups with smbldap

Test-driven development for infrastructure deployments?

Sun LOM alerts

Sun LOM KVM not working on Linux

SunFire LOM - MIBS?

Can you pass user/pass for HTTP Basic Authentication in URL parameters?

Ping a Specific Port

Check if port is open or closed on a Linux server?

How to automate SSH login with password?

How do I tell Git for Windows where to find my private RSA key?

What's the default superuser username/password for postgres after a new install?

What port does SFTP use?

Command line to list users in a Windows Active Directory group?

What is a Pem file and how does it differ from other OpenSSL Generated Key File Formats?

How to determine if a bash variable is empty?