I frequently hear good things about the R language for statistical analysis of data, but it looks as though the learning curve is steep. I'm interested to know if anyone's using R to crunch data about system performance and scalability to give greater insight into behaviour than a basic time series from a monitoring system gives. What value does R give you as a sysadmin?
I've just put a second SiI 3114 SATARaid card in my home server so that I could add another pair of SATA drives and increase my storage space. Annoyingly, it doesn't seem to work:
[   32.816030] ata5: lost interrupt (Status 0x0)
[   32.816072] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   32.816091] ata5.00: cmd c8/00:08:00:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
[   32.816094]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   32.816101] ata5.00: status: { DRDY }
[   32.816117] ata5: hard resetting link
[   33.136082] ata5: SATA link down (SStatus 0 SControl 0)
[   36.060940] irq 18: nobody cared (try booting with the "irqpoll" option)
[   36.060949] Pid: 0, comm: swapper Not tainted 2.6.31-20-generic #58-Ubuntu
[   36.060954] Call Trace:
[   36.060977] [] ? printk+0x18/0x1c
[   36.060997] [] __report_bad_irq+0x27/0x90
[   36.061005] [] note_interrupt+0x150/0x190
[   36.061011] [] handle_fasteoi_irq+0xac/0xd0
[   36.061023] [] handle_irq+0x18/0x30
[   36.061029] [] do_IRQ+0x47/0xc0
[   36.061042] [] ? irq_exit+0x50/0x70
[   36.061058] [] ? smp_apic_timer_interrupt+0x57/0x90
[   36.061065] [] common_interrupt+0x30/0x40
[   36.061075] [] ? native_safe_halt+0x5/0x10
[   36.061082] [] default_idle+0x46/0xd0
[   36.061088] [] cpu_idle+0x8c/0xd0
[   36.061103] [] rest_init+0x55/0x60
[   36.061111] [] start_kernel+0x2e6/0x2ec
[   36.061117] [] ? unknown_bootoption+0x0/0x19e
[   36.061133] [] i386_start_kernel+0x7c/0x83
[   36.061137] handlers:
[   36.061139] [] (sil_interrupt+0x0/0xb0)
[   36.061151] Disabling IRQ #18
[   38.136014] ata5: hard resetting link
[   38.456022] ata5: SATA link down (SStatus 0 SControl 0)
[   43.456013] ata5: hard resetting link
[   43.776022] ata5: SATA link down (SStatus 0 SControl 0)
[   43.776035] ata5.00: disabled
[   43.776055] ata5.00: device reported invalid CHS sector 0
[   43.776074] sd 4:0:0:0: [sde] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   43.776082] sd 4:0:0:0: [sde] Sense Key : Aborted Command [current] [descriptor]
[   43.776092] Descriptor sense data with sense descriptors (in hex):
[   43.776097]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[   43.776112]         00 00 00 00
[   43.776118] sd 4:0:0:0: [sde] Add. Sense: No additional sense information
[   43.776127] end_request: I/O error, dev sde, sector 0
[   43.776136] Buffer I/O error on device sde, logical block 0
[   43.776170] ata5: EH complete
[   43.776187] ata5.00: detaching (SCSI 4:0:0:0)
root@core:~# cat /proc/interrupts
           CPU0
  0:         47   IO-APIC-edge      timer
  1:          8   IO-APIC-edge      i8042
  6:          3   IO-APIC-edge      floppy
  7:          0   IO-APIC-edge      parport0
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 14:      53069   IO-APIC-edge      pata_sis
 15:      53004   IO-APIC-edge      pata_sis
 17:     112265   IO-APIC-fasteoi   sata_sil
 18:     200002   IO-APIC-fasteoi   sata_sil, SiS SI7012
 19:     111140   IO-APIC-fasteoi   eth0
 20:          0   IO-APIC-fasteoi   ohci_hcd:usb2
 21:          0   IO-APIC-fasteoi   ohci_hcd:usb3
 23:          0   IO-APIC-fasteoi   ehci_hcd:usb1
NMI:          0   Non-maskable interrupts
LOC:    6650492   Local timer interrupts
SPU:          0   Spurious interrupts
CNT:          0   Performance counter interrupts
PND:          0   Performance pending work
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
MCE:          0   Machine check exceptions
MCP:        160   Machine check polls
ERR:          0
MIS:          0

root@core:~# lspci | grep Raid
00:09.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0a.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)

root@core:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 9.10
Release:        9.10
Codename:       karmic

root@core:~# uname -a
Linux core.topper.me.uk 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686 GNU/Linux
I've tried a combination of different kernel options (irqpoll, noapic, noacpi, pci=noapic), all to no avail. Does anyone have any bright ideas about how I can make this work?
Swapping PCI cards around isn't an option as there are only two slots in this motherboard (an ASRock K7S41GX). The BIOS doesn't appear to offer much in the way of configuration options for IRQ usage.
Plan B is to ditch this server completely and buy a new QNAP for these drives to go in, but I was hoping to avoid doing this right now.
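One observation from the /proc/interrupts output above: IRQ 18 is shared between the second sata_sil and the SiS SI7012, which is the onboard AC'97 audio controller (normally driven by snd_intel8x0). Purely as a diagnostic sketch - assuming audio is expendable on this box - unloading the sound driver would at least show whether the IRQ sharing itself is the trigger:

lsmod | grep snd                  # confirm which sound modules are loaded
modprobe -r snd_intel8x0          # unload the AC'97 driver, then retest the SATA card

# If that helps, keep the module out across reboots (path as on Ubuntu 9.10):
echo "blacklist snd_intel8x0" >> /etc/modprobe.d/blacklist.conf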
I seem to be finding it far harder than is reasonable to find a recent copy of the VI Perl Toolkit to download. There's a copy on SourceForge, but that's marked as beta and dated 2007. There are hints at a newer version on the VMware website, but pretty much every path I take through that site results in exceptions or other errors. Help?
I've used tools like puppet to manage individual systems, generally with a high level of success. Where puppet falls down is that it isn't good at managing dependencies that span more than one server.
For example, on a MySQL server I configure puppet to do the following:
- Configure authentication on the machine to hit my LDAP server
- Configure apt to use my local repository mirror
- Install MySQL packages
- Write my.cnf
- Start MySQL
- Create users in the database
In this set of steps there are a number of dependencies resolved - for example, I can't start the database service unless I've installed the packages, which I can't do unless the apt repo is configured correctly.
This MySQL server is one box in a master->master replication setup. In an ideal world, puppet (or another similar tool) would let me represent the fact that server B needs to wait until server A is available and then attempt to establish a replication relationship with it.
A lot of text here - basically what I'm asking is: are there any tools like puppet which can manage inter-machine dependencies like this?
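To make that concrete: the bit puppet can't express is roughly the following, which I'd otherwise have to script by hand outside of it (the hostname and credentials below are placeholders, and a real setup would also need binlog coordinates):

#!/bin/bash
# Block until the replication master (server A) is accepting MySQL
# connections before configuring server B as its slave.
MASTER_HOST="db-a.example.com"   # placeholder hostname
MASTER_PORT=3306

# Poll until the master's MySQL port is reachable.
until nc -z "$MASTER_HOST" "$MASTER_PORT"; do
    echo "waiting for ${MASTER_HOST}:${MASTER_PORT}..."
    sleep 10
done

# Once the master is up, point this server at it.
mysql -u root <<SQL
CHANGE MASTER TO
  MASTER_HOST='${MASTER_HOST}',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret';
START SLAVE;
SQL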
I have a filesystem under /var/hudson/jobs which is exported thus:
/var/hudson/jobs *(ro,no_root_squash,nohide)
I regularly mount new LVM volumes under that directory structure (say, /var/hudson/jobs/A/2222) and want to be able to mount these from my client nodes.
In the configuration above, I get the following error if I try to mount one of these from a client:
request to export directory /var/hudson/jobs/A/2222 below nearest filesystem /var/hudson/jobs
I could live with mounting all of /var/hudson/jobs on the client, but I can't see any content under /var/hudson/jobs/A/2222 when I do. This suggests to me that nohide isn't working as expected, though that isn't a surprise given the caveats in the man page.
How can I see these other filesystems from the NFS client without adding a line to /etc/exports for each one?
This is on RHEL5.
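The only other lead I have is the crossmnt export option, which the exports man page describes as the variant of nohide intended for filesystems mounted below an export - assuming RHEL5's nfs-utils is new enough to support it (I haven't verified that):

# Sketch: replace 'nohide' with 'crossmnt' in /etc/exports --
#
#   /var/hudson/jobs *(ro,no_root_squash,crossmnt)
#
# then re-export without restarting the NFS server:
exportfs -ra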
I'm using keepalived to load-balance connections between a number of TCP servers. I don't expect it matters, but the service in this case is rabbitmq. I'm using NAT type balancing with weighted round-robin.
A client connects to the server thus:
[client]-----------[lvs]------------[real server]
            a                 b
If a client connects to the LVS and remains idle, sending nothing on the socket, the connection eventually times out according to the timeouts set using ipvsadm --set. At this point, the connection marked 'a' above correctly disappears from the output of netstat -anp on the client, and from the output of ipvsadm -L -n -c on the lvs box. Connection 'b', however, remains ESTABLISHED according to netstat -anp on the real server box.
Why is this? Can I force lvs to properly reset the connection to the real server?
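For reference, these are the knobs I'm aware of (values below are illustrative, not recommendations): the director-side expiry comes from ipvsadm --set, and on the real server the only generic way an orphaned connection like 'b' gets reaped is TCP keepalive - which only applies to sockets on which the application has enabled SO_KEEPALIVE:

# On the director: set the tcp / tcpfin / udp timeouts, in seconds.
ipvsadm --set 900 120 300

# On the real server: shorten TCP keepalive so orphaned connections die
# sooner. Note this only affects sockets on which the application
# (rabbitmq here) has set SO_KEEPALIVE.
sysctl -w net.ipv4.tcp_keepalive_time=300     # idle time before probing starts
sysctl -w net.ipv4.tcp_keepalive_probes=3     # failed probes before the socket is reset
sysctl -w net.ipv4.tcp_keepalive_intvl=30     # seconds between probes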
On my ESXi 3.5 servers, I occasionally see a problem where newly started VMs show no traffic at all on their virtual network interfaces. I have to go into the VM settings and remove, then re-add, the network interfaces before I see any traffic.
Has anyone seen this before? Is it possible I need to configure my physical switches (Cisco) differently to support this?
I'm using port groups to present VLAN traffic to VMs - I've seen this happen on both trunked and untrunked ports, if that matters.
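For diagnosis, the port group configuration can be checked from the host itself (via the unsupported ESXi console, or the equivalent vicfg-vswitch from the Remote CLI); the port group and vSwitch names below are placeholders:

# List vSwitches, their uplinks and port groups (VLAN IDs included):
esxcfg-vswitch -l

# Re-apply the VLAN ID to a port group:
esxcfg-vswitch -p "VM Network 42" -v 42 vSwitch0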
I'm attempting to back some stuff up from a server that runs an old version of rsync - for various reasons I can't just upgrade the software.
Usually, I'd use --files-from and give a list of files and directories to back up, but this version of rsync doesn't have that switch. Is there a way, with other switches, to make an older rsync behave the same way?
I've tried a combination of --include-from and --exclude='*', but that looks to be insufficiently recursive (e.g. an include of /etc/* only backs up things directly below /etc).
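The nearest equivalent I've come up with, assuming this rsync at least supports -R/--relative (which is much older than --files-from), is to pass the paths as arguments instead - noting that this breaks on filenames containing whitespace; 'filelist' is a hypothetical file of paths:

# Send each path listed in 'filelist', preserving the full source path
# on the destination (which is what --files-from would have done):
rsync -avR $(cat filelist) user@backuphost:/backups/myserver/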
I've been using puppet for deployment of infrastructure, and most of the work I do is with Web 2.0 companies who are heavily into test-driven development for their web application. Does anyone here use a test-driven approach to developing their server configurations? What tools do you use to do this? How deep does your testing go?
I'm monitoring Sun hardware using SNMP to gather information from the LOM card. One of the datapoints I'm monitoring is a voltage status MIB which alerts if any of the internal voltages becomes too high or low - with the thresholds being set, presumably, by Sun at the time of manufacture. These trigger with surprising frequency - is this anything to worry about?
I have a set of SunFire X2100 and X2200 servers. If I attempt to use the remote KVM feature from my Ubuntu Linux box, the Java app loads and attempts to connect but receives an "Authentication Error" message. The same thing works fine from a Windows desktop using IE. Is there something I can do to fix this?
I've got a collection of SunFire hardware here: X2100, X2200 and X4250. Sun seem to have made it spectacularly difficult to find MIBs that I can use with net-snmp to monitor the state of the hardware via the LOM. Can anyone point me in the right direction? I need MIBs for hardware that identifies itself as:
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.42.2.208.3
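For reference, that's the result of the following query (host and community string are placeholders; enterprises.42 is Sun's enterprise OID), and once MIB files do turn up this is how I'd expect to load them into net-snmp:

# Confirm what the LOM reports itself as:
snmpget -v2c -c public lom-host SNMPv2-MIB::sysObjectID.0

# Drop any vendor MIB files into a directory and tell net-snmp to load
# them (the directory path here is a placeholder):
snmpwalk -v2c -c public -M +/usr/share/snmp/mibs/sun -m ALL lom-host enterprises.42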