I've recently set up a couple of new servers. This time I'm encrypting most of my partitions using dmcrypt+LUKS. However, these new servers crash very often, every few days: full lockups in which the kernel does not respond to the keyboard and the system does not answer pings. According to Munin graphs and atop records, there has been no increase in resource usage. There are no relevant records in the local syslog, none on our remote log host (to which the new servers forward syslog), and no relevant netconsole messages (the new servers forward all kernel messages to a log host using netconsole). The kernel didn't even print anything to the TTY. I asked the hosting company to perform a full hardware test, and they found nothing. I suspect LUKS. Does anybody else experience full lockups with LUKS? The only reference I could find is http://ubuntuforums.org/showthread.php?t=2125287.
Hongli Lai's questions
I'm writing a script which automatically sets up testing environment virtual machines. This script should automatically format a dmcrypt+LUKS partition for me, with a certain passphrase. Because this is a local testing environment, I don't care about the security of the passphrase; I just want the entire VM setup process to be automated and non-interactive.
How can I non-interactively supply a passphrase to cryptsetup luksFormat? I want to use passphrases, not key files, because we use passphrases for LUKS in production as well.
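For context, this is the kind of invocation I'm after (the device path and passphrase are examples; as I understand it, cryptsetup reads the passphrase from stdin when the key file argument is "-", and --batch-mode suppresses the interactive YES confirmation):

```shell
PASS="my-test-passphrase"   # obviously not a secret in a test VM

# printf without a trailing newline matters: key-file input keeps
# trailing newlines verbatim, while interactive input strips them.
printf '%s' "$PASS" | cryptsetup luksFormat --batch-mode /dev/vdb1 -
printf '%s' "$PASS" | cryptsetup luksOpen --key-file=- /dev/vdb1 testvol
```

The same printf-pipe would then have to be used everywhere the volume is opened, so the newline handling stays consistent.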
I am running Ubuntu 12.04 LTS. Yesterday I found a message in my mailbox saying that my server had been shut down. I proceeded to reboot the system, but it still hadn't come up after many minutes, and I didn't have a hardware KVM to see what the kernel was printing to the terminal. So I booted the system into a Linux rescue image and saw that the software RAID 1 array was out of sync. The rescue system also began reconstructing the RAID array.
So far there is no evidence that any of the disks have hardware errors; their SMART statuses look good.
I never received an email notification from mdadm, even though email notification is turned on in /etc/mdadm/mdadm.conf.
This server was also configured to forward all syslog messages to a log host, so I checked my log host. The relevant parts are:
May 20 15:38:40 kernel: [ 1.869825] md0: detected capacity change from 0 to 536858624
May 20 15:38:40 kernel: [ 1.870687] md0: unknown partition table
May 20 15:38:40 kernel: [ 1.877412] md: bind
May 20 15:38:40 kernel: [ 1.878337] md/raid1:md1: not clean -- starting background reconstruction
May 20 15:38:40 kernel: [ 1.878376] md/raid1:md1: active with 2 out of 2 mirrors
May 20 15:38:40 kernel: [ 1.878418] md1: detected capacity change from 0 to 3000052808704
May 20 15:38:40 kernel: [ 1.878575] md: resync of RAID array md1
[snip]
May 20 15:52:33 kernel: Kernel logging (proc) stopped.
May 20 15:52:33 rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="845" x-info="http://www.rsyslog.com"] exiting on signal 15.
As you can see, the system (the normal one, not the rescue system) already detected that something was wrong with the RAID array during a system boot. Then, shortly after, something (not me) halted the system.
So my questions are:
- What could cause the disks to suddenly become out of sync?
- Why was I not notified by email?
- Why was the error not properly logged to syslog before the system was halted? Could it be that the system tried to log to syslog, but only after the syslog daemon had already been stopped? If so, what can I do to prevent that?
- What can I do to find out what happened? Or, if there's no way for me now to find out what happened, how can I improve logging and notifications so that next time I can do a better post-mortem?
My question is not about proper backup practice. I already know that RAID is not a backup etc. My question is solely about notifications and diagnosis.
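On the notification point: a sketch of how I understand the mail path is supposed to be wired up, with a hypothetical recipient address (this is what I believe my config already does, condensed for reference):

```
# /etc/mdadm/mdadm.conf -- the address that mdadm --monitor mails alerts to:
MAILADDR [email protected]

# One-shot check of the mail path: sends a TestMessage event for every
# array, exercising the same delivery route a real Fail event would use:
#   mdadm --monitor --scan --oneshot --test
```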
I want to install the OS (Ubuntu 12.04) on the second hard drive, but I'm unable to access the BIOS so I must install Grub on the first hard drive. How do I install Grub on the first hard drive, and have Grub boot the OS from the second hard drive?
Here's what I've tried so far. I installed Ubuntu on the second hdd, which had a /boot partition and a root partition. It didn't boot because the boot loader was on the second hdd but the system insists on booting from the first.
Then I booted from a rescue system, chrooted into my second hdd's root partition, mounted the second hdd's boot partition on /boot, and ran grub-install /dev/sda. grub-install refused to run because it couldn't find any partitions on the first hdd (which indeed had none).
So I made a boot partition on the first hdd and copied the second hdd's boot partition's contents to it. This time grub-install succeeded, and the system booted.
But even though the system booted from the first hdd's boot partition, once booted it mounts the second hdd's boot partition. That can't be good for kernel upgrades, so I edited /etc/fstab, changed /boot's device to /dev/sda, ran update-grub && grub-install /dev/sda, and rebooted. This time it seems to work too, except that grub's timeout is gone. Since this is a server that I usually access remotely, grub not booting automatically is problematic. grub.cfg contains the timeout option, but the timeout doesn't actually take effect, leading me to think that I may be installing grub incorrectly.
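For reference, the rescue-system procedure I followed, condensed (device names are illustrative, my actual partition layout may differ):

```shell
# Root fs on the second disk, boot partition on the first disk:
mount /dev/sdb2 /mnt
mount /dev/sda1 /mnt/boot
# Bind mounts so update-grub and grub-install work inside the chroot:
mount --bind /dev  /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys  /mnt/sys
chroot /mnt update-grub
chroot /mnt grub-install /dev/sda   # MBR of the disk the BIOS boots from
```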
Bad sectors will eventually occur, but how should I deal with them? If a bad sector occurs, does that mean that the data in that sector is irrecoverably lost, and I should restore it from backup? Is there any way to automate finding out which file belonged to that sector and at which offset, and to automate that recovery? Is there anything I can do on the filesystem level to make my life easier? (ECC?)
The smartctl tool allows initiating a long self-test (smartctl -t long /dev/sda). However, there's also badblocks, which I can run on a drive. How are the two related? If badblocks detects bad blocks, does the drive automatically update its SMART values (e.g. by updating its reallocated sectors count)? Can badblocks replace smartctl -t long, or vice versa?
I have a mail server "example.com" which forwards all emails with recipient "[email protected]" to "[email protected]". My mail server runs Postfix and it uses the virtual_alias_maps mechanism to perform the forwarding. I also have SPF records installed for "example.com":
v=spf1 a include:aspmx.googlemail.com ~all
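The forwarding setup itself is minimal; a sketch of the relevant Postfix configuration (file path and map type are illustrative, not my literal config):

```
# main.cf
virtual_alias_domains = example.com
virtual_alias_maps = hash:/etc/postfix/virtual

# /etc/postfix/virtual (run "postmap /etc/postfix/virtual" after editing)
[email protected]    [email protected]
```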
The problem is, whenever someone delivers mail to "[email protected]", Gmail validates the example.com SPF record against example.com's IP address! I thought it was supposed to validate against the original sender's IP address.
For example, I'm on my laptop on my home Internet connection. I connect to example.com's mail server as follows:
$ telnet example.com 25
220 example.com ESMTP Postfix (Debian/GNU)
HELO my-laptop.local
250 example.com
MAIL FROM:<[email protected]>
250 2.1.0 Ok
RCPT TO:<[email protected]>
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
From: [email protected]
To: [email protected]
Subject: test
test
.
250 2.0.0 Ok: queued as CE5F42200F9
Now when I open that mail in Gmail and view its source, I see the following headers:
Delivered-To: [email protected]
Received: by 10.231.219.195 with SMTP id hv3csp61494ibb;
Sat, 14 Jul 2012 02:15:58 -0700 (PDT)
Received: by 10.229.135.5 with SMTP id l5mr2360326qct.5.1342257358291;
Sat, 14 Jul 2012 02:15:58 -0700 (PDT)
Return-Path: <[email protected]>
Received: from example.com [EXAMPLE.COM's IP ADDRESS HERE]
by mx.google.com with ESMTP id u9si4262071qcv.89.2012.07.14.02.15.58;
Sat, 14 Jul 2012 02:15:58 -0700 (PDT)
Received-SPF: neutral (google.com: [EXAMPLE.COM's IP ADDRESS HERE] is neither permitted nor denied by domain of [email protected]) client-ip=[EXAMPLE.COM's IP ADDRESS HERE];
Authentication-Results: mx.google.com; spf=neutral (google.com: [EXAMPLE.COM's IP ADDRESS HERE] is neither permitted nor denied by domain of [email protected]) [email protected]
Date: Sat, 14 Jul 2012 02:15:58 -0700 (PDT)
Message-Id: <[email protected]>
Received: from my-laptop.local ([LAPTOP's IP ADDRESS HERE])
by example.com (Postfix) with SMTP id CE5F42200F9
for <[email protected]>; Sat, 14 Jul 2012 09:15:44 +0000 (UTC)
From: [email protected]
To: [email protected]
Subject: test
As you can see in Received-SPF and Authentication-Results, the SPF records are being validated against [EXAMPLE.COM's IP ADDRESS] instead of [LAPTOP's IP ADDRESS].
Why does this happen, and how do I fix this problem?
I have a host, let's call it foo.com, on which I'm running Postfix on Debian. Postfix is currently configured to do these things:
- All mail with @foo.com as recipient is handled by this Postfix server. It forwards all such mail to my Gmail account. The firewall thus allows port 25.
- All mail with another domain as recipient is rejected.
- SPF records have been set up for the foo.com domain, saying that foo.com is the sole origin of all mail from @foo.com.
- Applications running on foo.com can connect to localhost:25 to deliver mail, with [email protected] as sender.
However I recently noticed that some spammers are able to send spam to me while passing the SPF checks. Upon further inspection, it looks like they connect to my Postfix server and then say
HELO bar.com
MAIL FROM:<[email protected]> <---- this!
RCPT TO:<[email protected]>
DATA
From: "Buy Viagra" <[email protected]> <--- and this!
...
How do I prevent this? I only want applications running on localhost to be able to say MAIL FROM:<[email protected]>. Here's my current config (main.cf): https://gist.github.com/1283647
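To illustrate the kind of restriction I have in mind, a sketch of what I imagine in main.cf (untested, and the access-map path is hypothetical):

```
# main.cf: check the envelope sender; permit_mynetworks comes first so
# that localhost may still use foo.com sender addresses.
smtpd_sender_restrictions =
    permit_mynetworks,
    check_sender_access hash:/etc/postfix/sender_access

# /etc/postfix/sender_access (run "postmap" on it afterwards):
foo.com    REJECT
```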
My server has 2 hard disks. I've installed smartmontools on Debian with apt-get install smartmontools, enabled it in /etc/default/smartmontools (start_smartd=yes) and started the daemon (/etc/init.d/smartmontools start).
My /etc/smartd.conf contains this:
DEVICESCAN -d removable -n standby -m [email protected] -M exec /usr/share/smartmontools/smartd-runner
Is smartmontools now configured to run regular health checks? If so, how do I see when it does that? I don't see any indication in smartctl -l selftest of health checks being run regularly; that command only showed the two tests I recently ran manually. I also don't see anything in /var/log/messages that indicates regular health checks are being run.
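For comparison, this is how a self-test schedule would be added to that DEVICESCAN line with smartd's -s directive (the times below are arbitrary examples):

```
# Short self-test daily at 02:00, long self-test every Saturday at 03:00.
# -s takes a regex matched against a "T/MM/DD/d/HH" date template.
DEVICESCAN -d removable -n standby -s (S/../.././02|L/../../6/03) -m [email protected] -M exec /usr/share/smartmontools/smartd-runner
```

Without a -s directive, my understanding is that smartd only monitors SMART attributes and does not initiate self-tests on its own.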
I have recently obtained a new dedicated server which I'm now setting up. It's running on 64-bit Debian 6.0. I have cloned a fairly large git repository (177 MB including working files) onto this server. Switching to a different branch is very slow: on my laptop it takes 1-2 seconds, but on this server it can take half a minute. After some investigation it turns out to be some kind of DNS timeout. Here's an exhibit from strace -s 128 git checkout release:
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=132, ...}) = 0
socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("213.133.99.99")}, 16) = 0
poll([{fd=5, events=POLLOUT}], 1, 0) = 1 ([{fd=5, revents=POLLOUT}])
sendto(5, "\235\333\1\0\0\1\0\0\0\0\0\0\35Debian-60-squeeze-64-minimal\n\17happyponies\3com\0\0\1\0\1", 67, MSG_NOSIGNAL, NULL, 0) = 67
poll([{fd=5, events=POLLIN}], 1, 5000) = 0 (Timeout)
This snippet repeats several times per 'git checkout' call.
My server's hostname was originally Debian-60-squeeze-64-minimal. I had changed it to shell.happyponies.com by running hostname shell.happyponies.com, editing /etc/hostname and rebooting the server.
I don't understand the DNS protocol, but it looks like Git is trying to look up the IP for Debian-60-squeeze-64-minimal as well as for happyponies.com. Why does Debian-60-squeeze-64-minimal come back even though I've already changed the host name? Why does Git perform DNS lookups at all? And why are these lookups so slow? I've already verified that all DNS servers in /etc/resolv.conf are up and responding, yet Git's own lookups time out.
Changing the host name back to Debian-60-squeeze-64-minimal seems to fix the slowness.
Basically I just want to fix whatever DNS issues my server has, because I'm sure they will cause more problems than just slowing down git checkout. But I'm not sure what exactly the problem is or what these symptoms mean.
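The checks I've been using to compare what I configured against what the resolver actually does (a diagnostic sketch, nothing here should modify the system):

```shell
hostname                    # the kernel's idea of the host name
cat /etc/hostname           # what I configured
# Can the resolver answer the host name locally (nsswitch: files first),
# or does it have to go out over the network?
getent hosts "$(hostname)" || echo "not resolvable from /etc/hosts"
```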
I'm trying to change the TCP TIME_WAIT timeout on Solaris. According to some Google results I need to run this command:
ndd -set /dev/tcp tcp_time_wait_interval 60000
However I get:
operation failed: Not owner
What am I doing wrong? I'm already running ndd as root. Is there another way to tune TIME_WAIT?
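If this box is Solaris 11, part of the explanation may simply be that ndd has been superseded; as far as I can tell the ipadm equivalent would be something like the following (untested, and the leading-underscore private-property spelling is my assumption):

```shell
# Solaris 11: TCP tunables moved from ndd to ipadm; tcp_time_wait_interval
# is supposedly now the private property _time_wait_interval.
ipadm set-prop -p _time_wait_interval=60000 tcp
ipadm show-prop -p _time_wait_interval tcp   # verify the new value
```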
I have a 64-bit RHEL 5.3 server. There's a piece of server software that's more memory efficient if I compile it as 32-bit. Is there a way to tell GCC to target 32-bit?
I just want a specific piece of software to be 32-bit, everything else should stay 64-bit.
This software is not packaged in the yum repositories so I cannot just do 'yum install 32-bit-version'.
I've written a web application for which the user interface is in Dutch. I use the system's date and time routines to format date strings in the application. However, the system formats date strings in English and I want them in Dutch, so I need to set the system's locale. How do I do that on Debian? I tried setting LC_ALL=nl_NL but it doesn't seem to have any effect:
$ date
Sat Aug 15 14:31:31 UTC 2009
$ LC_ALL=nl_NL date
Sat Aug 15 14:31:36 UTC 2009
I remember that setting LC_ALL on my Ubuntu desktop system works fine. Do I need to install extra packages to make this work, or am I doing it entirely wrong?
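What I've tried to piece together so far: the nl_NL locale apparently has to be generated before LC_ALL can select it. A sketch, assuming Debian's standard "locales" package layout (needs root):

```shell
# Register the locale and generate it:
echo 'nl_NL.UTF-8 UTF-8' >> /etc/locale.gen
locale-gen
locale -a | grep -i nl_nl    # verify it now exists
LC_ALL=nl_NL.UTF-8 date      # should now print a Dutch date string
```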