I'm at a loss as to what's going wrong for me here. I have a few dozen units that work, and a few dozen units that don't, and they all vary by hardware and platform.
I have a CentOS 7.3 PXE server running Cobbler with a number of CentOS-based LiveCD options on it. They worked fine up until this morning, and now we are suddenly seeing the following behavior when trying to load the vanilla CentOS LiveCD from PXE:
- Hit enter
- Kernel downloads
- Initrd downloads, but silently (only 3 "."s show up, but I can tell it's downloading by watching tcpdump on the server)
- The download finishes, the screen flashes, and the PXE menu comes back up
- Subsequent retries result in the menu flashing and coming back up with an "invalid kernel parameter" error so briefly that I had to record it with screencap software to even see it. Additionally, only 1 packet is actually sent to the client; it's like it doesn't even attempt to download it on the second try.
The PXE menu entry for the vanilla CentOS LiveCD looks like this:
/images/centos_livecd/centos_vmlinuz initrd=/images/centos_livecd/centos_livecd_initrd.img ksdevice=bootif lang= root=live:/centos_livecd.iso kssendmac text ks=http://10.101.24.21/cblr/svc/op/ks/profile/centos_livecd BOOTIF=<MAC>
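Since the retries flash an "invalid kernel parameter" error, one quick sanity check is to split the entry into its individual parameters and look for anything empty or malformed. This is only a minimal sketch: `parse_entry` is a hypothetical helper of my own (not part of Cobbler or Syslinux), and it does nothing smarter than split on whitespace and on the first `=`.

```python
def parse_entry(line):
    """Split a menu entry into the kernel path and a dict of its parameters.

    Bare flags such as 'text' map to None; 'key=' maps to an empty string.
    """
    kernel, *tokens = line.split()
    params = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return kernel, params


# The entry exactly as it appears in the menu (BOOTIF is still a placeholder).
ENTRY = (
    "/images/centos_livecd/centos_vmlinuz "
    "initrd=/images/centos_livecd/centos_livecd_initrd.img "
    "ksdevice=bootif lang= root=live:/centos_livecd.iso kssendmac text "
    "ks=http://10.101.24.21/cblr/svc/op/ks/profile/centos_livecd BOOTIF=<MAC>"
)

kernel, params = parse_entry(ENTRY)
empty_params = [key for key, value in params.items() if value == ""]

print("kernel:", kernel)
for key, value in params.items():
    print(f"  {key} = {value!r}")
print("parameters with an empty value:", empty_params)
```

For the entry above this flags only `lang=` as having an empty value. It cannot tell whether the boot loader actually objects to that, so treat it as a hint about where to look, not a diagnosis.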
Again - I have about 20 units of varying motherboard and platform NOT working, and about 40 or so units of varying motherboard and platform that ARE working with the exact same menu entry.
Regular installer menu entries work great - CentOS, Ubuntu, etc.
So far I've tried:
- Using a vmlinuz from a CentOS install ISO
- Monitoring xinetd with "watch -n 1 systemctl status xinetd" and seeing the requests come in
- Monitoring tcpdump with "tcpdump -vvi |grep "
I'm at a loss, and I'm desperate. Does anyone have any ideas?
If I can gather more information using a different utility somehow on a system that is loading from PXE I would love to know how.
More information:
While tailing /var/log/messages, I noticed that the first try of loading the LiveCD appears to go swimmingly according to the network, but nothing happens on the client once the initrd.img is downloaded:
Jul 28 15:10:30 jarvis in.tftpd[12496]: RRQ from 10.101.26.176 filename /images/centos_livecd/centos_vmlinuz
Jul 28 15:10:30 jarvis in.tftpd[12496]: Client 10.101.26.176 finished /images/centos_livecd/centos_vmlinuz
Jul 28 15:10:30 jarvis in.tftpd[12501]: RRQ from 10.101.26.176 filename /images/centos_livecd/centos_livecd_initrd.img
Jul 28 15:11:39 jarvis in.tftpd[12501]: Client 10.101.26.176 finished /images/centos_livecd/centos_livecd_initrd.img
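To see at a glance how long each transfer took, the RRQ and "finished" lines can be paired up per client and file. The following is a rough sketch that assumes the in.tftpd log format shown above and an English locale for the month names. Syslog timestamps carry no year, so `ASSUMED_YEAR` is an arbitrary placeholder that is only used for the subtraction.

```python
import re
from datetime import datetime

# Sample lines copied from /var/lib/messages-style output above.
LOG = """\
Jul 28 15:10:30 jarvis in.tftpd[12496]: RRQ from 10.101.26.176 filename /images/centos_livecd/centos_vmlinuz
Jul 28 15:10:30 jarvis in.tftpd[12496]: Client 10.101.26.176 finished /images/centos_livecd/centos_vmlinuz
Jul 28 15:10:30 jarvis in.tftpd[12501]: RRQ from 10.101.26.176 filename /images/centos_livecd/centos_livecd_initrd.img
Jul 28 15:11:39 jarvis in.tftpd[12501]: Client 10.101.26.176 finished /images/centos_livecd/centos_livecd_initrd.img
"""

PREFIX = r"^(\w{3}\s+\d+ [\d:]{8}) \S+ in\.tftpd\[(\d+)\]: "
START_RE = re.compile(PREFIX + r"RRQ from (\S+) filename (\S+)")
END_RE = re.compile(PREFIX + r"Client (\S+) finished (\S+)")

ASSUMED_YEAR = 2000  # placeholder only; the log lines do not include a year


def parse_stamp(stamp):
    """Parse a syslog timestamp such as 'Jul 28 15:10:30'."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def transfer_seconds(log_text):
    """Return {filename: seconds} for every request that has a matching finish."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        match = START_RE.match(line)
        if match:
            stamp, pid, client, filename = match.groups()
            started[(pid, client, filename)] = parse_stamp(stamp)
            continue
        match = END_RE.match(line)
        if match:
            stamp, pid, client, filename = match.groups()
            begin = started.pop((pid, client, filename), None)
            if begin is not None:
                durations[filename] = (parse_stamp(stamp) - begin).total_seconds()
    return durations


durations = transfer_seconds(LOG)
for filename, seconds in durations.items():
    print(f"{seconds:6.0f} s  {filename}")
```

For the four lines above this reports 0 s for the kernel and 69 s for the initrd. A request that never gets a matching "finished" line simply does not show up in the result, which is a quick way to spot a transfer that was abandoned.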
We were serving files in /var/lib/tftpboot from Syslinux 4.07, which is two point releases newer than the 4.05 that CentOS 7.3 ships with. We used the newer files because 4.05 doesn't support PXE menu chaining, but 4.07 does.
Overwriting the files in /var/lib/tftpboot with the Syslinux 4.05 files found in /usr/share/syslinux resolved the issue, at the cost of losing PXE menu chaining.
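For anyone who wants to check whether their TFTP root is serving the distribution's Syslinux files or something else, comparing checksums by filename is enough. Below is a minimal sketch, assuming the two directories named above; `compare_to_reference` is a hypothetical helper, and it only looks at regular files directly inside the deployed directory that have a same-named file in the reference directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_reference(deployed_dir, reference_dir):
    """Map each shared filename to whether its contents match the reference copy."""
    deployed_dir, reference_dir = Path(deployed_dir), Path(reference_dir)
    report = {}
    for deployed in sorted(deployed_dir.iterdir()):
        reference = reference_dir / deployed.name
        if deployed.is_file() and reference.is_file():
            same = sha256_of(deployed) == sha256_of(reference)
            report[deployed.name] = "matches reference" if same else "differs from reference"
    return report


if __name__ == "__main__":
    # Paths taken from the post; adjust them if your layout differs.
    deployed = Path("/var/lib/tftpboot")
    reference = Path("/usr/share/syslinux")
    if deployed.is_dir() and reference.is_dir():
        for name, status in compare_to_reference(deployed, reference).items():
            print(f"{name}: {status}")
```

Any file reported as differing comes from somewhere other than the packaged Syslinux, which in my case would have been the 4.07 copies.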
Version 4.07 files worked fine for 2 weeks without issue; I'm still not sure why they suddenly stopped working for some units and not others.