I am trying to PXE boot a machine. In the system log, I can see:
dhcpd[28030]: DHCPDISCOVER from 98:90:96:bc:fc:e3 via 10.65.240.2
dhcpd[28030]: none: host unknown.
dhcpd[28030]: DHCPOFFER on 10.65.240.111 to 98:90:96:bc:fc:e3 via 10.65.240.2
I can't for the life of me work out the problem. The subnet is valid and present, the machine has a lease, and this machine has previously built just fine!
Some googling indicates that 'host unknown' implies something to do with DNS: the server has zone files for the forward and reverse zone this box sits in.
So I found this page (http://www.tldp.org/HOWTO/DHCP/x369.html), which mentions adding an entry to /etc/hosts. I added:
to the end, and that appears to have fixed the problem. Not sure why I've not needed this before, as it seems pretty fundamental. Anyone know why this is happening?
DHCP servers must be able to send DHCPOFFER packets to clients that do not yet have an IP address, so they broadcast their DHCPOFFERs with a broadcast destination MAC address (FF:FF:FF:FF:FF:FF) and a broadcast destination IP address (255.255.255.255). Unfortunately, Linux insists on changing the 255.255.255.255 destination IP into the local subnet broadcast address, which leads to a DHCP protocol violation.
While many DHCP clients won't notice the problem, some (e.g., all Microsoft DHCP clients) will. Clients that have this problem will appear not to see the DHCPOFFER messages from the server.
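To make the distinction concrete, here is a minimal Python sketch that labels a DHCPOFFER destination address as the limited broadcast (255.255.255.255) or the subnet broadcast. The function name and the /24 prefix are assumptions for illustration only; the question does not state the actual netmask.

```python
import ipaddress

# The "all-ones" limited broadcast address that picky clients expect.
LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def classify_offer_destination(dst_ip: str, subnet: str) -> str:
    """Label the destination IP of a DHCPOFFER (hypothetical helper)."""
    dst = ipaddress.IPv4Address(dst_ip)
    net = ipaddress.IPv4Network(subnet)
    if dst == LIMITED_BROADCAST:
        return "limited-broadcast"   # what the client expects
    if dst == net.broadcast_address:
        return "subnet-broadcast"    # what Linux may send instead
    return "other"


# Assumed /24 for the 10.65.240.x network seen in the log.
print(classify_offer_destination("255.255.255.255", "10.65.240.0/24"))  # limited-broadcast
print(classify_offer_destination("10.65.240.255", "10.65.240.0/24"))    # subnet-broadcast
```

A client that only accepts the first case will silently drop offers of the second kind, which matches the symptom described above.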
What the previously quoted page does is trick the Linux network stack into using 255.255.255.255 as the destination IP of the DHCPOFFER, by one of several methods: creating routes, adding a hostname for the 255.255.255.255 address, and so on.
EDIT: It does not matter where on the network the DHCP client is located; the problem is that some picky DHCP clients will ignore a DHCP offer if its destination IP is anything other than 255.255.255.255. Consider that when you PXE boot a PC, the PXE firmware is the "first" DHCP client; if you then load, say, a Linux kernel/initrd, a "second" DHCP request is made, this time by the booting Linux kernel. So a single PXE boot session involves two consecutive, completely different DHCP clients, and it can happen that the first one tolerates the described DHCP protocol violation while the second one does not. The same PC could very well have PXE booted fine before if it only ever ran "forgiving" DHCP clients.
To see what is actually going on, I would recommend taking some Wireshark traffic captures and looking at the problem at the packet level.
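If you export the raw bytes of an offer from a capture, the field to look at is the destination address in the IPv4 header (bytes 16 to 19). The following is only a sketch using the Python standard library; the helper name and the sample header (including the 10.65.240.1 source address) are made up for illustration and are not taken from a real capture.

```python
import socket
import struct


def ipv4_destination(ip_header: bytes) -> str:
    """Return the destination address from a raw IPv4 header (hypothetical helper)."""
    if len(ip_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    if ip_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    # Destination address occupies bytes 16..19 of the header.
    return socket.inet_ntoa(ip_header[16:20])


# Build a made-up 20-byte header: version/IHL, TOS, total length, ID,
# flags/fragment offset, TTL, protocol (17 = UDP), checksum (left as 0),
# source address, destination address.
sample = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 20, 0, 0, 64, 17, 0,
    socket.inet_aton("10.65.240.1"),    # assumed server address
    socket.inet_aton("10.65.240.255"),  # subnet broadcast instead of all-ones
)

print(ipv4_destination(sample))  # 10.65.240.255
```

In Wireshark itself you would simply read the Destination column of the DHCP Offer packet; anything other than 255.255.255.255 on a broadcast offer points to the behaviour described above.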